Articles by tag "MQM8790"

2 Items

  1. Troubleshooting InfiniBand Subnet Manager Failover: High-Availability OpenSM Architecture for MQM8790 Unmanaged Clusters
     Quick Take: In deployments based on unmanaged MQM8790 switches, the entire InfiniBand fabric control plane relies on an external Subnet Manager (SM). Implementing a high-availability active/standby OpenSM architecture with distinct sm_priority values prevents full cluster outages during ...
  2. Maximizing MFU: How NVIDIA SHARP™ Inside MQM8790 Accelerates PyTorch Multi-Node Training
     Quick Take: NVIDIA SHARP™ inside the MQM8790 switch ASIC flattens the distributed training degradation curve by offloading collective AllReduce calculations from the GPU nodes directly into the network fabric. This in-network reduction reduces cross-tier traffic volume, minimizes ...
