Articles by tag "HPC networking"

2 Items

  1. Cost-Effective Deployment of NVIDIA Mellanox Switches for High-Bandwidth AI Clusters
     High-bandwidth AI clusters rely on low-latency, high-throughput networking to support demanding workloads such as machine learning training, inference, and large-scale simulations. NVIDIA Mellanox switches deliver high port density, scalable throughput, and robust reliability. This guide ...
  2. Troubleshooting InfiniBand Subnet Manager Failover: High-Availability OpenSM Architecture for MQM8790 Unmanaged Clusters
     Quick Take: In deployments based on unmanaged MQM8790 switches, the entire InfiniBand fabric control plane relies on an external Subnet Manager (SM). Implementing a high-availability active/standby OpenSM architecture with distinct sm_priority values prevents full cluster outages during ...
