Articles by tag "AI Networking"

  1. Is the NVIDIA MCX755106AS-HEAT ConnectX-7 SmartNIC Worth It for AI Servers?
     When you run a multi-node LLM training job across a cluster of H100 or A100 GPU servers and notice sudden, unexplained training epoch stalls, the culprit is rarely the compute silicon. Instead, it is almost always a networking bottleneck: packet drops under heavy RoCEv2 (RDMA ...
