Broadcom Ships Tomahawk Ultra Ethernet Switch for HPC and AI

PALO ALTO, Calif., July 15, 2025 — Broadcom Inc. (NASDAQ:AVGO), a global leader in semiconductor and infrastructure software solutions, today announced the shipment of its breakthrough Ethernet switch — the Tomahawk Ultra. Engineered to transform the Ethernet switch for high-performance computing (HPC) and AI workloads, Tomahawk Ultra delivers industry-leading ultra-low latency, massive throughput, and lossless networking.

“Tomahawk Ultra is a testament to innovation, involving a multi-year effort by hundreds of engineers who reimagined every aspect of the Ethernet switch,” said Ram Velaga, senior vice president and general manager of Broadcom’s Core Switching Group. “This highlights Broadcom’s commitment to invest in advancing Ethernet for high-performance networking and AI scale-up.”

Built from the ground up to meet the extreme demands of HPC environments and tightly coupled AI clusters, Tomahawk Ultra redefines what an Ethernet switch can deliver. Long perceived as higher-latency and lossy, Ethernet takes on a new role:

Ultra-low latency: Achieves 250ns switch latency at full 51.2 Tbps throughput.
High performance: Delivers line-rate switching performance even at minimum packet sizes of 64 bytes, supporting up to 77 billion packets per second.
Adaptable, optimized Ethernet headers: Reduces header overhead from 46 bytes down to as low as 10 bytes while maintaining full Ethernet compliance, boosting network efficiency and enabling flexible, application-specific optimizations.
Lossless fabric: Implements Link Layer Retry (LLR) and Credit-Based Flow Control (CBFC) to eliminate packet loss and ensure reliability.
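
As a rough sanity check on the packet-rate figure above, the sketch below computes the line-rate packet count for 64-byte frames at 51.2 Tbps. The 8-byte preamble and 12-byte inter-packet gap are standard Ethernet values assumed here, not figures from the announcement, and the function name is illustrative only. Under those assumptions the result is roughly 76 billion packets per second, in the neighborhood of the "up to 77 billion" quoted; the exact accounting Broadcom uses is not stated.

```python
# Back-of-the-envelope check of the minimum-size packet rate.
# Assumption (not from the release): each frame also occupies 8 bytes of
# preamble/SFD and 12 bytes of inter-packet gap on the wire.

THROUGHPUT_BPS = 51.2e12   # 51.2 Tbps, from the release
MIN_FRAME_BYTES = 64       # minimum packet size, from the release
PREAMBLE_BYTES = 8         # standard Ethernet preamble + SFD (assumed)
IPG_BYTES = 12             # standard minimum inter-packet gap (assumed)


def packets_per_second(throughput_bps, frame_bytes, overhead_bytes):
    """Line-rate packet rate for a frame size plus per-frame wire overhead."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return throughput_bps / bits_on_wire


standard_pps = packets_per_second(
    THROUGHPUT_BPS, MIN_FRAME_BYTES, PREAMBLE_BYTES + IPG_BYTES
)
print(f"{standard_pps / 1e9:.1f} billion packets per second")  # prints 76.2
```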

“AI and HPC workloads are converging into tightly coupled accelerator clusters that demand supercomputer-class latency — critical for inference, reliability, and in-network intelligence from the fabric itself,” said Kunjan Sobhani, lead semiconductor analyst, Bloomberg Intelligence. “Demonstrating that open-standards Ethernet can now deliver sub-microsecond switching, lossless transport, and on-chip collectives marks a pivotal step toward meeting those demands of an AI scale-up stack — projected to be double digit billions in a few years.”

Tomahawk Ultra is optimized for the tightly coupled, low-latency communication patterns found in both high-performance computing systems and AI clusters. With ultra-low latency switching and adaptable optimized Ethernet headers, it provides predictable, high-efficiency performance for large-scale simulations, scientific computing, and synchronized AI model training and inference.

When deployed with Scale-Up Ethernet (SUE, specification available from Broadcom), Tomahawk Ultra enables sub-400ns XPU-to-XPU communication latency, including the switch transit time, setting a new benchmark for tightly synchronized AI compute at scale.

By reducing Ethernet header overhead from 46 bytes to just 10 bytes, while maintaining full Ethernet compliance, Tomahawk Ultra dramatically improves network efficiency. This optimized header is adaptable per application, offering both flexibility and performance gains across diverse HPC and AI workloads.
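
To make the efficiency gain concrete, the following sketch compares the share of on-wire bytes that carry payload with 46 bytes versus 10 bytes of header overhead. The payload sizes are hypothetical examples chosen for illustration, and the announcement does not break down what the 46-byte and 10-byte figures include, so this is an approximation rather than a statement of how the optimized header is built.

```python
# Payload efficiency with the two header-overhead figures from the release.
# The payload sizes below are hypothetical and only illustrate the trend.

LEGACY_OVERHEAD_BYTES = 46      # from the release
OPTIMIZED_OVERHEAD_BYTES = 10   # from the release


def wire_efficiency(payload_bytes, overhead_bytes):
    """Fraction of transmitted bytes that are payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


for payload in (32, 64, 256, 1024):
    before = wire_efficiency(payload, LEGACY_OVERHEAD_BYTES)
    after = wire_efficiency(payload, OPTIMIZED_OVERHEAD_BYTES)
    print(f"{payload:>5} B payload: {before:.1%} -> {after:.1%}")
```

The gain is largest for small payloads, which is where the fine-grained transfers typical of tightly coupled clusters would benefit most.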

Tomahawk Ultra incorporates lossless fabric technology that eliminates packet drops during high-volume data transfer. With LLR, the switch detects link errors using Forward Error Correction and automatically retransmits the affected packets, avoiding drops at the wire level. At the same time, CBFC prevents the buffer overflows that traditionally caused packet loss. Together, these mechanisms create a truly lossless Ethernet fabric, delivering the level of reliability demanded by today’s most data-intensive workloads.
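
The general idea behind credit-based flow control can be illustrated with a toy single-link model: the sender may transmit only while it holds a credit, and the receiver hands a credit back each time it frees a buffer slot. This is a simplified sketch of the concept, not Broadcom's implementation. The function name, the one-packet-per-tick timing, and the parameters are all assumptions made for illustration, and LLR retransmission is not modeled.

```python
from collections import deque


def simulate_cbfc(packets_to_send, buffer_slots, drain_every):
    """Toy credit-based flow control over one link (hypothetical model).

    Returns (delivered, dropped, max_buffer_occupancy).
    """
    credits = buffer_slots      # sender starts with one credit per buffer slot
    rx_buffer = deque()
    sent = delivered = dropped = 0
    max_occupancy = 0
    tick = 0
    while delivered + dropped < packets_to_send:
        tick += 1
        # Sender: at most one packet per tick, and only if a credit is held.
        if sent < packets_to_send and credits > 0:
            credits -= 1
            if len(rx_buffer) < buffer_slots:
                rx_buffer.append(sent)
            else:
                dropped += 1    # unreachable while credits are respected
            sent += 1
        max_occupancy = max(max_occupancy, len(rx_buffer))
        # Receiver: drain one packet every `drain_every` ticks, return a credit.
        if tick % drain_every == 0 and rx_buffer:
            rx_buffer.popleft()
            delivered += 1
            credits += 1
    return delivered, dropped, max_occupancy


delivered, dropped, max_occupancy = simulate_cbfc(
    packets_to_send=100, buffer_slots=4, drain_every=3
)
print(delivered, dropped, max_occupancy)
```

Because outstanding credits plus buffered packets never exceed the number of buffer slots, the receiver cannot overflow even when it drains more slowly than the sender can transmit; the sender simply stalls until a credit returns.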

Tomahawk Ultra also accelerates performance through In-Network Collectives, addressing one of the most persistent bottlenecks in AI and machine learning workloads. Rather than burdening XPUs with collective operations like AllReduce, Broadcast, or AllGather, Tomahawk Ultra executes these directly within the switch chip. This can reduce job completion time and improve utilization of expensive compute resources. Importantly, this capability is endpoint-agnostic, enabling immediate adoption across a wide range of system architectures and vendor ecosystems.
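
Conceptually, an in-network AllReduce means each endpoint sends its vector to the switch once and receives the reduced result back, instead of exchanging partial results with its peers. The sketch below models only that data flow in plain Python, with sum as the reduction. The function names are hypothetical and nothing here reflects the switch's actual interface. For comparison, it also reports the step count of a classic ring AllReduce, which takes 2(N-1) communication steps for N endpoints.

```python
def in_network_allreduce(endpoint_vectors):
    """Element-wise sum computed at a (hypothetical) switch and returned
    to every endpoint. Each endpoint sends one vector and receives one."""
    if not endpoint_vectors:
        raise ValueError("need at least one endpoint")
    length = len(endpoint_vectors[0])
    if any(len(vec) != length for vec in endpoint_vectors):
        raise ValueError("all endpoint vectors must have the same length")
    reduced = [sum(column) for column in zip(*endpoint_vectors)]
    # Every endpoint gets its own copy of the reduced vector.
    return [list(reduced) for _ in endpoint_vectors]


def ring_allreduce_steps(num_endpoints):
    """Communication steps in a classic ring AllReduce:
    (N - 1) reduce-scatter steps plus (N - 1) all-gather steps."""
    return 2 * (num_endpoints - 1)


results = in_network_allreduce([[1, 2], [3, 4], [5, 6]])
print(results[0])                 # [9, 12]
print(ring_allreduce_steps(8))    # 14
```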

Designed with innovations in topology-aware routing to support advanced HPC topologies including Dragonfly, Mesh and Torus, Tomahawk Ultra is also compliant with the UEC standard and embraces the openness and rich ecosystem of Ethernet networking.

As part of Broadcom’s Ethernet-forward strategy for AI scale-up, the company has introduced SUE-Lite — an optimized version of the SUE specification tailored for power and area-sensitive accelerator applications. SUE-Lite retains the key low-latency and lossless characteristics of full SUE, while further reducing the silicon footprint and power consumption of Ethernet interfaces on AI XPUs and CPUs.

This lightweight variant enables easier integration of standards-compliant Ethernet fabrics in AI platforms, promoting broader adoption of Ethernet as the interconnect of choice in scale-up architectures.

Together with the 102.4 Tbps Tomahawk 6, Tomahawk Ultra forms the foundation of a unified Ethernet architecture: enabling scale-up Ethernet for AI, and scale-out Ethernet for HPC and distributed workloads.

Tomahawk Ultra is 100 percent pin-compatible with Tomahawk 5, ensuring a very fast time-to-market. It is shipping now for deployment in rack-scale AI training clusters and supercomputing environments.
