In a move that directly challenges Nvidia in the lucrative AI training and inference markets, Intel announced its long-anticipated Gaudi 3 AI accelerator at its Intel Vision event.
The new accelerator offers significant improvements over the previous-generation Gaudi 2 processor, promising to bring new competitiveness to training and inference for LLMs and multimodal models.
Gaudi 3
Gaudi 3 dramatically increases AI compute capabilities, delivering substantial improvements over Gaudi 2 and competitors, particularly in processing BF16 data types, which are crucial for AI workloads.
Manufactured using a 5nm process technology, Gaudi 3 incorporates significant architectural advancements, including more Tensor Processor Cores (TPCs) and Matrix Math Engines (MMEs). This provides the computing power necessary for the parallel processing of AI operations, significantly reducing training and inference times for complex AI models.
Gaudi 3 expands its hardware capabilities with more Matrix Math Engines and Tensor Processor Cores than its predecessor, Gaudi 2, bolstering its processing power for AI workloads.
The new accelerator boasts an FP8 precision throughput of 1835 TFLOPS, doubling the performance of Gaudi 2. It also significantly enhances BF16 performance, although specific throughput figures for this improvement were not disclosed.
It has 128GB of HBM2e memory, offering 3.7TB/s of memory bandwidth and 96MB of onboard static RAM. This massive memory capacity and bandwidth support processing large datasets efficiently, which is crucial for training and running large AI models.
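Taken together, the quoted compute and memory figures imply a roofline "ridge point": the arithmetic intensity a kernel needs before peak compute, rather than memory bandwidth, becomes the bottleneck. A quick back-of-envelope sketch using only the numbers above (the ratio is the standard roofline calculation, not an Intel-published figure):

```python
# Roofline ridge point from the figures quoted above:
# 1835 TFLOPS FP8 peak compute, 3.7 TB/s of HBM2e bandwidth.
peak_flops = 1835e12   # FLOP/s at FP8 precision
mem_bw = 3.7e12        # bytes/s of memory bandwidth

# A kernel needs roughly this many FLOPs per byte moved
# before it is compute-bound rather than memory-bound.
ridge_point = peak_flops / mem_bw
print(f"~{ridge_point:.0f} FLOPs/byte")  # ~496 FLOPs/byte
```

Large training matmuls easily exceed that intensity, while small-batch inference often does not, which is why the memory bandwidth figure matters as much as the TFLOPS number for serving LLMs.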
High-speed, low-latency networking is critical when building clusters of accelerators to solve large training tasks. While Nvidia is building its accelerators using proprietary interconnects like its NVLink, Intel is all-in on standard Ethernet-based networking.
Gaudi 3 reflects this, featuring twenty-four 200Gb Ethernet ports, significantly enhancing its networking capabilities. This ensures scalable and flexible system connectivity, allowing for the efficient scaling of AI compute clusters without being locked into proprietary networking technologies.
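As a quick sanity check on the scale-out numbers, simple arithmetic on the port count quoted above (not an Intel-published aggregate) gives the raw link bandwidth available per accelerator:

```python
# Aggregate Ethernet bandwidth per Gaudi 3 accelerator,
# from the twenty-four 200Gb ports quoted above.
ports = 24
gbits_per_port = 200

total_gbits = ports * gbits_per_port  # 4800 Gb/s per accelerator
total_gbytes = total_gbits / 8        # 600 GB/s of raw link bandwidth
print(f"{total_gbits} Gb/s (~{total_gbytes:.0f} GB/s)")
```

That aggregate is what lets a cluster spread all-reduce and all-gather traffic across many standard Ethernet links instead of a proprietary fabric.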
Performance
Intel’s Gaudi 3 AI accelerator shows robust performance improvements across several key areas relevant to AI training and inference tasks, particularly for LLMs and multimodal models.
Intel projects that Gaudi 3 will significantly outperform competing products like Nvidia’s H100 and H200, predicting an average 50% faster training time along with superior inference throughput and power efficiency across models of various parameter counts. Intel also claims the inference performance advantage grows on longer input and output sequences.
Analyst’s Take
Intel’s Gaudi 3 AI accelerator is a strategic move by Intel to gain a greater position in the supply-constrained AI accelerator market, directly challenging Nvidia to address the burgeoning demand for advanced AI compute solutions.
Intel built a compelling solution, bringing substantial performance improvements over Gaudi 2 that will challenge the market. The 4x AI compute for BF16, 1.5x increase in memory bandwidth, and 2x networking bandwidth improvements position the Gaudi 3 as a powerful solution for the needs of next-generation AI applications.
Intel’s emphasis on open community-based software and industry-standard Ethernet networking addresses critical market needs for flexibility and scalability without vendor lock-in. This approach differentiates Intel from Nvidia and aligns with the broader industry trend toward open standards and interoperability.
Intel’s partnerships with Dell Technologies, HPE, Lenovo, and Supermicro for the Gaudi 3 rollout set Intel up for success. If Intel can deliver the accelerators to the market on schedule and the promised performance claims hold, then Intel is poised to realize significant growth in the accelerator market. The same is also true for AMD and its MI300X accelerator.
Gaudi 3 isn’t just about the current generation of AI accelerators but also sets the stage for Intel’s next-generation GPU, Falcon Shores. By integrating the Intel Gaudi and Intel Xe IP with a single GPU programming interface, Falcon Shores is expected to further Intel’s capabilities in AI and HPC.
The launch of the Gaudi 3 AI accelerator is a significant milestone for Intel, highlighting its technological advancements, strategic market positioning, and commitment to addressing the evolving needs of the AI industry.
By offering substantial performance improvements, embracing open standards, and establishing strategic OEM partnerships, Intel is challenging the status quo in the AI accelerator market and positioning itself as a leader in the next wave of AI infrastructure.
Source: Intel Challenges Nvidia With Gaudi 3 AI Accelerator