Google Ironwood TPU Boosts AI Inference Performance

Scaling AI: Ironwood’s 9,216-Chip Clusters for Unprecedented Compute

Despite the complexities of benchmarking, the underlying message is clear: Ironwood marks a major step forward for Google’s AI infrastructure. The foundation that powered Gemini 2.5’s rapid development now gains Ironwood’s added speed and efficiency.

Google expects Ironwood’s inference capabilities and improved efficiency to enable more transformative AI advances over the next year. Ironwood is positioned as a central component of Google’s “age of inference” vision, supplying the compute needed for increasingly complex models and genuinely agentic capabilities.

Decoding the Numbers: Ironwood’s Performance Context

Differing benchmarking methodologies complicate direct comparisons of AI chip performance. Google measures Ironwood primarily at FP8 precision. The company’s claim that Ironwood “pods” deliver 24 times the performance of comparable supercomputer segments should be treated with caution, because many supercomputing systems lack native FP8 hardware support.
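As a rough sanity check on the 24x claim, the sketch below works out the baseline performance that the comparison implies. It relies on the 42.5-exaflop pod figure quoted elsewhere in this article, and the variable names are purely illustrative. The result says nothing about the precision at which the baseline system was measured, which is exactly why the claim needs caution.

```python
# Figures quoted in this article (Google's claims, not independent measurements).
POD_EXAFLOPS_FP8 = 42.5  # full 9,216-chip Ironwood pod, measured at FP8
CLAIMED_SPEEDUP = 24     # Google's stated advantage over the comparison system

# Baseline performance implied by the claim, in exaflops.
implied_baseline_exaflops = POD_EXAFLOPS_FP8 / CLAIMED_SPEEDUP

print(f"Implied baseline: {implied_baseline_exaflops:.2f} exaflops")
```

If the baseline figure was measured at a higher precision than FP8, the two numbers are not directly comparable, so the ratio is best read as a marketing headline and not as a like-for-like benchmark.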

Google’s TPU v6, known as Trillium, was notably absent from the company’s direct performance comparisons. According to Google, Ironwood delivers twice the performance per watt of v6. A Google spokesperson explained that Ironwood succeeds the TPU v5p, while Trillium follows the TPU v5e as a more powerful alternative. At FP8 precision, Trillium peaked at approximately 918 TFLOPS.
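Since Google did not publish a direct Trillium comparison, a back-of-the-envelope ratio can be derived from the two per-chip peak figures quoted in this article. This is only a sketch: it compares peak FP8 throughput, is separate from the performance-per-watt claim, and uses illustrative variable names.

```python
# Per-chip peak FP8 throughput as quoted in this article.
IRONWOOD_PEAK_TFLOPS = 4614  # Google's stated per-chip maximum
TRILLIUM_PEAK_TFLOPS = 918   # approximate figure for TPU v6

# Ratio of peak throughput per chip (not a measure of efficiency per watt).
peak_ratio = IRONWOOD_PEAK_TFLOPS / TRILLIUM_PEAK_TFLOPS

print(f"Ironwood / Trillium peak FP8 ratio: {peak_ratio:.2f}x")
```

On these numbers, a single Ironwood chip peaks at roughly five times a Trillium chip, although peak figures rarely translate directly into delivered performance on real workloads.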

Inside Ironwood: A Performance Powerhouse

Ironwood delivers substantially more processing power than any earlier Google TPU. It is designed to be deployed in large, liquid-cooled clusters of up to 9,216 chips each. An enhanced Inter-Chip Interconnect (ICI) links those chips so that data moves quickly and efficiently across the whole system.

Both Google’s internal AI research teams and developers using Google Cloud will have access to this processing power. Ironwood will be offered in two configurations: a 256-chip server for moderate AI workloads and a 9,216-chip cluster for the most demanding ones.

A fully configured Ironwood pod delivers 42.5 exaflops of inference compute. According to Google, each Ironwood chip peaks at 4,614 TFLOPS, a significant advance over earlier TPU generations. Memory per chip has grown to 192 GB, six times that of Trillium. Memory bandwidth has increased 4.5x to 7.2 TB/s.
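The pod-level figure follows from the per-chip numbers, which the short sketch below checks for both announced configurations. It assumes simple linear scaling of peak throughput with chip count and uses decimal units (1 exaflop = 1,000,000 TFLOPS); the function and constant names are illustrative and not part of any Google API.

```python
# Per-chip figures as quoted in this article.
PEAK_TFLOPS_PER_CHIP = 4614
MEMORY_GB_PER_CHIP = 192
TRILLIUM_MEMORY_FACTOR = 6  # Ironwood memory is stated as 6x Trillium's

TFLOPS_PER_EXAFLOP = 1_000_000  # decimal units
GB_PER_PB = 1_000_000           # decimal units


def aggregate_exaflops(chip_count: int) -> float:
    """Peak compute for a cluster, assuming perfect linear scaling."""
    return chip_count * PEAK_TFLOPS_PER_CHIP / TFLOPS_PER_EXAFLOP


full_pod_exaflops = aggregate_exaflops(9216)
small_config_exaflops = aggregate_exaflops(256)
full_pod_memory_pb = 9216 * MEMORY_GB_PER_CHIP / GB_PER_PB
implied_trillium_memory_gb = MEMORY_GB_PER_CHIP / TRILLIUM_MEMORY_FACTOR

print(f"9,216-chip pod: {full_pod_exaflops:.1f} exaflops")
print(f"256-chip configuration: {small_config_exaflops:.2f} exaflops")
print(f"Total pod memory: {full_pod_memory_pb:.2f} PB")
print(f"Implied Trillium memory per chip: {implied_trillium_memory_gb:.0f} GB")
```

The 9,216-chip total rounds to the 42.5 exaflops Google quotes, which suggests the headline number is a straightforward sum of per-chip peaks and not a measured, sustained figure.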

Google has unveiled Ironwood, the seventh generation of its Tensor Processing Unit (TPU) architecture. The new chip offers more than faster processing: it is purpose-built for the demands of Google’s Gemini models and for the “thinking” capabilities that, according to Google, simulate reasoning.

Google consistently emphasizes that its advanced AI models work in tandem with its purpose-built infrastructure. Ironwood is a key part of that approach, promising to significantly boost inference speeds and expand the amount of context these models can process. Google describes Ironwood as its most scalable and powerful TPU to date, one that will let AI proactively gather data and generate results on a user’s behalf. This is Google’s “agentic AI” concept and its proposed “age of inference.”