Amazon’s Resilient Network Graphs (RNG) Boost AWS Data Center Throughput by 33% and Cut Energy Use by 40%

Amazon’s New AWS Routing Architecture Aims to Slash Data Center Energy Consumption by 40%

Amazon has begun deploying a new routing architecture called Resilient Network Graphs (RNG) in its AWS data centers to improve energy efficiency and network performance. According to Amazon, the design delivers 33% better throughput while using 69% fewer routers, a shift projected to reduce network infrastructure electricity consumption by 40%.

The transition marks a significant departure from the hierarchical “fat tree” topologies that have served as the industry standard for decades. By moving toward a “quasi-random” model, Amazon’s AWS Networking Lab researchers aim to address the mounting power and cooling constraints facing hyperscale cloud providers as they scale to meet the demands of artificial intelligence and massive-scale data processing.

Why AWS Is Moving Away from Traditional Fat Tree Topology

For much of the last two decades, data center design has relied on the “fat tree” topology. First used in supercomputing during the 1990s and widely adopted throughout the 2000s, the fat tree model scales well by using a layered, hierarchical structure. In this setup, switches and routers are organized in tiers, and data packets travel up and down those tiers along the shortest available path to their destination.

However, as data center networks expand to accommodate modern bandwidth requirements, the fat tree model faces inherent scalability issues. As the network grows, it requires an ever-increasing amount of switch and cabling infrastructure to maintain throughput. Amazon researchers note that this often forces designers to make compromises to manage costs, which can lead to higher network congestion.
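To make that scaling cost concrete, the sketch below sizes the textbook three-tier fat tree built from identical k-port switches, the form usually described in the research literature. It is an illustration only: nothing here says Amazon's earlier networks matched this exact layout, and the function name `fat_tree_size` is our own.

```python
def fat_tree_size(k: int) -> dict:
    """Size of a textbook three-tier fat tree built from k-port switches.

    Assumption: the standard "k-ary" layout from the research literature,
    not a description of any specific AWS deployment.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even port count of at least 2")
    core = (k // 2) ** 2          # k^2/4 core switches
    aggregation = k * k // 2      # k pods, k/2 aggregation switches each
    edge = k * k // 2             # k pods, k/2 edge switches each
    return {
        "hosts": k ** 3 // 4,                    # k/2 hosts per edge switch
        "switches": core + aggregation + edge,   # 5k^2/4 in total
        "switch_links": k ** 3 // 2,             # edge-agg plus agg-core cables
    }


if __name__ == "__main__":
    for ports in (16, 32, 64):
        print(ports, fat_tree_size(ports))
```

Under these assumptions, a design built from 64-port switches needs 5,120 switches to serve 65,536 hosts.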

The Resilient Network Graphs (RNG) architecture takes a different approach, based on random graph theory. Unlike the rigid hierarchy of a fat tree, a random graph topology is a flat mesh in which switches are connected to one another in a largely random pattern. In theory, this design is more efficient and more fault tolerant. As Amazon researchers put it, “No single router is more important than any other. The loss of 1% of routers results in a roughly 1% capacity loss.”
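The flat-mesh idea can be explored with a small simulation. The sketch below wires switches into a random regular graph, fails a few of them, and reports what share of the original switches can still reach one another. That share is only a crude stand-in for capacity, and the switch count, port count, and function names are assumptions chosen for illustration, not details of RNG.

```python
import random
from collections import deque


def random_regular_graph(n, degree, rng):
    """Wire n switches so each has `degree` distinct random neighbors.

    Pairs up port "stubs" at random and reshuffles on any conflict.
    Intended for small degrees; larger ones need many retries.
    """
    if (n * degree) % 2 or not 0 < degree < n:
        raise ValueError("need 0 < degree < n and an even number of ports")
    while True:
        stubs = [node for node in range(n) for _ in range(degree)]
        rng.shuffle(stubs)
        adj = {node: set() for node in range(n)}
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or b in adj[a]:
                break  # self-loop or duplicate cable: reshuffle and retry
            adj[a].add(b)
            adj[b].add(a)
        else:
            return adj


def reachable_fraction(adj, failed):
    """Share of the ORIGINAL switches left in the largest connected group."""
    alive = set(adj) - set(failed)
    largest, seen = 0, set()
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over surviving switches
            node = queue.popleft()
            size += 1
            for nbr in adj[node] & alive:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        largest = max(largest, size)
    return largest / len(adj)


if __name__ == "__main__":
    rng = random.Random(2024)
    graph = random_regular_graph(200, 4, rng)  # hypothetical size
    failed = rng.sample(sorted(graph), 2)      # lose 1% of the switches
    print(reachable_fraction(graph, failed))
```

Because every switch plays the same role in this toy graph, which two switches fail should make little difference to the result.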

The benefits of random graph topologies have been discussed in academic circles for years; the Jellyfish project, proposed by researchers in 2012, is one example. Implementing them in a massive commercial data center, however, has historically been difficult. The main obstacles are the extreme complexity of cabling switches across varying distances and the need for each node to hold massive routing tables in memory to track every possible path.

The Technology Behind Resilient Network Graphs: Spraypoint and ShuffleBox

To overcome the practical limitations of pure random graph topologies, Amazon researchers developed a “quasi-random” compromise. This hybrid approach combines the efficiency of random graphs with the manageable structure of traditional hierarchies through two primary innovations: the Spraypoint algorithm and the ShuffleBox device.

The Spraypoint algorithm manages how data moves through the network. Instead of following a strictly linear path through hierarchical layers, traffic is randomly “sprayed” to neighboring switches, which gives packets a wide choice of possible paths to their destination. As the packets approach their final target, the system shifts to a conventional shortest-path algorithm using “waypoint” switches to ensure precision and efficiency.
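The article does not give Spraypoint's internals, so the following is only a toy illustration of the two-phase idea described above: a few random hops, then a shortest path to the destination. The function names, the fixed `spray_hops` parameter, and the ring-with-chords test topology are all assumptions. Waypoint switches are left out entirely, and the second phase simply runs breadth-first search straight to the destination.

```python
import random
from collections import deque


def ring_with_chords(n):
    """Hypothetical test topology: a ring where each switch also links
    to the switch directly opposite it."""
    if n < 4 or n % 2:
        raise ValueError("n must be even and at least 4")
    return {i: {(i - 1) % n, (i + 1) % n, (i + n // 2) % n} for i in range(n)}


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the switches visited from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    raise ValueError("no route between the two switches")


def spray_then_route(adj, src, dst, spray_hops, rng):
    """Phase 1: take up to `spray_hops` random hops to neighboring switches.
    Phase 2: finish along a shortest path (waypoints are NOT modeled)."""
    path = [src]
    for _ in range(spray_hops):
        if path[-1] == dst:
            return path  # stumbled onto the destination while spraying
        path.append(rng.choice(sorted(adj[path[-1]])))
    return path + shortest_path(adj, path[-1], dst)[1:]


if __name__ == "__main__":
    network = ring_with_chords(12)
    rng = random.Random(1)
    for _ in range(3):
        print(spray_then_route(network, 0, 5, spray_hops=2, rng=rng))
```

Repeated calls with the same endpoints generally return different routes, which is the point of spraying: traffic between one pair of switches is spread over many links rather than pinned to a single path.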

The second major innovation is the ShuffleBox. One of the biggest deterrents to non-hierarchical networking has been the “impossibly complex” cabling required to connect switches in a mesh. The ShuffleBox is a specialized data center device designed to concentrate this complex wiring into a single unit. This allows for the random interconnection of switches without the need for long, difficult cable runs across the data center floor.

According to Amazon, these innovations allow for more resilient infrastructure behind every database query, API call, and machine learning training job, all without requiring customers to change a single line of code. The company confirmed that the first quasi-random network carrying real production traffic went live near Dublin, Ireland, at the end of 2024. Following that deployment, Amazon validated performance against mathematical predictions and applied refinements to two additional deployments.

Key Technical Specifications of the RNG Architecture

  • Throughput Improvement: 33% better throughput compared to traditional models.
  • Hardware Efficiency: 69% fewer routers required for the same workload.
  • Energy Impact: A projected 40% reduction in network infrastructure electricity consumption.
  • Deployment Status: Default architecture for most new AWS data centers since April.
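Taken together, the first two figures imply a large gain in throughput per router. The short calculation below shows the arithmetic, assuming both percentages are measured against the same fat-tree baseline and apply simultaneously, which the published figures do not spell out. The function name is ours.

```python
def relative_throughput_per_router(throughput_gain, router_reduction):
    """Throughput per router relative to the baseline design.

    Assumption: both inputs are fractions measured against the same
    baseline (0.33 means 33% more throughput, 0.69 means 69% fewer routers).
    """
    if not 0 <= router_reduction < 1:
        raise ValueError("router_reduction must be in [0, 1)")
    return (1 + throughput_gain) / (1 - router_reduction)


if __name__ == "__main__":
    ratio = relative_throughput_per_router(0.33, 0.69)
    print(f"{ratio:.2f}x throughput per router versus the baseline")
```

If the two percentages do compound this way, each router in the new design carries roughly 4.3 times the throughput of a router in the baseline.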

Will the Industry Adopt Amazon’s Proprietary Design?

While the efficiency claims are significant, the proprietary nature of the RNG architecture suggests that its immediate influence on the broader cloud industry may be limited. Because AWS designs much of its own networking equipment, the cost and complexity of such a redesign are currently borne by Amazon alone.

Amruth Laxman of the cloud VoIP provider 4Voice noted that while the development proves that random graph features can indeed be integrated into functional data center networks, other companies may struggle to follow suit. “The big question at this point is how flexible they have made their design,” Laxman said. He pointed out that most hyperscale customers cannot afford to absorb the massive redesign costs that AWS can. Because re-equipping existing data centers with such radical new technology would be prohibitively expensive, Amazon is currently only implementing RNG in its new builds.

Despite these limitations, industry experts see the move as a necessary response to the growing environmental pressures facing the tech sector. Ryan Ries, chief AI and data scientist at the AWS consultancy and MSP Mission Cloud, noted that the industry is facing increasing pushback regarding data center expansion. These concerns are primarily tied to energy demand, water usage, and the impact on local communities.

“Power and water performance have become two of the most important issues facing cloud providers today,” Ries stated. He added that the efficiency claims made by AWS appear credible specifically because the RNG architecture is already in production and has become the default for most new global builds.

As cloud providers continue to race toward sustainability targets, the success of Amazon’s quasi-random approach may set a new benchmark for how hyperscalers balance the massive power requirements of AI with the necessity of environmental responsibility.

Amazon has not yet provided a scheduled date for the next official update regarding the expansion of the RNG architecture to existing facilities.
