
Cerebras on AWS: What Faster AI Means for QA and Testing

Key Takeaways:
  • The world’s fastest AI inference is now available through Amazon Bedrock, powered by Cerebras Systems’ CS-3 systems already deployed in AWS data centers.
  • A new disaggregated architecture pairs AWS Trainium chips with the Cerebras Wafer-Scale Engine, delivering up to 5x higher token throughput on the same hardware footprint.
  • While typical graphics chips (GPUs) deliver hundreds of tokens per second, Cerebras can generate AI output at up to 3,000 tokens per second.
  • Compared to simple chat, AI coding tools generate roughly 15x more tokens per interaction, and that volume is expected to keep growing.

On March 13, 2026, Cerebras Systems announced in a blog post that it is coming to Amazon Web Services (AWS), confirming something that those in the artificial intelligence (AI) infrastructure space had already been expecting.

But for software testers watching this move closely, it indicates bigger changes ahead.

When Speed Meets Power

Over the past decade, Cerebras Systems has earned a reputation for building the world’s fastest AI inference systems. The strength of its CS-3 system, built around the third-generation Wafer-Scale Engine (WSE-3), lies in its design: model weights are stored entirely in on-chip SRAM. This gives it memory bandwidth thousands of times higher than even the best GPUs available today, enabling inference speeds of up to 3,000 tokens per second. Leading players like OpenAI and Meta are already utilizing this speed.

Now that speed is making its way to AWS, the world’s largest cloud platform. Through Amazon Bedrock, Cerebras-powered models will sit alongside leading open-source models and Amazon’s own Nova models, served by CS-3 systems deployed in AWS data centers.
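For teams who already consume models through Bedrock, access stays uniform: you call the same Converse API regardless of which hardware serves the model. The sketch below assembles a request for boto3’s `bedrock-runtime` `converse()` call. The model ID is a hypothetical placeholder, since the source does not name the Cerebras-served model identifiers, and the network call itself is left commented out.

```python
# Hedged sketch: building a request for Amazon Bedrock's Converse API.
# The model ID below is an assumption for illustration only.

MODEL_ID = "example.cerebras-hosted-model-v1"  # hypothetical identifier

def build_converse_request(user_text, max_tokens=512):
    """Assemble keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": MODEL_ID,
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

request = build_converse_request("Generate a regression test for the login flow")
print(request["modelId"])

# With AWS credentials configured, the actual call would look like:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])
```

Because the request shape is identical across Bedrock-hosted models, switching to a faster backend should be a one-line model ID change rather than an integration rewrite.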

But the announcement itself isn’t the real highlight. What truly stands out, and what is capturing the attention of the tech world, is the new disaggregated inference architecture working behind the scenes.

The Architecture that Often Goes Unnoticed

Every AI inference process has two main stages: the Prefill stage, where the input prompt is processed, and the Decode stage, where output tokens are generated. Most deployments run both stages on the same hardware, even though they stress it differently: prefill is compute-bound, while decode is memory-bandwidth-bound. Serving both from one chip is technically feasible, but it forces compromises in performance.

The Cerebras-AWS partnership, however, deliberately breaks away from this approach. Amazon’s own Trainium chips will now handle only the Prefill stage. The resulting data (KV cache) is then transferred via Amazon’s Elastic Fabric Adapter to Cerebras’s WSE chips, where the Decode phase is completed at high speed.
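The two-stage handoff described above can be sketched in a toy form: prefill processes the whole prompt in one pass and produces a KV cache, then decode generates tokens one at a time, reusing and extending that cache. In the real architecture the stages run on different hardware with the cache shipped between them; here both are plain functions, and the "model" is a stand-in rather than real attention math.

```python
# Toy sketch of disaggregated inference: prefill builds the KV cache,
# decode consumes it token by token. Not a real model.

def prefill(prompt_tokens):
    """Process all prompt tokens in one pass and return the KV cache.
    In the Cerebras-AWS design this stage runs on Trainium."""
    # Stand-in for real per-token keys/values.
    return [("k:" + t, "v:" + t) for t in prompt_tokens]

def decode(kv_cache, max_new_tokens):
    """Generate tokens one at a time over the growing cache.
    In the Cerebras-AWS design this stage runs on the Cerebras WSE."""
    output = []
    for _ in range(max_new_tokens):
        # A real decoder would run attention over kv_cache; this stand-in
        # just derives the next token from the cache length.
        token = f"tok{len(kv_cache)}"
        output.append(token)
        kv_cache.append(("k:" + token, "v:" + token))  # cache grows per token
    return output

prompt = ["write", "a", "unit", "test"]
cache = prefill(prompt)        # one pass over the prompt (compute-heavy)
completion = decode(cache, 3)  # sequential generation (bandwidth-heavy)
print(completion)              # → ['tok4', 'tok5', 'tok6']
```

The point of the split is visible even in the toy: the only thing decode needs from prefill is the cache, so the two stages can live on whichever hardware handles each workload best.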

As AWS Vice President David Brown puts it,

“By splitting inference workloads between Trainium and CS-3, each system can operate at its best. This approach can deliver high speed and significantly better performance than what exists today.”

David Brown, Vice President, Compute & ML Services, AWS

With this shift, it’s possible to increase token capacity by up to five times using the same hardware. This isn’t just a small step forward. It’s a structural change that could reshape the AI landscape.

Why Testers Need to Pay Attention

There are a few critical aspects left unspoken in Cerebras’s blog post that every Quality Assurance (QA) team should take seriously.

Agentic coding is no longer an abstract idea. We are entering a phase where AI writes code, reviews pull requests, fixes bugs, and manages automated pipelines on its own. According to Cerebras, such workflows generate approximately 15x more tokens than regular chat interactions. That points to a massive surge in inference demand. This shift is already becoming visible in the development environments feeding into your test pipelines.

When AI operates faster, code gets generated faster. And as code volume grows, so does the frequency of deployments. This puts unusual pressure on your test coverage, regression strategies, and automation frameworks.

Keeping up with development speed has always been one of the toughest challenges in testing. The Cerebras-AWS partnership only accelerates that race further.

The question QA teams need to ask is no longer, “How fast is our AI?” but rather, “How fast is our testing?”

If AI is generating code at 3,000 tokens per second, can your test suites validate it at the same pace? Can your automation layer adapt to changes that didn’t exist yesterday? If an AI-generated update alters the UI or API, do your test cases have the ability to self-heal?

Manual testing cannot keep up with this. Neither can fragile automation systems that rely solely on static scripts.
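The "self-healing" idea mentioned above can be made concrete with a minimal sketch: instead of one hard-coded locator, a test keeps an ordered list of fallback attributes and matches against whichever still exists in the current UI. The DOM here is a plain list of dicts standing in for a real driver API, and the element names are illustrative assumptions.

```python
# Minimal sketch of a self-healing locator strategy: try locators in
# preference order and report which one actually matched.

def find_element(dom, locators):
    """Return (element, locator_used) for the first locator that matches.
    `dom` is a list of element dicts; `locators` is an ordered list of
    (attribute, value) pairs, most-preferred first."""
    for attr, value in locators:
        for element in dom:
            if element.get(attr) == value:
                return element, (attr, value)
    return None, None

# An AI-generated update renamed the button's id but kept its test hook.
dom = [{"data-testid": "submit-btn", "id": "checkout-submit-v2", "text": "Submit"}]
locators = [
    ("id", "submit"),               # old locator: now stale
    ("data-testid", "submit-btn"),  # fallback: still valid
    ("text", "Submit"),             # last resort: the visible label
]
element, used = find_element(dom, locators)
print(used)  # → ('data-testid', 'submit-btn')
```

Production self-healing tools go much further (similarity scoring, learning from past runs), but the core move is the same: the test survives a change that would break a single static selector.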

A Few Precautions

As speed increases, so do the risks. This is not just a theoretical concern.

When inference becomes this fast and accessible, the volume of AI-generated code entering production pipelines will rise significantly, and every piece of that code needs to be properly validated. Without equally fast and intelligent testing to match rapid code generation, the result is technical debt and failures that are hard to detect through conventional testing methods.

There are also valid concerns about the reliability of AI-generated outputs. Both Cerebras and AWS point out that this new inference approach is best suited for large-scale, stable workloads. In other words, different use cases will demand different configurations. Just as infrastructure needs to adapt, testing strategies must also evolve to handle this diversity.

Only organizations with testing approaches that can accommodate rapid experimentation and unexpected changes will truly benefit from this high-speed AI inference. The rest may simply end up releasing code faster and failing even faster.

How Will This Play Out in Practice?

High-speed AI inference can either become a major advantage for your business or a serious risk. It all depends on your testing infrastructure. Consider the following two scenarios:

Scenario | Without Adaptive Testing | With Adaptive Testing
AI generates 100 new test cases overnight | Delays due to manual review | Automated validation completes instantly
AI agent introduces changes in the UI | Scripts break, tests fail | Changes are absorbed through self-healing
AI adds a new API endpoint | Gaps in test coverage go unnoticed | Dynamic test generation covers new paths
Pre-deployment regression testing | Takes hours to complete | Prioritized and accelerated using AI risk signals

The difference between these two columns is the essence of the transformation brought by the Cerebras-AWS partnership. Depending on your testing approach, this could either become a powerful advantage or a persistent headache.
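The "prioritized using AI risk signals" row is worth unpacking. One common approach is to order regression tests by a risk score that combines recent failure rate with overlap between what a test covers and what the incoming change touches. The sketch below is a hedged illustration: the weights, test records, and file names are all assumptions, not a real framework’s API.

```python
# Illustrative risk-based test prioritization: highest-risk tests run first.

def risk_score(test, changed_files, w_fail=0.6, w_overlap=0.4):
    """Weighted sum of recent flakiness and change overlap (weights assumed)."""
    overlap = len(set(test["covers"]) & set(changed_files)) / max(len(test["covers"]), 1)
    return w_fail * test["recent_failure_rate"] + w_overlap * overlap

def prioritize(tests, changed_files):
    """Sort tests so the riskiest run earliest in the pipeline."""
    return sorted(tests, key=lambda t: risk_score(t, changed_files), reverse=True)

tests = [
    {"name": "test_checkout", "covers": ["cart.py", "pay.py"], "recent_failure_rate": 0.1},
    {"name": "test_login",    "covers": ["auth.py"],           "recent_failure_rate": 0.0},
    {"name": "test_search",   "covers": ["search.py"],         "recent_failure_rate": 0.5},
]
order = [t["name"] for t in prioritize(tests, changed_files=["auth.py", "pay.py"])]
print(order)  # → ['test_login', 'test_search', 'test_checkout']
```

Even this crude score turns an hours-long "run everything" regression pass into a front-loaded one, so a failing deployment is caught by the first tests executed rather than the last.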

Are you currently using any specific tools or automation frameworks in your team to handle such changes?

AI Moves Faster Than Ever: Testing Must Too

What message does the Cerebras-AWS collaboration send? It signals the extraordinary speed that AI-driven software development is about to achieve, and the new infrastructure evolving to support it.

For businesses, AI inference capacity in the cloud is rapidly becoming a defining factor of performance. For development teams, the tools they use will soon operate at speeds that once seemed unimaginable. But for QA professionals, this sends a clear message. The upstream processes leading into your testing phase have already undergone a massive upgrade.

The key question now is this: are your testing practices ready to keep up with this acceleration?

Speed without proper validation only shortens the path to failure. Only those who recognize this and invest in testing infrastructure capable of matching AI-driven development will be able to turn this shift into a real advantage.

If you are considering evolving your testing strategy to align with this new wave of AI, testRigor is a strong place to start exploring.
