A day at Knebworth Park. Two roundtables, dozens of frank conversations, and one big question hanging over the room: what happens to quality when AI is shipping code faster than your teams can review it?
The setting was a field at Knebworth Park, which during the heatwave meant the talk was frank and the sun was relentless.
Darryl, Mike and I hosted two sessions on what we have been calling the Tsunami of Code, then spent the rest of the day in the open marketplace talking to retailers at every stage of their journey. One thing became clear quickly: the room was split into two halves.
In one half were the retailers dabbling in AI, often at an early stage: cautious experiments, a few agents writing production code, and a lot of careful questions about how to do it safely.
In the other were retailers still heavily focused on foundational work. Shopify migrations that need to run at speed, daunting PIM overhauls, integration layers that nobody fully owns. The unglamorous change that decides whether a business can move at the pace it needs.
This is the same industry moving at different speeds and building on different foundations. What connected both halves was a single recurring theme: trust. Who owns the work, who is accountable for it, and how do you stay confident in quality when the ground keeps shifting?
The retailers experimenting with AI were not asking whether it works. They were asking whether they could trust it, and who carries the risk when they cannot.
The questions were strikingly consistent:
How do you hold an outsourcer accountable when AI raises the bar on what good looks like?
Who reviews the code the agent wrote?
What does go-live even mean when part of your tech was built by a model?
These are governance questions before they are technical ones, and most organisations do not yet have clean answers.
A few things stuck with us. When code is generated faster than anyone can read it, the temptation is to trust the output because it runs, and that is exactly the trap. Our piece on the seductive peril of the black box covers why "it works" was never sufficient justification, and why that matters far more now that a model is doing the writing.
Ownership is the other half of trust. Speed at the point of generation means very little if accountability gets fuzzy downstream. AI concentrates responsibility, often onto the senior engineers who are already the bottleneck. The same dynamic shows up when system integrators end up marking their own homework: the faster code arrives, the more it matters that someone independent is checking it.
For all the AI talk, plenty of familiar work is still very much live, and for a significant number of retailers it is the main event. They are not AI-first yet. They are still becoming platform-ready.
The pressure to move fast on Shopify came up repeatedly, and so did the gap between how simple it looks and how hard it really is. We have written about exactly this in the Shopify project that's harder than it looks. The front end is the easy part, but the back end and the integrations are where speed quietly turns into risk.
And whatever else changes, end-to-end QA, performance and UAT remain the big hurdles. The parts that break first in production are the same parts that get assumed to work rather than verified: integrations, third-party dependencies, data flows.
UAT in particular stays painful because it is treated as a gate rather than a process, which is why we have been digging into why UAT hurts and what good actually looks like.
The thread running through both halves is the same. Whether your risk comes from a model writing code or a migration nobody has fully scoped, the question is where your risk actually sits and how much rigour each part deserves.
That is exactly why we introduced our Graded Test Approach, and it landed with the right people. It sparked the debate we hoped for: not "should we test more?" but "how do we test proportionately when the volume and origin of code have changed?" When a model is part of the build, testing before coding stops being a nice-to-have and becomes the thing that keeps you honest.
Essentially, AI has made quality harder to locate. When code can be generated faster than teams can review, test and own it, the discipline that protects you is not more speed. It is clear accountability, proportionate testing, and a genuine understanding of where your risk lives.
This is an industry embracing AI at different speeds, with different foundations. The job is to build the right guardrails for the stage you are actually at.
We have pulled the full set of takeaways from the day, including what we heard on outsourcing, data governance and the senior engineer bottleneck, into a more detailed write-up.