What retail tech leaders are talking about: Retail Jam 2026

Written by Steve Dennis | Jul 2, 2026 10:45:32 AM

A day at Knebworth Park. Two roundtables, dozens of frank conversations, and one big question hanging over the room: what happens to quality when AI is shipping code faster than your teams can review it?

Retail Jam is built differently to most events. No vendor hall to wander, no death-by-slide. Just curated conversations between retail leaders who turn up to talk honestly about what is working and what is starting to break.

It also happens to be set in a field at Knebworth Park, which during the heatwave meant the talk was frank and the sun was relentless.

Darryl, Mike and I hosted two sessions on what we have been calling the Tsunami of Code, then spent the rest of the day in the open marketplace talking to retailers at every stage of their journey. One thing became clear quickly: the room was split into two halves.

Two halves of the same room

In one half were the retailers dabbling in AI, often at an early stage: cautious experiments, a few agents writing production code, and a lot of careful questions about how to do it safely.

In the other were retailers still heavily focused on foundational work. Shopify migrations that need to run at speed, daunting PIM overhauls, integration layers that nobody fully owns. The unglamorous change that decides whether a business can move at the pace it needs.

This is the same industry moving at different speeds and building on different foundations. What connected both halves was a single recurring theme: trust. Who owns the work, who is accountable for it, and how do you stay confident in quality when the ground keeps shifting?

The AI half: confidence is the real question

The retailers experimenting with AI were not asking whether it works. They were asking whether they could trust it, and who carries the risk when they cannot.

The questions were strikingly consistent:

How do you hold an outsourcer accountable when AI raises the bar on what good looks like?

Who reviews the code the agent wrote?

What does go-live even mean when part of your tech was built by a model?

These are governance questions before they are technical ones, and most organisations do not yet have clean answers.

A few things stuck with us. When code is generated faster than anyone can read it, the temptation is to trust the output because it runs, and that is exactly the trap. Our piece on the seductive peril of the black box covers why "it works" was never sufficient justification, and why that matters far more now that a model is doing the writing.

Ownership is the other half of trust. Speed at the point of generation means very little if accountability gets fuzzy downstream. AI concentrates responsibility, often onto the senior engineers who are already the bottleneck. The same dynamic shows up when system integrators end up marking their own homework: the faster code arrives, the more it matters that someone independent is checking it.

The traditional half: foundations still decide everything

For all the AI talk, plenty of familiar work is still very much live, and for a significant number of retailers it is the main event. They are not AI-first yet. They are still becoming platform-ready.

The pressure to move fast on Shopify came up repeatedly, and so did the gap between how simple it looks and how hard it really is. We have written about exactly this in the Shopify project that's harder than it looks. The front end is the easy part; the back end and the integrations are where speed quietly turns into risk.

And whatever else changes, end-to-end QA, performance and UAT remain the big hurdles. The parts that break first in production are the same parts that get assumed to work rather than verified: integrations, third-party dependencies, data flows.

UAT in particular stays painful because it is treated as a gate rather than a process, which is why we have been digging into why UAT hurts and what good actually looks like.

Where the two halves meet: the Graded Test Approach

The thread running through both halves is the same. Whether your risk comes from a model writing code or a migration nobody has fully scoped, the question is where your risk actually sits and how much rigour each part deserves.

That is exactly what we introduced our Graded Test Approach for, and it landed with the right people. It sparked the debate we hoped for: not "should we test more?" but "how do we test proportionately when the volume and origin of code have changed?" When a model is part of the build, testing before coding stops being a nice-to-have and becomes the thing that keeps you honest.

What this means in practice

Essentially, AI has made quality harder to locate. When code can be generated faster than teams can review, test and own it, the discipline that protects you is not more speed. It is clear accountability, proportionate testing, and a genuine understanding of where your risk lives.

This is an industry embracing AI at different speeds, with different foundations. The job is to build the right guardrails for the stage you are actually at.

We have pulled the full set of takeaways from the day, including what we heard on outsourcing, data governance and the senior engineer bottleneck, into a more detailed write-up.
