Stop testing everything the same way. Your systems are unpredictable but your testing philosophy doesn't need to be.

Written by Darryl Kennedy | Jun 23, 2026 1:00:01 PM

Composable commerce, third-party APIs, real-time pipelines, AI-driven logic... the number of possible states is effectively infinite, so why are retail tech teams still applying the same testing philosophy across everything?

Retail engineering teams have done far more than just accumulate complexity. Their systems have now become unpredictable, and traditional QA can't keep up.

The result is a growing blind spot. Systems that look stable in staging fail unexpectedly in production. Edge cases aren't rare, they're the system. And the cost isn't just defects; it's slower shipping, constant firefighting and a quiet erosion of trust in every release.

Teams need a shared model for how confidence should scale with risk, and we believe our Graded Test Approach is the answer.

"The big question becomes: How do we apply the right level of scrutiny, visibility and resilience based on where this system sits on the Graded Test Approach?"

Darryl Kennedy

The Graded Test Approach at a glance

Stage	Stage name	Signal
G0	Personal Utility	"If it breaks, I shrug"
G1	Shared Team Tool	"If it breaks, someone pings me"
G2	Internal Dependency	"If it breaks, work stops"
G3	Customer-Facing	"If it breaks, customers notice"
G4	Mission Critical	"If it breaks, we lose money or trust"

Not all failures carry the same consequence

A G1 internal tool breaking is an inconvenience but a G4 checkout flow breaking is a revenue event. Treating them with the same testing philosophy is how teams either over-engineer too early or under-protect what actually matters.

Our Graded Test Approach makes something explicit that most teams handle implicitly: different systems deserve different levels of testing, and different failures carry different consequences.

As a service moves from G0 to G4, the expectation shifts to a fundamentally different release posture, observability requirement and evidence of control.

At G3, you need a go/no-go decision with the right people reviewing test evidence. At G4, it's something closer to a formal change board. If you've done a major marketing push and gone live with an enterprise-wide release, pulling it back is a business decision as much as a technical one.
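One way a team might make that escalation explicit is to encode the grades and their release gates as data, so a pipeline or checklist can look them up. The sketch below is illustrative only: the grade names come from the table above and the G3 and G4 gates paraphrase the paragraph above, but the G0 to G2 gate labels, the `release_gate` and `needs_evidence_review` helpers, and the G3 threshold for evidence review are assumptions, not part of the Graded Test Approach as published.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Grades as listed in the at-a-glance table."""
    G0 = 0  # Personal Utility
    G1 = 1  # Shared Team Tool
    G2 = 2  # Internal Dependency
    G3 = 3  # Customer-Facing
    G4 = 4  # Mission Critical


# Hypothetical gate labels. Only the G3 and G4 entries paraphrase the article;
# the G0 to G2 entries are placeholder assumptions for illustration.
RELEASE_GATE = {
    Grade.G0: "none",
    Grade.G1: "peer check",
    Grade.G2: "team sign-off",
    Grade.G3: "go/no-go review of test evidence",
    Grade.G4: "formal change board",
}


def release_gate(grade: Grade) -> str:
    """Return the release gate label configured for a grade."""
    return RELEASE_GATE[grade]


def needs_evidence_review(grade: Grade) -> bool:
    """Assumed rule: people review test evidence from G3 upwards."""
    return grade >= Grade.G3


if __name__ == "__main__":
    for grade in Grade:
        print(grade.name, "->", release_gate(grade))
```

Keeping the mapping in one table, rather than scattering conditionals through release scripts, means the gate for a service changes in a single place when it moves up a grade.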

Precision over completeness

In this new world, exhaustive testing is a false goal. Instead, you've got to question what failure means at each stage, and what the team is willing to tolerate. That's a risk judgement, and the Graded Test Approach gives it a structure.

For a CTO, the shift is subtle but important. Testing becomes a risk-weighted system of evidence with proportionate scrutiny, proportionate visibility, proportionate resilience. You need a common language for making decisions that scale with stakes, without creating bureaucracy for low-risk work or cutting corners on what genuinely matters.

Control comes from being explicit about confidence at each grade, before anything ships.

"Precision about what failure means at each stage is how teams regain control. Confidence scales with risk or it means nothing."

Steve Dennis

Where this fits in delivery

The Graded Test Approach sits alongside our FLOW methodology. FLOW governs how work moves through a team while the Graded Test Approach governs how much confidence is required before it does.

Together, they give retail technology teams a proportionate, evidence-based delivery system, one that scales with the complexity of modern composable environments without slowing down the work that doesn't need it.
