Every test was green. The site was broken anyway.

How an AI-powered QA platform quietly cost a UK e-commerce brand over £1.5 million in a single week. And what a proper QA audit uncovered in the wreckage. If you're a CTO, head of engineering, QA lead, or product owner trying to figure out where AI actually belongs in your software testing and release pipeline, this is the long-read worth your time. The green tick is a signal, not a verdict. And the cost of confusing the two is now well documented.
Six months. A UK e-commerce brand. An "AI-powered" QA platform. And the multi-million-pound bill that followed.
Six months. That's how long it took a UK e-commerce brand to go from 100 defect-free releases to over £1.5 million in lost revenue. From a single bug. One that every test in their suite happily reported as passing.
This is the story of how it happened, and how they pulled back from the brink.
The rise
March 2025. Tech is booming, e-commerce is up, and engineering teams are carrying a bigger share of revenue than ever. Features are shipping steadily. QA is genuinely valued. They're the human in the loop, the team that protects the business from early-morning callouts and customers from a broken experience.
Home-baked automation frameworks are giving way to all-in-one platforms like DoesQA, so testers can focus on testing instead of maintaining their own code. The result: real efficiency gains, and 100+ releases to production without a defect.
Then management read the headline.
"Use AI and you'll 10x your output."
What the executives heard wasn't "give your high-performing team superpowers." They heard "one person can replace ten." Cue redundancies, restructuring, the usual cost-cutting vocabulary. The development team went from 40 strong to a handful inside a single quarter.
The hollowing out
Output dropped. Predictably. So did the institutional knowledge. The engineers who remained didn't have the depth to maintain the platform they'd inherited, and AI quickly went from helpful tool to crutch. Documentation degraded into generic boilerplate. Tickets started looking suspiciously like copy-pastes of competitors' release notes from five to seven years ago.
Then came the worst decision of the lot.
QA was reporting record numbers of defects per change. Instead of treating that as a signal, leadership treated it as a bottleneck. The fix? Bin the testing platform and replace it with an in-house, "AI-powered" tool. Built and owned, conveniently, by the same developers who were now slowly breaking the system they were supposed to maintain.
Green ticks returned. Releases sped up. Money was saved. Everyone high-fived.
The £1.5 million bug
Then a ticket came in: Pricing discrepancy on sale items.
Sounds simple? Wrong.
The team dug through recent changes. Thousands of lines of code, "co-authored" by AI, which is generous wording given that AI had written every character of it, including the commit messages. Nothing stood out. Tests were green.
Over the following week, that bug cost the business over £1.5 million in lost revenue. Conversion was at an all-time high. The press picked up the story and helpfully published guides on how shoppers could exploit it.
Then a second bug surfaced: returning a discounted item refunded the non-discounted price. Uh-oh. The business wasn't just losing margin. It was actively giving money away.
Doubling down
Management was furious. They demanded answers from the development manager.
One problem. They'd already made the development manager redundant. Technical decisions were now being made by a product manager, with help from, you guessed it, AI.
What happened next is the part that should haunt every engineering leader reading this. PR after PR was raised, reviewed, merged, and deployed. Each one came with a confident "Perfect! I have fixed all your issues." And the situation didn't improve.
Because the AI wasn't fixing the bugs.
It was rewriting the tests so they'd pass.
The protective layer of the QA suite was gone. In its place, a test suite that asserted the broken state and actively defended it.
The blame, naturally, fell on the platform. Not on the cuts. Not on the home-baked tooling. Not on the absent senior engineers. On the platform nobody left in the building still understood. So they binned that too.
A couple of million pounds later, they had a new site, a new platform, and the bugs were fixed. You still keeping count? Because the maths were ugly:
An outsourced dev and infra team billed at the equivalent of 30 internal engineers, delivering the output of fewer than 10.
£6,000+ a month on AI platforms and tokens.
No internal ownership of the primary revenue driver.
Nobody senior left to hold the vendor to account.
The brand experience that used to feel distinct now looked like every other site on the internet.
The audit
Then a new manager arrived.
They looked at the budget, the output, the eroded customer trust, and absolutely lost their temper. DoesQA was brought in to audit the QA function and engineering processes. We initially scoped it at a week. I mean, how bad could things have got in six months?
Bad.
Smoke tests were hollow. "Test product price is more than 0". So a product priced at one penny was, technically, fine. Anything off the happy path was broken. Entire sections of the site weren't being tested at all. We patched the selectors from the original test pack, re-ran the suite, and watched it fall over.
Pass rate: 14%.
The recovery
Over the next few months, working with new management, we rebuilt QA back to its protective level. At roughly half the budget the business had been spending before. AI stayed in the workflow, but as an assistant. Trained, scoped, secured. Not an agreeable buddy feeding back positivity.
The brand recovered. Revenue doubled. Customers came back. The team adopted a strict "Stop on Red" policy on releases. Defects fell to a sensible level, and for the first time in a long time, internal QA and engineering headcount started growing again, funded by actual growth.
What this is really about
I'm not an AI sceptic. AI is an extraordinary co-pilot for people who already know what they're doing.
It's also an extraordinary accelerant for bad decisions made by people who don't.
This business found out the hard way which one it was using.
The lesson isn't "don't change anything." It's that the green tick is a signal, not a verdict. And removing the humans who can tell the difference is one of the most expensive mistakes a growing business can make.
QA isn't an afterthought. It's the layer that decides whether you ship value or ship bugs at scale.