04. Stop Fixing Bugs by Guessing

2026-06-14 · 中文

04. Stop Fixing Bugs by Guessing

Superpowers Skills core skill · systematic-debugging

In one line: Don't fix bugs with "I think it's X, let me try." Find the root cause first — systematic is far faster than guessing (15-30 min vs 2-3 hours).

Scenario: the user paid, but the order didn't update

An intermittent production issue: payment succeeds, but the order is still stuck on "awaiting payment," and support gets complaints daily. It doesn't reproduce every time, and staring at the code gets you nowhere. For an "intermittent + multi-hop (pay -> callback -> order service -> DB)" bug, the worst move is to start guessing and patching.

How to ask the AI

Have it find the root cause, don't let it flail:

Use systematic-debugging on this intermittent bug: payment succeeds but the order status doesn't update.
Don't change code yet — read logs, find a way to reproduce reliably, log at each boundary (pay/callback/order service),
pin which layer breaks, then propose a fix.

❌ Weak	✅ Strong
Probably the callback's lost, add a retry	Pin which layer breaks first, then decide what to fix
Try changing a few spots and see	One change at a time; state your hypothesis
(3rd fix, still intermittent) try again	Still intermittent after 3 fixes — is the architecture wrong (e.g. no idempotency)?

Feed it "live evidence": error logs and the full trace of one failing order beat "it sometimes doesn't update" a hundred times over.

How it'll walk you through it

It reads logs and finds a way to reproduce (for intermittent bugs it hunts the trigger: concurrent orders? callback timeout?). It logs at each boundary (payment callback -> order service -> DB), runs once to see where the data breaks — e.g. "the callback arrived, but the order update got overwritten by another request" (a concurrent-write problem). With the root cause found, it first writes a failing test that reproduces it, then changes one thing (e.g. add an optimistic lock), and runs tests to confirm it's fixed without breaking anything.

Remember one line

systematic-debugging = make the AI find the root cause before acting. For intermittent/multi-hop bugs especially, don't let it guess-and-patch: have it log to pin the broken layer, and question the architecture after 3 failed fixes.