Race Conditions: Two Requests, One Instant, and a Broken Assumption

When the outcome of a system depends on split second timing, an attacker who controls the timing can bend the rules. A plain guide to race conditions and TOCTOU flaws.

HolyGhost · 8 min read

Imagine a gift card worth fifty dollars, and a shop till that checks the balance before letting you spend. Normally this is airtight: spend forty, ten remains, the maths always adds up. Now imagine you could walk up to ten tills at the exact same instant and hand each one the same card. Every till checks the balance, sees fifty dollars, and approves a forty dollar purchase, all before any of them has finished subtracting. You walk out with four hundred dollars of goods on a fifty dollar card. Nobody broke the rule "you must have enough balance". They just all checked at the same moment, before the answer could change.

That is a race condition, and it is one of the more mind bending classes of vulnerability because nothing in the code is obviously wrong. The flaw lives in timing. This is a plain guide to how race conditions happen, the classic forms they take, and how to design them away.

Scope

This is a defensive explainer of a well understood class of flaw, for people building and protecting systems. Only test against systems you own or are authorised to assess.

The core idea: correctness that depends on timing

A race condition is a flaw where the correct behaviour of a system depends on the order or timing of events, and an attacker who can influence that timing can force an outcome that should never happen. The name paints the picture: two operations are racing, and the result depends on who gets there first.

At the heart of almost every race condition is a pattern called check then act. The program checks that something is true, and then, as a separate step, acts on it:

1. CHECK:  is there enough balance?      (yes, 50 dollars)
2. ACT:    approve the purchase and subtract.

Between step one and step two there is a tiny gap. It might be microseconds. The hidden assumption is that nothing changes in that gap. A race condition is what happens when an attacker crams another operation into it, so that the thing you checked is no longer true by the time you act, or so that many actions all pass the same check before any of them updates the state.
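
A minimal sketch of that gap in Python, using the fifty dollar card from the opening. The `purchase` generator and the `account` dictionary are illustrative names, not taken from any real system, and the `yield` is only there to mark the gap so the interleaving can be driven by hand.

```python
def purchase(account, amount):
    """Check then act, pausing at the gap between the two steps."""
    # CHECK: is there enough balance?
    approved = account["balance"] >= amount
    yield approved  # the gap: anything else can run here
    # ACT: subtract, trusting the earlier check
    if approved:
        account["balance"] -= amount


account = {"balance": 50}
first = purchase(account, 40)
second = purchase(account, 40)

# Both requests run their CHECK before either runs its ACT.
first_ok = next(first)
second_ok = next(second)

# Now both ACT on a check that is no longer true.
for request in (first, second):
    next(request, None)

print(first_ok, second_ok, account["balance"])  # True True -30
```

Each request is correct when run alone. The negative balance only appears because the second check ran inside the first request's gap.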

Spot the pattern: check then act

Whenever you see code that checks a condition and then, in a separate step, does something based on that check, ask the killer question: what if the state changes between the two? If the answer is "something bad", and if an attacker can trigger the change, you may have a race condition. The gap between check and act is the entire attack surface.

The file system version: TOCTOU

The classic academic form of this bug has a name worth knowing: TOCTOU, which stands for time of check to time of use. It shows up when a program checks something about a file and then uses that file, with the check and the use happening as two separate steps.

Picture a program that runs with high privileges and wants to be careful. It checks "is the user allowed to access this file?", and once satisfied, it opens the file. In between, the attacker swaps what the filename points to, often using a symbolic link, or symlink, which is a special file that acts as a pointer to another file. So the program checks a harmless file it is allowed to touch, but by the time it opens "the same" name, that name points at a sensitive system file. The check passed on one file, the use happened on another.

1. Program checks: is /tmp/userfile safe to open?   (yes)
2. Attacker swaps /tmp/userfile to point at a secret file.
3. Program opens /tmp/userfile ... and reads the secret.
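
The three steps above can be sketched in Python on a POSIX system. The function names `read_racy` and `read_checked_handle` are hypothetical, and the second one is a partial mitigation under stated assumptions rather than a complete defence: `os.O_NOFOLLOW` only refuses a symlink in the final path component, and it is not available on every platform.

```python
import os
import stat
import tempfile


def read_racy(path):
    """Check by name, then use by name: the name can be swapped in between."""
    if not os.access(path, os.R_OK):  # CHECK on the name
        raise PermissionError(path)
    with open(path, "rb") as handle:  # USE: resolves the name a second time
        return handle.read()


def read_checked_handle(path):
    """Open once, then check the object that was actually opened."""
    # O_NOFOLLOW makes the open fail if the last path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as handle:
        info = os.fstat(handle.fileno())  # inspects the open file, not the name
        if not stat.S_ISREG(info.st_mode):
            raise PermissionError("not a regular file")
        return handle.read()


with tempfile.TemporaryDirectory() as folder:
    secret = os.path.join(folder, "secret")
    userfile = os.path.join(folder, "userfile")
    with open(secret, "wb") as handle:
        handle.write(b"top secret")
    os.symlink(secret, userfile)  # the attacker's swap, already in place

    leaked = read_racy(userfile)  # follows the link and reads the secret
    try:
        read_checked_handle(userfile)
        refused = False
    except OSError:
        refused = True
```

The design point is that the second function never asks a question about a name and then trusts the answer later. It opens the file once and makes every decision about the handle it already holds, which cannot be swapped out from under it.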

This exact shape, verify one thing and then use a different thing, is not limited to files. It is the same time of check to time of use gap that underlies the WinRE BitLocker bypasses, where the boot process validated one recovery image and then loaded another. Once you know the pattern, you see it across wildly different systems.

The web version: doing something many times at once

On the web, race conditions usually take the gift card shape from the opening. Modern servers handle many requests at the same time, in parallel, which is normally a good thing. But it means that if an attacker fires the same sensitive request many times simultaneously, several copies can all sail through a check before any of them finishes updating the state.

Common real world targets:

  • Redeeming a discount code or voucher that is meant to be used once, but gets accepted many times because all the requests check "is it unused?" together before any marks it used.
  • Withdrawing funds or transferring money, so that several withdrawals each pass the balance check and the account goes negative. This is often called a limit overrun or a double spend.
  • Claiming a limited resource, such as the last item in stock or a one per customer offer, more times than allowed.
  • Account and registration quirks, where doing two things at once leaves the system in a state it never expected.

Attacker sends 20 identical "withdraw 50" requests at once.
   request 1 checks balance (100) -> ok
   request 2 checks balance (100) -> ok        (still 100, none finished)
   ... all 20 check 100 and approve ...
   result: 1000 withdrawn from a 100 balance.
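
The trace above can be reproduced in a few lines of Python. This is a simulation under stated assumptions, not a real server: threads stand in for parallel requests, and the `threading.Barrier` is test scaffolding that holds every request inside the gap so the outcome is the same on every run. All names are illustrative.

```python
import threading

REQUESTS = 20
AMOUNT = 50

balance = 100
withdrawn = 0
ledger_lock = threading.Lock()
all_checked = threading.Barrier(REQUESTS)


def withdraw():
    global balance, withdrawn
    approved = balance >= AMOUNT  # CHECK, outside any lock
    all_checked.wait()  # scaffolding: wait until every request has checked
    if approved:
        with ledger_lock:  # ACT: the write is protected, the check was not
            balance -= AMOUNT
            withdrawn += AMOUNT


threads = [threading.Thread(target=withdraw) for _ in range(REQUESTS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(withdrawn, balance)  # 1000 -900
```

Note that the subtraction itself is guarded by a lock and the result is still wrong. Protecting only the write does nothing if the decision was made earlier on stale data.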

The attacker's only tool is timing. They are not forging anything or injecting code. They are exploiting the assumption that requests happen one after another, when in reality they can happen all at once.

It often hides behind correct looking code

The individual line "check the balance, then subtract" reads as perfectly sensible. That is what makes race conditions so easy to ship and so easy to miss in review. The bug is not visible in any single path through the code. It only appears when two or more copies run at the same time, which ordinary testing, one request at a time, will never reveal.

Designing them away

The fix is not to check more carefully. It is to remove the gap between check and act, so there is no instant for an attacker to slip into. The operation must become atomic, meaning it happens as one indivisible step that cannot be interrupted or interleaved.

  1. Make the operation atomic. Combine the check and the change into a single step that the system guarantees will not be split. "Subtract 50 only if the balance is at least 50" performed as one atomic operation cannot be raced, because there is no moment between checking and subtracting.
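  2. Use database transactions and locking. Databases are built for exactly this. Wrap the read and the update in a transaction, and lock the relevant row while you work on it (for example with SELECT ... FOR UPDATE), so that competing requests wait their turn instead of all charging in together. A transaction on its own is often not enough: at common default isolation levels such as read committed, two transactions can still both read the old balance, so the row lock, or a stricter isolation level, is what closes the gap.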
  3. Let the database enforce uniqueness. A unique constraint (a rule that a value can only appear once) means the database itself rejects the second attempt to, say, redeem the same voucher, no matter how perfectly the timing is lined up.
  4. Use idempotency keys. Attaching a unique key to a request so that repeats are recognised and only counted once turns a flood of identical requests into a single effect.
  5. Lock around critical sections. Where the work is not in a database, a lock or mutex (a mechanism that lets only one operation into a section of code at a time) serialises access so two copies cannot overlap.
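
Points 1 and 3 from the list above can be sketched with Python's built-in `sqlite3` module. The table names, column names, and the voucher code are hypothetical. SQLite has no SELECT ... FOR UPDATE, so point 2 is not shown, and the calls run one after another because the aim is to show the shape of the statements rather than to load test them.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
db.execute("CREATE TABLE redemptions (voucher TEXT NOT NULL UNIQUE, account_id INTEGER NOT NULL)")
db.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
db.commit()


def withdraw(account_id, amount):
    """Check and subtract in one statement. Returns True if it applied."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    return cursor.rowcount == 1  # zero rows updated means the check failed


def redeem(voucher, account_id):
    """Let the UNIQUE constraint decide who redeemed first."""
    try:
        with db:
            db.execute(
                "INSERT INTO redemptions (voucher, account_id) VALUES (?, ?)",
                (voucher, account_id),
            )
    except sqlite3.IntegrityError:
        return False  # someone already redeemed this voucher
    return True


results = [withdraw(1, 50) for _ in range(20)]
final_balance = db.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
first_redeem = redeem("WELCOME10", 1)
second_redeem = redeem("WELCOME10", 1)

print(sum(results), final_balance, first_redeem, second_redeem)  # 2 0 True False
```

In both functions the application never reads a value and decides for itself. It states the condition to the database and then looks at whether the write went through.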

The unifying idea is simple: never let the world change between the moment you check and the moment you act.
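
For work that lives outside a database, a lock and an idempotency key can be combined as in the sketch below. The `Wallet` class and the key format are illustrative assumptions, and an in-memory dictionary only works within a single process. A real service would need shared, durable storage for the keys.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Wallet:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()
        self._seen = {}  # idempotency key -> result of the first attempt

    def withdraw(self, amount, idempotency_key):
        # Check and act happen inside one critical section, so they cannot interleave.
        with self._lock:
            if idempotency_key in self._seen:
                return self._seen[idempotency_key]  # a repeat: no second effect
            approved = self.balance >= amount
            if approved:
                self.balance -= amount
            self._seen[idempotency_key] = approved
            return approved


wallet = Wallet(100)
with ThreadPoolExecutor(max_workers=20) as pool:
    # 20 copies of one logical request: same key, one effect.
    repeats = list(pool.map(lambda key: wallet.withdraw(50, key), ["order-1"] * 20))
    balance_after_repeats = wallet.balance
    # 20 distinct requests: the lock lets exactly one more through.
    keys = [f"order-{n}" for n in range(2, 22)]
    distinct = list(pool.map(lambda key: wallet.withdraw(50, key), keys))

print(balance_after_repeats, sum(distinct), wallet.balance)  # 50 1 0
```

The lock stops distinct requests from overlapping, and the key stops one request from counting twice. They solve different halves of the problem, which is why the list treats them as separate items.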

The takeaway

A race condition is a flaw where correctness depends on timing, and an attacker who controls the timing can force an impossible outcome. Its heart is the check then act pattern, with a gap between the two that should not exist. On the file system it appears as TOCTOU, where a checked file is swapped before it is used. On the web it appears as firing a sensitive request many times at once so several pass a check before any updates the state, draining balances or reusing single use codes. The defence is not more careful checking but atomic operations, database transactions and locking, unique constraints, and idempotency keys, all of which close the gap. Remove the instant between check and act, and the race has nowhere to run.