Distributed Locks

Updated June 3, 2026

Magic Magnets Team

7 min read

The Problem with Local Locks

If you write a multi-threaded application on a single machine, you know about "locks" (or mutexes). If two threads want to update the same variable, Thread A grabs the lock, updates the variable, and releases the lock. Thread B waits its turn.

But what happens when you scale to a distributed system with 10 different servers? A lock in the memory of Server A does nothing to stop Server B from modifying the same data in your database.

Analogy: Think of a shared office space with a single bathroom. If you just lock the door to your personal cubicle, anyone can still walk into the bathroom. To prevent an embarrassing situation, you need a single, physical key to the bathroom that hangs at the front desk.

Distributed Locks are that single key at the front desk. They ensure that across your entire cluster of servers, only one server can access a shared resource or perform a specific action at a given time.

The Core Concept

A distributed lock needs a centralized place that all servers can check. When a server wants to do something critical (like process a payment, or generate a daily report), it asks this central authority: "Can I have the lock?"

If yes, it does the work, then tells the authority, "I'm done, release the lock."
If no, it waits or aborts the operation.

The Anatomy of a Distributed Lock

A naive way to build this is to use a database table: INSERT INTO locks (resource_name) VALUES ('daily_report'). But what if the server crashes right after inserting the row? The lock is held forever, and the report never gets generated again.
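The failure mode above can be sketched with Python's built-in sqlite3 module. The in-memory database stands in for a shared database, and the table layout and the helper name try_acquire are assumptions made for illustration. The primary key gives mutual exclusion, but nothing ever removes a row left behind by a crashed holder.

```python
import sqlite3

# In-memory SQLite stands in for the shared database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locks (resource_name TEXT PRIMARY KEY)")


def try_acquire(resource_name: str) -> bool:
    """Return True if the row was inserted, meaning we now hold the lock."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locks (resource_name) VALUES (?)", (resource_name,)
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: someone else holds the lock.
        return False


first = try_acquire("daily_report")   # the first server wins
second = try_acquire("daily_report")  # every later attempt fails
# If the first server crashes here without deleting its row,
# all future attempts keep failing: the lock is stuck.
```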

A distributed lock needs a few key features:

Mutual Exclusion: Only one client can hold the lock at a time.
Deadlock Free: Even if the client holding the lock crashes, the lock must eventually be released (usually via a timeout/TTL).
Fault Tolerance: The locking service itself shouldn't be a single point of failure.
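As a rough illustration of the first two properties, here is a toy lock table that lives in one process. The class name TTLLockTable and its methods are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow. Fault tolerance is not covered: a real lock table would live in a replicated service.

```python
from typing import Dict, Tuple


class TTLLockTable:
    """Toy single-process lock table; a real one lives in a shared service."""

    def __init__(self) -> None:
        # resource -> (owner, expiry time in seconds)
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, resource: str, owner: str, ttl: float, now: float) -> bool:
        held = self._locks.get(resource)
        if held is not None and held[1] > now:
            return False  # mutual exclusion: an unexpired holder exists
        # Free or expired: grant the lock with a fresh expiry.
        self._locks[resource] = (owner, now + ttl)
        return True

    def release(self, resource: str, owner: str) -> bool:
        held = self._locks.get(resource)
        if held is not None and held[0] == owner:
            del self._locks[resource]
            return True
        return False  # not the holder: never release someone else's lock


table = TTLLockTable()
a_got_it = table.acquire("daily_report", "server-a", ttl=10, now=0)
b_blocked = table.acquire("daily_report", "server-b", ttl=10, now=5)
# server-a crashes and never releases; after the TTL the lock frees itself.
b_after_ttl = table.acquire("daily_report", "server-b", ttl=10, now=11)
```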

How to Implement Distributed Locks


A distributed lock uses a shared coordination service (Redis, etcd, ZooKeeper) as the single source of truth about who holds the lock. In Redis, SET with the NX option (the modern form of SETNX, "Set if Not eXists") is atomic: only one server wins the race. The winner sets a TTL in the same command so the lock self-releases if the holder crashes without explicitly releasing it. Server B's attempt fails immediately, so it can retry after a backoff or abort rather than blocking. This prevents duplicate payment processing, double cron executions, and similar cases where work that must happen once happens twice.

Figure: Distributed lock with Redis SETNX and a TTL. Server B is locked out while Server A holds the lock.

1. Redis (The Fast Way)

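Redis is commonly used for distributed locks because it is very fast. The classic building block is the SETNX command (Set if Not eXists), but SETNX cannot set an expiration by itself, and following it with a separate EXPIRE call leaves a window in which a crash produces a lock with no TTL. The usual approach is a single SET key value NX PX <milliseconds> command, which creates the key and sets its expiration atomically.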
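A minimal sketch of a single-node Redis lock, assuming the client argument is a redis-py redis.Redis instance (its set and eval methods are used). The function names, the key name, and the random-token scheme are assumptions for illustration. SET with NX and PX creates the key and its expiration in one atomic step, and the release script deletes the key only if it still holds our token, so a client whose lock already expired cannot delete someone else's lock.

```python
import uuid

# Lua script: delete the key only if it still holds our token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def acquire_lock(client, name: str, ttl_ms: int):
    """Try once; return a unique token on success, None if the lock is held."""
    token = uuid.uuid4().hex
    # SET name token NX PX ttl_ms: set-if-absent and expiry in one atomic command.
    if client.set(name, token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, name: str, token: str) -> bool:
    """Release only if we still own the lock (the stored token matches ours)."""
    return client.eval(RELEASE_SCRIPT, 1, name, token) == 1
```

A caller that gets None back would retry after a backoff or abort, as described above. Connecting to an actual server (for example with redis.Redis()) is left out of the sketch.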

The danger: If you have a single Redis node and it crashes, you lose your locks. To address this, the author of Redis proposed the Redlock algorithm, in which a client must acquire the lock on a majority of independent Redis nodes before it counts as holding the lock.
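The majority rule can be sketched with in-memory stand-ins for the Redis nodes. FakeNode and acquire_majority are hypothetical names, and this is a simplification: the full Redlock algorithm also gives every key a TTL and checks how much time the acquisition took, both of which are omitted here.

```python
from typing import Dict, List


class FakeNode:
    """In-memory stand-in for one independent Redis node (hypothetical)."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.keys: Dict[str, str] = {}

    def set_if_absent(self, key: str, token: str) -> bool:
        if not self.up or key in self.keys:
            return False
        self.keys[key] = token
        return True

    def delete_if_owner(self, key: str, token: str) -> None:
        if self.up and self.keys.get(key) == token:
            del self.keys[key]


def acquire_majority(nodes: List[FakeNode], key: str, token: str) -> bool:
    """Hold the lock only if more than half of the nodes granted it."""
    granted = sum(node.set_if_absent(key, token) for node in nodes)
    if granted >= len(nodes) // 2 + 1:
        return True
    # No quorum: undo the partial acquisition so others can try.
    for node in nodes:
        node.delete_if_owner(key, token)
    return False


# Five nodes, one of them down: a majority (3) is still reachable.
nodes = [FakeNode(), FakeNode(), FakeNode(up=False), FakeNode(), FakeNode()]
a_holds = acquire_majority(nodes, "lock:report", "token-a")  # 4 of 5 grant
b_holds = acquire_majority(nodes, "lock:report", "token-b")  # 0 of 5 grant
```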

2. ZooKeeper / etcd (The Safe Way)

Systems like ZooKeeper and etcd are strongly consistent distributed key-value stores. They are built for exactly this kind of coordination task.

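ZooKeeper uses "ephemeral nodes" (entries that exist only as long as the session of the client that created them is alive); etcd offers a similar mechanism through leases that the client must keep renewing. If a server grabs a lock and then crashes, its heartbeats stop, its session expires after the session timeout, and ZooKeeper deletes the lock node, freeing it up for others. The release is not instantaneous, but it is tied to the client's liveness rather than to a fixed guess about how long the work will take, which is generally safer than relying purely on a TTL.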
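The idea of a lock tied to a session can be sketched with a toy in-memory coordinator. ToyCoordinator and its methods are hypothetical and are not the ZooKeeper or etcd client API; the sketch only shows that expiring a session removes every ephemeral node that session created.

```python
from typing import Dict


class ToyCoordinator:
    """In-memory stand-in for a ZooKeeper-like service (hypothetical API)."""

    def __init__(self) -> None:
        # node path -> id of the session that created it
        self._ephemeral: Dict[str, int] = {}
        self._next_session = 1

    def open_session(self) -> int:
        session_id = self._next_session
        self._next_session += 1
        return session_id

    def create_ephemeral(self, path: str, session_id: int) -> bool:
        if path in self._ephemeral:
            return False  # node exists: another session holds the lock
        self._ephemeral[path] = session_id
        return True

    def expire_session(self, session_id: int) -> None:
        """Simulates heartbeats stopping for longer than the session timeout."""
        self._ephemeral = {
            path: owner
            for path, owner in self._ephemeral.items()
            if owner != session_id
        }


zk = ToyCoordinator()
a, b = zk.open_session(), zk.open_session()
a_locked = zk.create_ephemeral("/locks/daily_report", a)
b_locked = zk.create_ephemeral("/locks/daily_report", b)
zk.expire_session(a)  # server A crashed; its session timed out
b_retry = zk.create_ephemeral("/locks/daily_report", b)
```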

The "Fencing Token" Problem


TTL-based locks have a subtle failure mode: Server A acquires the lock with token #1, then pauses for a GC cycle that lasts longer than the TTL. The lock expires. Server B acquires it with token #2 and starts writing. Server A resumes, still thinks it holds the lock, and tries to write. Without fencing, this corrupts data. The fix is the fencing token: the lock service issues a monotonically increasing integer with each grant. The database tracks the highest token it has seen and rejects any write carrying a lower number. Server A's stale token #1 is rejected because the database already accepted token #2 from Server B.

Figure: Fencing token. The stale lock holder is rejected because the database enforces a monotonic token.

Even with a perfect locking service, you can run into a sneaky problem.

Server A gets a lock with a 10-second expiration.
Server A experiences a huge garbage collection pause (it freezes for 15 seconds).
The lock expires. Server B gets the lock and writes to the database.
Server A wakes up, thinks it still has the lock, and overwrites Server B's data!

The Fix: The locking service must hand out a Fencing Token (an incrementing number) with the lock. Server A gets token #1, Server B gets #2. When Server A wakes up and tries to write to the database using token #1, the database rejects it, saying, "I've already seen token #2, you are out of date!"
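The scenario above can be sketched in a few lines. LockService and FencedStore are hypothetical toy classes, not a real database or lock service API; the point is that the check happens on the resource side, so it still works when Server A wrongly believes it holds the lock.

```python
class LockService:
    """Toy lock service that issues a new, larger token with every grant."""

    def __init__(self) -> None:
        self._last_token = 0

    def grant(self) -> int:
        self._last_token += 1
        return self._last_token


class FencedStore:
    """Toy resource that rejects writes carrying a stale token."""

    def __init__(self) -> None:
        self.value = None
        self.highest_token = 0

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale holder: a newer token was already seen
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

token_a = locks.grant()  # Server A gets token 1, then pauses
token_b = locks.grant()  # the lock expired; Server B gets token 2
b_ok = store.write(token_b, "from B")
a_ok = store.write(token_a, "from A")  # A wakes up with its stale token
```

Writes with a token equal to the highest one seen are accepted, so the current holder can write more than once.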

Real-World Examples

Payment Processing (Stripe/PayPal): When a user double-clicks the "Pay" button by accident, distributed locks ensure the backend only processes the transaction once by locking on the user_id or order_id.
Cron Jobs: Ensuring a nightly database backup script only runs on one server, even if the script is deployed to 50 identical microservices.
Chubby: Google's internal lock service that manages coordination across thousands of servers, famously described as "a lock service for loosely-coupled distributed systems."

Summary

Distributed Locks prevent multiple servers from concurrently modifying the same shared resource.
A naive lock row in a database isn't enough. Handle crashes with timeouts (TTL) or with ephemeral nodes tied to the client's session.
Redis (Redlock) is popular for high-performance locking, while ZooKeeper/etcd are preferred for high-reliability locking.
Always use Fencing Tokens on the resource side to prevent delayed nodes from corrupting data.
