Distributed Locks
Updated June 3, 2026
The Problem with Local Locks
If you write a multi-threaded application on a single machine, you know about "locks" (or mutexes). If two threads want to update the same variable, Thread A grabs the lock, updates the variable, and releases the lock. Thread B waits its turn.
But what happens when you scale to a distributed system with 10 different servers? A lock in the memory of Server A does nothing to stop Server B from modifying the same data in your database.
Analogy: Think of a shared office space with a single bathroom. If you just lock the door to your personal cubicle, anyone can still walk into the bathroom. To prevent an embarrassing situation, you need a single, physical key to the bathroom that hangs at the front desk.
Distributed Locks are that single key at the front desk. They ensure that across your entire cluster of servers, only one server can access a shared resource or perform a specific action at a given time.
The Core Concept
A distributed lock needs a centralized place that all servers can check. When a server wants to do something critical (like process a payment, or generate a daily report), it asks this central authority: "Can I have the lock?"
- If yes, it does the work, then tells the authority, "I'm done, release the lock."
- If no, it waits or aborts the operation.
The Anatomy of a Distributed Lock
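A naive way to build this is to use a database table with a unique constraint on the resource name: INSERT INTO locks (resource_name) VALUES ('daily_report'). The first server's insert succeeds, and every other server's insert fails until the row is deleted.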
But what if the server crashes right after inserting the row? The lock is held forever, and the report never gets generated again.
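To make the naive approach concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the shared database, and the helper names try_acquire and release are hypothetical. The primary key on resource_name is what makes the second insert fail.

```python
import sqlite3


def try_acquire(conn, resource_name):
    """Return True if we inserted the lock row, False if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locks (resource_name) VALUES (?)", (resource_name,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(conn, resource_name):
    """Delete the lock row so another server can acquire it."""
    with conn:
        conn.execute("DELETE FROM locks WHERE resource_name = ?", (resource_name,))


# In-memory SQLite stands in for the shared database (assumption).
conn = sqlite3.connect(":memory:")
# The PRIMARY KEY makes a second INSERT for the same resource fail.
conn.execute("CREATE TABLE locks (resource_name TEXT PRIMARY KEY)")
```

Notice that nothing here ever calls release on behalf of a crashed server. If the holder dies after try_acquire, the row simply stays.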
A distributed lock needs a few key features:
- Mutual Exclusion: Only one client can hold the lock at a time.
- Deadlock Free: Even if the client holding the lock crashes, the lock must eventually be released (usually via a timeout/TTL).
- Fault Tolerance: The locking service itself shouldn't be a single point of failure.
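The first two properties can be sketched in a few lines. This is a single-process toy, not a real lock service: the class name TtlLockTable and the injectable clock are assumptions made for illustration, and it does nothing for fault tolerance.

```python
import time
import uuid


class TtlLockTable:
    """In-memory stand-in for a central lock authority (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # resource name -> (owner id, expiry time)

    def acquire(self, resource, ttl_seconds):
        """Return an owner id if the lock was granted, else None."""
        now = self._clock()
        held = self._locks.get(resource)
        if held is not None and held[1] > now:
            return None  # mutual exclusion: an unexpired lock is held
        owner = uuid.uuid4().hex
        self._locks[resource] = (owner, now + ttl_seconds)
        return owner

    def release(self, resource, owner):
        """Release only if the caller still owns an unexpired lock."""
        held = self._locks.get(resource)
        if held is not None and held[0] == owner and held[1] > self._clock():
            del self._locks[resource]
            return True
        return False
```

If a holder crashes and never calls release, the next acquire after the TTL has passed simply succeeds, which is the "deadlock free" property in action.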
How to Implement Distributed Locks
Figure: Distributed lock with Redis SETNX and a TTL. Server B is blocked while Server A holds the lock.
1. Redis (The Fast Way)
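Redis is commonly used for distributed locks because it is fast and simple to operate. The lock is a key that is created only if it does not already exist and that carries an expiration time. The original command for this is SETNX (SET if Not eXists), but SETNX followed by a separate EXPIRE is not atomic: a crash between the two calls leaves a lock with no TTL. The usual form today is a single SET key value NX PX <milliseconds>.
- The danger: If you have a single Redis node and it crashes, you lose your locks. To address this, the creator of Redis proposed the Redlock algorithm, which requires acquiring the lock on a majority of independent Redis nodes.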
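Here is a hedged sketch of the single-node pattern. It assumes a client object shaped like redis-py's (a set method accepting nx and px, and an eval method), and the function names acquire_lock and release_lock are hypothetical. The random token stored as the value is there so that a client can only delete a lock it still owns.

```python
import uuid

# Lua script: delete the key only if it still holds our token, so a client
# whose lock already expired cannot delete a lock now owned by someone else.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def acquire_lock(client, key, ttl_ms):
    """Try once to take the lock; return a unique token, or None if held."""
    token = uuid.uuid4().hex
    # SET key token NX PX ttl_ms: create only if absent, with expiry, atomically.
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, key, token):
    """Release the lock only if we still own it; return True on success."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1
```

With redis-py, client would typically be a redis.Redis instance pointed at your server. This covers one node only; it is not an implementation of Redlock.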
2. ZooKeeper / etcd (The Safe Way)
Systems like ZooKeeper and etcd are strongly consistent distributed key-value stores, built for exactly this kind of coordination.
- ZooKeeper uses "ephemeral nodes" (entries that exist only as long as the client's session is alive). If a server grabs a lock and then crashes, its session stops sending heartbeats, and once the session times out ZooKeeper deletes the node, freeing the lock for others. etcd gets the same effect with leases that the client must keep alive. A healthy holder does not lose the lock just because its work ran longer than a fixed TTL, which makes this safer than relying purely on a time-based expiration.
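The idea of tying a lock to a session can be modeled in a few lines. This is a toy simulation, not the ZooKeeper or etcd API: the class EphemeralStore and its methods are hypothetical, and a real service decides on its own when a session has ended.

```python
class EphemeralStore:
    """Toy model of session-scoped ("ephemeral") entries. Not a ZooKeeper client."""

    def __init__(self):
        self._nodes = {}  # path -> session id that created it

    def create_ephemeral(self, path, session_id):
        """Create the node if absent; return True if this session now holds it."""
        if path in self._nodes:
            return False
        self._nodes[path] = session_id
        return True

    def expire_session(self, session_id):
        """Called when a session ends or times out: drop everything it created."""
        self._nodes = {p: s for p, s in self._nodes.items() if s != session_id}
```

The lock holder never has to guess a TTL for its work. The entry lives exactly as long as the session that created it.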
The "Fencing Token" Problem
Figure: Fencing token. A stale lock holder is rejected because the database enforces a monotonic token.
Even with a perfect locking service, you can run into a sneaky problem.
- Server A gets a lock with a 10-second expiration.
- Server A experiences a huge garbage collection pause (it freezes for 15 seconds).
- The lock expires. Server B gets the lock and writes to the database.
- Server A wakes up, thinks it still has the lock, and overwrites Server B's data!
The Fix: The locking service must hand out a Fencing Token (a number that increases every time the lock is granted) with the lock. Server A gets token #1, Server B gets #2. The database remembers the highest token it has seen, so when Server A wakes up and tries to write using token #1, the database rejects it, saying, "I've already seen token #2, you are out of date!"
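The resource-side check is small enough to sketch. FencedStore below is a hypothetical in-memory stand-in for the database; a real system would need to store and compare the token atomically with the write.

```python
class FencedStore:
    """Toy resource that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.value = None
        self.highest_token = 0  # largest token accepted so far

    def write(self, token, value):
        """Apply the write only if the token is not older than one already seen."""
        if token < self.highest_token:
            return False  # stale lock holder: reject
        self.highest_token = token
        self.value = value
        return True
```

Replaying the scenario above: Server B writes with token 2 and succeeds, then Server A wakes up and writes with token 1, which is refused, so Server B's data survives.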
Real-World Examples
- Payment Processing (Stripe/PayPal): When a user double-clicks the "Pay" button by accident, distributed locks ensure the backend only processes the transaction once by locking on the user_id or order_id.
- Cron Jobs: Ensuring a nightly database backup script only runs on one server, even if the script is deployed to 50 identical microservices.
- Chubby: Google's internal lock service that manages coordination across thousands of servers, famously described as "a lock service for loosely-coupled distributed systems."
Summary
- Distributed Locks prevent multiple servers from concurrently modifying the same shared resource.
- A plain lock row in a database isn't enough, because a crashed holder never releases it. Handle crashes with timeouts (TTL) or ephemeral, session-bound entries.
- Redis (Redlock) is popular for high-performance locking, while ZooKeeper/etcd are preferred for high-reliability locking.
- Always use Fencing Tokens on the resource side to prevent delayed nodes from corrupting data.