Leader Election

Updated June 3, 2026
M
Magic Magnets Team
7 min read

Why Do We Need a Leader?

Imagine a factory where 10 robots are tasked with packing boxes. If they all act independently without coordination, they might all try to pack the same box, or worse, pack half a box and leave it sitting there. To avoid chaos, they need a manager. One robot takes charge: "I will assign the boxes. Robot A, take box 1. Robot B, take box 2."

In distributed systems, you often have a cluster of identical servers (nodes). While having multiple servers is great for redundancy, it can be terrible for coordination. If a scheduled cron job needs to run once a day, you don't want all 10 servers to run it simultaneously and charge a customer's credit card 10 times!

Leader Election is the process of designating exactly one node in a cluster as the "coordinator" or "master" to handle tasks that cannot be safely executed concurrently.

The Core Concept

When a system boots up, all nodes are equal. They communicate with each other and use an algorithm to agree on which node should be the Leader. The other nodes become Followers (or workers).

If the Leader crashes, the network fails, or the Leader becomes too slow, the Followers notice and immediately trigger a new election to pick a replacement.

How is a Leader Elected?

There are several ways to elect a leader, but most rely on a central consensus store or a distributed algorithm.

1. Using a Consensus System (The Easy Way)

Most modern architectures don't write their own leader election logic from scratch. Instead, they use systems like ZooKeeper or etcd.

  • Analogy: Using a speaking token. Whoever holds the token is the leader.
  • How it works: All nodes try to create a temporary "lock" file in ZooKeeper simultaneously. ZooKeeper guarantees that only one will succeed. The winner becomes the Leader. The losers watch that file. If the Leader crashes, its connection drops, ZooKeeper deletes the temporary file, and the remaining nodes rush to create it again.

2. Using Algorithms like Raft or Paxos (The Hard Way)

If the system itself is a database (like Cassandra or MongoDB), it uses consensus algorithms (like Raft, Paxos, or Zab) to elect a leader internally without relying on external tools. Nodes vote for each other, and whoever gets a majority of votes wins.

Handling the "Split Brain" Problem

Quiz Time

In a 9-node cluster using quorum-based leader election, the network splits into two groups of 5 and 4. What happens?

The biggest danger in leader election is the Split Brain scenario. Imagine a network partition cuts your 5-node cluster in half. Two nodes are on one side, three on the other. They can't talk to each other. The side with two nodes thinks the Leader is dead, so they elect a new one. Now you have two leaders writing conflicting data!

How do we fix this?

  • Quorums: A node can only become a leader if it receives votes from a majority (N/2 + 1) of the total nodes. In a 5-node cluster, you need 3 votes. In a network split of 3 and 2, only the side with 3 nodes can elect a leader. The side with 2 nodes simply stops working until the network is fixed. [quiz:1]

  • Fencing Tokens: When a leader is elected, it gets an incrementing ID (e.g., term #5). If it gets disconnected, and a new leader is elected (term #6), the old leader might wake up and try to write data. The database will see the old #5 token and reject the write, saying "You've been replaced!"

Real-World Examples

  • Kafka: Uses ZooKeeper (and now Kraft, its own internal Raft implementation) to elect a "controller" broker that manages the state of partitions and replicas.
  • Redis Sentinel: Monitors Redis master and slave instances. If the master fails, Sentinel automatically elects a slave to be promoted to the new master.
  • Kubernetes (K8s): The Kubernetes control plane uses a lease-locking mechanism in etcd. Multiple controller managers can run, but only the one holding the lease actively manages the cluster.

Summary

  • Leader Election designates a single node to coordinate tasks, preventing conflicts like duplicate cron jobs or conflicting data writes.
  • Most applications implement this using external consensus stores like ZooKeeper or etcd, using locking mechanisms.
  • The system must avoid Split Brain, usually solved by requiring a strict majority quorum for elections and using fencing tokens to block deposed leaders from causing havoc.

Distributed Locks

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices