Rollbacks & Immutable Infrastructure
Updated June 8, 2026
Imagine you buy a laptop. Over the next two years, you install software, tweak settings, download sketchy files, and upgrade the OS. Eventually, it starts acting weird. It's sluggish, programs crash, and you have no idea which of the 10,000 changes you made caused the problem. Your laptop has become a unique, fragile snowflake.
To fix it, you could spend hours trying to undo your changes. Or, you could just throw it in the trash, buy an identical new laptop, and start fresh.
In system design, changing a machine in place and then spending hours trying to undo those changes is what life looks like with "mutable infrastructure." Throwing it in the trash and starting fresh is called Immutable Infrastructure.
The Old Way: Mutable Infrastructure
For decades, servers were treated like pets. We gave them names (like web-server-gandalf), we nurtured them, and when they got sick, we logged in via SSH and gave them medicine (running updates, tweaking configs).
When we wanted to deploy new code, a script would log into the server, delete the old files, and download the new files.
This created massive problems:
- Configuration Drift: Over time, server-1 and server-2 would subtly drift apart. A developer might manually tweak a setting on server-1 to fix a bug and forget to tell anyone.
- Failed Rollbacks: If a deployment failed halfway through, the server was left in an unknown, corrupted state. Rolling back meant writing a complex "undo" script and hoping it worked.
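To make configuration drift concrete, here is a minimal sketch in Python. It assumes each server's settings can be read into a plain dictionary; the setting names and values are made up for illustration, and `fingerprint` and `find_drift` are hypothetical helpers, not part of any real tool.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hash of a server's configuration."""
    # sort_keys makes the hash independent of key order.
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_drift(baseline: dict, actual: dict) -> dict:
    """Map each drifted key to its (expected, actual) pair of values."""
    keys = baseline.keys() | actual.keys()
    return {
        key: (baseline.get(key), actual.get(key))
        for key in sorted(keys)
        if baseline.get(key) != actual.get(key)
    }


# Hypothetical settings: both servers started from the same config.
server_1 = {"max_connections": 100, "tls": "1.3", "log_level": "info"}
server_2 = dict(server_1)

# Someone SSHes into server-1 and tweaks a setting to fix a bug.
server_1["max_connections"] = 500

drifted = fingerprint(server_1) != fingerprint(server_2)
print(drifted)                         # True
print(find_drift(server_2, server_1))  # {'max_connections': (100, 500)}
```

With mutable servers, a check like this has to run continuously just to find out what changed. The immutable approach aims to make the question unnecessary.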
The New Way: Immutable Infrastructure
Immutable Infrastructure treats servers like cattle, not pets.
Once a server (or a container, or a VM) is created, it is never, ever modified. No one SSHes in. No configuration changes are made. No software is updated.
If you need to update the application to Version 2, you don't log into the Version 1 server to change the code. You build a brand new image of Version 2, spin up fresh servers from that image, and then completely destroy the old Version 1 servers.
Figure: Before the rollout, V1 containers run without modification and the V2 image is ready in the registry. After the rollout, V2 containers replace V1, and the V1 image stays in the registry for instant rollback.
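The replace-instead-of-patch idea can be sketched in a few lines of Python. This is a toy model, not a real deployment tool: `Server`, `deploy`, and the image names `shop:v1` and `shop:v2` are all assumptions made for illustration. A frozen dataclass stands in for a server that rejects in-place changes.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Server:
    """A running server; frozen, so it cannot be changed after creation."""
    name: str
    image: str


def deploy(fleet: list[Server], new_image: str) -> list[Server]:
    """Return a brand-new fleet built from new_image; the old one is discarded."""
    return [Server(name=f"web-{i}", image=new_image) for i in range(len(fleet))]


fleet_v1 = [Server(name=f"web-{i}", image="shop:v1") for i in range(3)]

# Trying to patch a running server in place is rejected outright.
try:
    fleet_v1[0].image = "shop:v2"
    patched = True
except FrozenInstanceError:
    patched = False

fleet_v2 = deploy(fleet_v1, "shop:v2")
print(patched)                       # False
print([s.image for s in fleet_v2])   # ['shop:v2', 'shop:v2', 'shop:v2']
print([s.image for s in fleet_v1])   # ['shop:v1', 'shop:v1', 'shop:v1']
```

Note that `deploy` never touches the old fleet. The V1 servers stay exactly as they were until they are thrown away.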
Why is this better?
- Absolute Consistency: Every single server running your code is an exact, byte-for-byte clone of the master image. You completely eliminate the "it works on my machine" or "it works on server A but not server B" problems.
- Predictable Deployments: You test the exact image in staging that will run in production. You aren't testing code; you are testing the entire environment.
- Painless Rollbacks: This is the killer feature.
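The "test the exact image" point above can be illustrated with content addressing. The sketch below is a simplification: real registries compute digests over image manifests and layers, while here a byte string with made-up contents stands in for the whole image, and `digest` and `promote` are hypothetical helpers.

```python
import hashlib


def digest(image_bytes: bytes) -> str:
    """Content address of an image: any changed byte changes the digest."""
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()


def promote(tested_digest: str, candidate: bytes) -> bool:
    """Allow a production rollout only for the exact image staging tested."""
    return digest(candidate) == tested_digest


# Hypothetical image contents standing in for real image layers.
image_v2 = b"os=debian-12;runtime=python-3.12;app=shop-2.0.0"
tested = digest(image_v2)          # recorded when the staging tests pass

# A "quick rebuild" that differs only slightly is still a different image.
rebuilt = b"os=debian-12;runtime=python-3.12;app=shop-2.0.1"

print(promote(tested, image_v2))   # True
print(promote(tested, rebuilt))    # False
```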
The Power of Instant Rollbacks
Let's talk about rollbacks in an immutable world.
You deploy V2, and production crashes. In the old mutable world, you would scramble to write a script to uninstall V2 and reinstall V1, hoping you didn't leave any broken dependencies behind.
In an immutable world, V1 isn't something you have to reconstruct. V1 is a fixed, immutable image stored in your container registry (like Docker Hub or AWS ECR).
To roll back, you simply tell your orchestrator (like Kubernetes): "Hey, spin up the V1 image again and kill the V2 containers."
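You know that V1 will run exactly as it did 10 minutes ago, because the container image itself hasn't changed a single byte. The rollback is deterministic, fast, and far less stressful. The one thing the image can't vouch for is state that lives outside it: if V2 changed a database schema, for example, V1 still has to be compatible with it.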
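Here is a minimal sketch of why the rollback is so cheap, assuming a toy orchestrator that only remembers which tags have gone live. The `Orchestrator` class and the digests are invented for illustration and are not how Kubernetes works internally.

```python
class Orchestrator:
    """Toy orchestrator that tracks which image version is live."""

    def __init__(self, registry: dict[str, str]) -> None:
        self.registry = registry          # tag -> image digest, never rewritten
        self.history: list[str] = []      # tags in the order they went live

    def deploy(self, tag: str) -> str:
        if tag not in self.registry:
            raise KeyError(f"unknown image tag: {tag}")
        self.history.append(tag)
        return self.registry[tag]

    def rollback(self) -> str:
        """Go back to the previously deployed tag; nothing is rebuilt."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.history.pop()                # drop the bad release
        return self.registry[self.history[-1]]


registry = {"v1": "sha256:aaa111", "v2": "sha256:bbb222"}  # made-up digests
orch = Orchestrator(registry)

first = orch.deploy("v1")
orch.deploy("v2")                 # production starts crashing
restored = orch.rollback()

print(restored == first)          # True
print(orch.history)               # ['v1']
```

The rollback is just a lookup: the V1 digest that comes back is the same one that was deployed the first time. On a real Kubernetes cluster, the comparable operation for a Deployment is `kubectl rollout undo`.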
Real-World Examples
Docker and Kubernetes
The entire modern container ecosystem is built on the philosophy of immutability. A Docker image is a read-only template. When Kubernetes scales up your app, it stamps out identical pods based on that image. If a pod starts acting weird, Kubernetes doesn't try to fix it; it ruthlessly shoots the pod in the head and spins up a fresh one.
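The "replace, don't repair" behavior can be sketched as a small reconcile loop. This is a loose illustration under simplifying assumptions, not Kubernetes' actual controller logic: `Pod`, `new_pod`, and `reconcile` are hypothetical names, and health is reduced to a single boolean.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Pod:
    name: str
    image: str
    healthy: bool = True


_ids = count(1)


def new_pod(image: str) -> Pod:
    """Stamp out a fresh pod from the image with the next free name."""
    return Pod(name=f"pod-{next(_ids)}", image=image)


def reconcile(pods: list[Pod], image: str, replicas: int) -> list[Pod]:
    """Drop unhealthy pods, then create fresh ones until the count is right."""
    survivors = [p for p in pods if p.healthy and p.image == image][:replicas]
    fresh = [new_pod(image) for _ in range(replicas - len(survivors))]
    return survivors + fresh


pods = [new_pod("shop:v1") for _ in range(3)]        # pod-1, pod-2, pod-3
# pod-2 starts misbehaving (modelled here as a failed health check).
pods[1] = Pod(name="pod-2", image="shop:v1", healthy=False)

pods = reconcile(pods, image="shop:v1", replicas=3)
print([p.name for p in pods])         # ['pod-1', 'pod-3', 'pod-4']
print(all(p.healthy for p in pods))   # True
```

Nothing in `reconcile` tries to diagnose pod-2. The unhealthy pod is dropped and pod-4 is built from the same image to take its place.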
Netflix's "Spinnaker"
Netflix builds "Amazon Machine Images" (AMIs) for their deployments and rolls them out with Spinnaker, the continuous delivery platform it open-sourced. An AMI contains the OS, the runtime, and the application code. Every deployment spins up brand-new EC2 instances from a fresh AMI, potentially thousands of them. Once traffic is routed to the new instances, the old ones are terminated.
The Trade-Offs
Immutable infrastructure is incredible, but it requires a mature CI/CD pipeline.
- Slower Build Times: Because you have to bake an entirely new OS image or Docker container for every single code change, your build pipeline might take longer than just pushing a zipped folder of code to a server.
- Statelessness is Required: Your servers must be completely stateless. If a server saves a user's uploaded photo to its local hard drive, that photo will be vaporized when the server is destroyed during the next deployment. All state must be pushed to external databases or object storage (like S3).
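The statelessness requirement is easy to see in a toy model. In the sketch below, `ObjectStore` is a stand-in for an external service such as S3 (it is not a real client library), and `Server` is a hypothetical class whose local disk disappears along with the server.

```python
from typing import Optional


class ObjectStore:
    """Stand-in for external storage such as S3; outlives every server."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)


class Server:
    def __init__(self, store: ObjectStore) -> None:
        self.local_disk: dict[str, bytes] = {}   # destroyed with the server
        self.store = store

    def save_locally(self, key: str, data: bytes) -> None:
        self.local_disk[key] = data

    def save_externally(self, key: str, data: bytes) -> None:
        self.store.put(key, data)

    def read(self, key: str) -> Optional[bytes]:
        if key in self.local_disk:
            return self.local_disk[key]
        return self.store.get(key)


store = ObjectStore()
old = Server(store)
old.save_locally("cat.jpg", b"local bytes")
old.save_externally("dog.jpg", b"external bytes")

# Next deployment: the old server is destroyed and a fresh one replaces it.
new = Server(store)
print(new.read("cat.jpg"))   # None
print(new.read("dog.jpg"))   # b'external bytes'
```

The photo written to local disk is gone after the deployment, while the one pushed to the external store is still there for the replacement server.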
[!TIP] No SSH allowed! If you want to enforce immutable infrastructure, disable SSH access to your production servers entirely. If engineers can't log in to manually fix things, they are forced to automate the fix in the build pipeline.
Summary
Moving from mutable to immutable infrastructure is like moving from carving statues out of clay to casting them in bronze. Clay can be easily altered, but it's fragile and loses its shape. Bronze requires a mold and takes longer to create, but once it's set, it's permanent and reproducible. By ensuring your servers are never modified after they are created, you gain absolute confidence in your deployments and the ability to execute fast, deterministic rollbacks when things go wrong.