A/B Testing Infrastructure

Updated June 8, 2026
Magic Magnets Team
8 min read

Imagine you run a massive coffee shop. You want to see if changing the color of your menu board from black to bright red makes people buy more expensive lattes.

You could change the board to red for a week and see what happens. But what if it rained all week, and fewer people came in anyway? Your data would be skewed.

The more rigorous way to test this is to magically show a random half of the people who walk in the door a black menu, and the other half a red menu, at the exact same time. Then, you compare the sales from the "Black Menu Group" to the "Red Menu Group."

This is exactly what A/B Testing (or split testing) does in software.

What is A/B Testing?

A/B testing is a methodology used to compare two (or more) versions of a webpage, app, or feature to determine which one performs better based on specific metrics (like click-through rate, sign-ups, or revenue).

  • Group A (Control): Gets the current, existing experience.
  • Group B (Variant): Gets the new, modified experience.

While Canary Releases (which we covered earlier) are about ensuring a new version doesn't break the system technically, A/B Testing is about finding out if the new version is actually better for the business.

Figure: The assignment engine hashes each user ID to place them deterministically in Group A or B. Both groups fire events to Kafka on every user action. The data warehouse accumulates them for statistical analysis. No latency is added to the page request, because the hash is computed in memory.

The Infrastructure of A/B Testing

Building the infrastructure to support large-scale A/B testing is a classic system design challenge. It's not just a simple if/else statement. You need a complete pipeline to assign users, serve the right experience, and crunch massive amounts of data.

Here are the core components:

1. The Assignment Engine

When a user hits your system, you need to decide: Are they in Group A or Group B? This assignment must be:

  • Deterministic: If Alice is in Group B today, she must be in Group B tomorrow. If her button changes color every time she refreshes the page, she'll think your site is broken.
  • Uniform: It should split traffic evenly (or exactly at the percentage you specify, like 90/10).
  • Fast: This decision happens on the critical path of rendering the page. It cannot add 200ms of latency.

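A common way to solve this without a database lookup is hashing. You take the user's ID (e.g., user_123), combine it with the experiment ID (e.g., red_button_test), and run the result through a fast hashing algorithm like MurmurHash. The hash itself is a large integer, so you reduce it modulo 100 to get a bucket between 0 and 99. If the bucket is < 50, the user gets Group A. If it is >= 50, they get Group B. This requires zero network calls and is perfectly deterministic! Including the experiment ID in the hash input also means a user's bucket in one experiment is independent of their bucket in another.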
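The hashing scheme above can be sketched in a few lines. This is a minimal illustration, not a production assignment engine: the function name `assign_variant` and the `control_pct` parameter are hypothetical, and SHA-256 from the standard library stands in for MurmurHash so the example has no third-party dependencies.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, control_pct: int = 50) -> str:
    """Return "A" (control) or "B" (variant) for a user in one experiment."""
    # Salting with the experiment ID keeps buckets independent across experiments.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    # Assumption: SHA-256 is used here instead of MurmurHash (standard library only).
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket in 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "A" if bucket < control_pct else "B"


# The same inputs always yield the same group, with no network call or stored state.
group = assign_variant("user_123", "red_button_test")
```

Changing `control_pct` to 90 would give a 90/10 split. A real system would likely prefer a faster non-cryptographic hash, since cryptographic strength is not needed for bucketing.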

2. The Delivery Mechanism

Once assigned, the application needs to render the correct experience. This is often handled through Feature Flags. The A/B testing engine tells the feature flag system which variant the user should see.
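As a rough sketch of how an assigned variant might be turned into a rendered experience, the example below uses a plain dictionary as a stand-in for a feature flag system. The table `VARIANT_CONFIG`, the functions `resolve_flags` and `render_button`, and the `button_color` setting are all hypothetical names chosen for illustration.

```python
from typing import Dict

# Hypothetical flag table: experiment ID -> variant -> UI settings.
VARIANT_CONFIG: Dict[str, Dict[str, Dict[str, str]]] = {
    "red_button_test": {
        "A": {"button_color": "black"},  # control: existing experience
        "B": {"button_color": "red"},    # variant: modified experience
    },
}


def resolve_flags(experiment_id: str, variant: str) -> Dict[str, str]:
    """Return the settings for a variant, falling back to control ("A")."""
    variants = VARIANT_CONFIG.get(experiment_id, {})
    return variants.get(variant, variants.get("A", {}))


def render_button(flags: Dict[str, str]) -> str:
    """Render a button using the resolved flags, defaulting to black."""
    color = flags.get("button_color", "black")
    return f'<button style="background:{color}">Buy</button>'


html = render_button(resolve_flags("red_button_test", "B"))
```

Falling back to the control settings when a variant or experiment is unknown is one reasonable design choice: a misconfigured experiment then degrades to the existing experience rather than to an error.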

3. The Analytics and Telemetry Pipeline

This is where the heavy lifting happens. Every time the user takes an action (clicks a button, buys an item, abandons a cart), the client fires off an event containing the User ID, the action, and the Experiment ID.

These events are usually dumped into a high-throughput queue (like Apache Kafka) and then ingested into a data warehouse (like Snowflake or BigQuery).
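The event flow described above might look something like the following. This is a sketch under stated assumptions: `ExperimentEvent` and `track` are hypothetical names, the `variant` and `timestamp_ms` fields are additions beyond the three fields named in the text, and an in-process `queue.Queue` stands in for a Kafka topic so the example runs without a broker.

```python
import json
import queue
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentEvent:
    user_id: str
    experiment_id: str
    action: str
    variant: str       # assumption: recorded so analysis need not recompute it
    timestamp_ms: int  # assumption: event time in milliseconds since the epoch


# In-process stand-in for a high-throughput queue such as a Kafka topic.
event_queue: "queue.Queue[bytes]" = queue.Queue()


def track(user_id: str, experiment_id: str, variant: str, action: str) -> None:
    """Serialize one user action as JSON and enqueue it for later ingestion."""
    event = ExperimentEvent(
        user_id=user_id,
        experiment_id=experiment_id,
        action=action,
        variant=variant,
        timestamp_ms=int(time.time() * 1000),
    )
    event_queue.put(json.dumps(asdict(event)).encode("utf-8"))


track("user_123", "red_button_test", "B", "click")
```

In a real pipeline, a consumer would read these messages from the queue and load them into the warehouse in batches.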

4. The Statistics Engine

Finally, data scientists need to analyze the results. The system calculates the conversion rates for Group A and Group B and runs statistical tests (like a t-test or, for conversion rates, a two-proportion z-test) to determine if the difference is statistically significant, or just random noise.
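To make the significance check concrete, here is a minimal sketch of one such test. Because conversion rates are proportions, it uses a two-proportion z-test, a common alternative to the t-test for this kind of data. The function name and the example counts are illustrative assumptions, not figures from a real experiment.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variance: every user converted, or none did.")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts: 500/10,000 conversions in A versus 600/10,000 in B.
z, p_value = two_proportion_z_test(500, 10_000, 600, 10_000)
significant = p_value < 0.05
```

A small p-value says the observed gap would be unlikely if the two groups truly converted at the same rate. It does not by itself say the difference is large enough to matter to the business.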

Real-World Examples

Booking.com's Testing Culture

Booking.com is famous for running arguably the most aggressive A/B testing infrastructure in the world. At any given moment, there are over 1,000 active experiments running on their site. Everything from the exact shade of blue on a button to the phrasing of "Only 1 room left!" is constantly being tested and optimized for maximum conversion.

Netflix's Artwork Optimization

You and your friend might both have "Stranger Things" recommended to you, but you might see completely different thumbnail images. Netflix A/B tests artwork to see which images yield the highest click-through rate. If you watch a lot of romance movies, the algorithm might A/B test a thumbnail focusing on a romantic subplot versus a sci-fi thumbnail.

The Challenges

Designing an A/B testing system comes with strict constraints:

  • The "Flicker" Effect: If you run A/B tests purely on the client-side using JavaScript, the user might see the old version of the page for a split second before JS swaps it to the new version. This "flicker" can bias test results, because users briefly see both versions and may react to the visual glitch rather than to the variant itself. Server-side rendering of variants is generally preferred.
  • Interacting Experiments: If you are testing a new checkout button (Test 1) AND testing a new shopping cart layout (Test 2) at the same time, the results might interfere with each other. Complex systems need ways to create mutually exclusive experiment layers, so that a user in a given layer is enrolled in at most one of its experiments.
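One way to picture mutually exclusive layers is to hash each user into a bucket per layer and give every experiment in that layer its own disjoint range of buckets. The sketch below assumes this design. The layer contents, the names `bucket` and `experiment_for_user`, and the use of SHA-256 are all hypothetical.

```python
import hashlib
from typing import Dict, Optional, Tuple


def bucket(key: str, buckets: int = 100) -> int:
    """Map a string key to a stable bucket in 0..buckets-1."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


# Hypothetical layer: each experiment owns a disjoint, half-open bucket range.
CHECKOUT_LAYER: Dict[str, Tuple[int, int]] = {
    "new_checkout_button": (0, 50),   # buckets 0..49
    "new_cart_layout": (50, 100),     # buckets 50..99
}


def experiment_for_user(
    user_id: str, layer_id: str, layer: Dict[str, Tuple[int, int]]
) -> Optional[str]:
    """Return the single experiment in this layer that the user belongs to."""
    # Hashing with the layer ID gives each layer its own independent split.
    b = bucket(f"{layer_id}:{user_id}")
    for experiment_id, (start, end) in layer.items():
        if start <= b < end:
            return experiment_id
    return None  # bucket not claimed by any experiment in this layer


enrolled = experiment_for_user("user_123", "checkout_layer", CHECKOUT_LAYER)
```

Because the ranges do not overlap, no user can be in both the checkout button test and the cart layout test. Experiments placed in different layers can still overlap, which is acceptable when they are unlikely to interact.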

[!TIP] Don't peek! In statistics, there is a concept called "peeking." If you check your A/B test results every day and stop the test the moment Group B looks like it's winning, you will often end up with false positives. You must decide the sample size before you start the test and wait for it to finish.
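The sample size mentioned in the tip can be estimated before the test starts. The sketch below uses the standard normal-approximation formula for comparing two proportions. The function name, the default significance level of 0.05, and the default power of 0.8 are conventional choices assumed here, not values taken from this article.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(
    baseline_rate: float,
    min_detectable_lift: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Estimate users needed in EACH group to detect an absolute lift."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    # Critical values for a two-sided test at level alpha and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Sum of the per-group Bernoulli variances.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Detecting a lift from a 5% to a 6% conversion rate.
n = sample_size_per_group(baseline_rate=0.05, min_detectable_lift=0.01)
```

With these inputs the estimate comes out at a little over 8,000 users per group. Halving the detectable lift roughly quadruples the requirement, which is why small improvements need long-running tests.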

Summary

A/B Testing Infrastructure removes the guesswork from product development. Instead of arguing in a meeting room over which feature will increase revenue, you test both and let the data decide. Building a system that can deterministically assign millions of users without adding latency, track billions of events, and provide statistically sound results is a formidable system design challenge. It is also an engine that drives growth at many major tech companies.
