From Batch to Reactive

A Journey in Data Replication


Julien Ruaux

The Pain 😫

  • Replicating millions of keys between Redis instances

  • How big should my batches be? 🤷

  • How many threads do I need? 🤷

  • What about queue sizes? 🤷

Imagine

  • A house on fire

  • Putting out the fire = workload (requests)

  • How do we move water from source to fire?

The Batch Way

  • One thread = one person with buckets

  • Fill bucket → Walk to fire → Pour → Walk back

  • Repeat

Problem: Lots of waiting and idle time
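The bucket loop above can be sketched in a few lines of Java. The `Source` and `Target` interfaces here are hypothetical stand-ins for a real Redis client, used only to show the shape of the blocking loop:

```java
import java.util.List;

// Hypothetical key/value store interfaces, for illustration only.
interface Source { List<String> readBatch(int size); } // empty list = no more keys
interface Target { void writeBatch(List<String> batch); }

public class BatchReplicator {
    // One "person with a bucket": fill, carry, pour, walk back, repeat.
    static void replicate(Source source, Target target, int batchSize) {
        List<String> batch;
        while (!(batch = source.readBatch(batchSize)).isEmpty()) {
            target.writeBatch(batch); // the thread blocks here on every trip
        }
    }
}
```

While `writeBatch` runs, nothing is being read; while `readBatch` runs, nothing is being written — that is the idle time.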

Can We Carry Bigger Buckets?

  • Bigger bucket = larger batch size

  • More data per trip

  • But… heavier (more memory)

  • Slower to fill and carry

Same problem: Still waiting between trips

Can We Add More People?

  • More threads = more firefighters

  • Need someone to coordinate them

  • Synchronization overhead

  • Can still run out of threads

  • How many is optimal? 🤷

Complexity: Coordination + tuning is hard
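The "more people" variant looks like this with a fixed thread pool. Note the two knobs that reappear: the pool size and the batch size are both guesses (the names below are illustrative, not from any real client):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ParallelBatchReplicator {
    // More firefighters: N worker threads each carrying batches,
    // plus an executor to coordinate them.
    static void replicate(BlockingQueue<List<String>> batches,
                          Consumer<List<String>> writer,
                          int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads); // how many? 🤷
        for (List<String> batch; (batch = batches.poll()) != null; ) {
            final List<String> b = batch;
            pool.submit(() -> writer.accept(b)); // coordination + queueing overhead
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Throughput improves, but now batches can arrive at the target out of order, the writer must be thread-safe, and the right thread count depends on hardware and workload.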

The Realization 💡

  • The problem isn’t the bucket size

  • The problem isn’t the number of people

  • The problem is the approach itself

What if we didn’t carry buckets at all?

The Reactive Way

  • Water pump: stream, don’t batch

  • Publisher → Pipe → Subscriber

  • Water flows continuously as needed

Backpressure: Subscriber controls the flow

  • "Give me 10 items right now"

  • No overwhelming the target

  • Natural flow control
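Subscriber-driven demand can be sketched with the Reactive Streams interfaces built into the JDK (`java.util.concurrent.Flow`, Java 9+). Here the subscriber asks for a couple of items up front, then one more each time it finishes an item, so the publisher can never flood it:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureDemo {
    // A subscriber that pulls in small chunks: "give me 2 items right now".
    static class ChunkedSubscriber implements Flow.Subscriber<String> {
        final List<String> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        Flow.Subscription subscription;

        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(2);            // initial demand: 2 items
        }
        public void onNext(String item) {
            received.add(item);
            subscription.request(1); // ask for the next one when ready
        }
        public void onError(Throwable t) { done.countDown(); }
        public void onComplete()         { done.countDown(); }
    }

    // Push a list of items through the pipe and return what the subscriber saw.
    static List<String> run(List<String> items) {
        SubmissionPublisher<String> pub = new SubmissionPublisher<>();
        ChunkedSubscriber sub = new ChunkedSubscriber();
        pub.subscribe(sub);
        items.forEach(pub::submit);  // blocks if the subscriber falls behind
        pub.close();
        try {
            sub.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sub.received;
    }
}
```

The key inversion: the publisher only emits what the subscriber has requested via `request(n)` — flow control is built into the contract rather than tuned from outside.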

Where Did This Come From?

  • 2009: Reactive principles emerge with Rx.NET

  • 2013-2015: The Reactive Streams specification standardizes the model (including backpressure)

  • 2017: Adopted into the JDK as java.util.concurrent.Flow (Java 9)

  • Today: RxJS, Reactor, Rx.NET, RxPY, RxRust

The Catch: Need Reactive All The Way

You can’t have a reactive pipe if one end is blocking

  • Reactive drivers (non-blocking I/O)

  • Reactive servers (if building web apps)

Solving the Pain

  • ✅ No more tuning guesswork: Fewer knobs to turn

  • ✅ Automatic flow control (Backpressure)

  • ✅ Better resource usage (reduce idle time)

  • ✅ Simpler codebase: ~4000 LOC → ~500 LOC

Resources