We’re building a migration tool that reads from a relational database and writes to Redis.
The goal of this exercise is to design such a system and identify solutions to key concerns like High Availability (HA) and Scalability.
We will start with a single orders table that has the following schema:
| Column | Type |
|---|---|
| order_id | … |
This table is constantly updated with new orders (order_id is always increasing).
Question:
What approach would you take to read rows from this table and write the data to Redis?
Tip | Redis hashes are a good match for relational database rows. You can insert a hash into Redis with the `HSET` command. |
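For illustration, here is a minimal redis-py sketch of storing one row as a hash. The `order:{order_id}` key scheme and the extra columns are assumptions for the example, not part of the exercise:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store one order row as a Redis hash. The "order:{order_id}" key scheme
# and the customer_id/status columns are illustrative assumptions.
row = {"order_id": 1001, "customer_id": 42, "status": "NEW"}
r.hset(f"order:{row['order_id']}", mapping=row)
```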
Expected Answers:
Read rows in batches (pages), using the last fetched order_id as the lower bound of the next page.
SQL query: `SELECT * FROM orders WHERE order_id BETWEEN ? AND ?`
Separate concerns into distinct reader, processor, and writer components.
Read, process, and write rows in batches (see the sketch below).
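Putting these together, a sketch of the batched loop, assuming a Python tool, `sqlite3` as a stand-in for the relational source, and the `order:{order_id}` key scheme from the tip above. It uses keyset pagination (`order_id > ? ... LIMIT ?`), which serves the same purpose as the `BETWEEN` query:

```python
import sqlite3
import redis

BATCH_SIZE = 1000

def migrate(db_path: str = "orders.db") -> None:
    src = sqlite3.connect(db_path)
    src.row_factory = sqlite3.Row               # rows behave like dicts
    dst = redis.Redis(decode_responses=True)

    last_id = 0                                 # order_id is always increasing
    while True:
        # Read: fetch the next page, keyed off the last order_id we saw
        rows = src.execute(
            "SELECT * FROM orders WHERE order_id > ? ORDER BY order_id LIMIT ?",
            (last_id, BATCH_SIZE),
        ).fetchall()
        if not rows:
            break
        # Process + write: push the whole batch in one round trip
        pipe = dst.pipeline()
        for row in rows:
            pipe.hset(f"order:{row['order_id']}", mapping=dict(row))
        pipe.execute()
        last_id = rows[-1]["order_id"]
```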
The orders table is updated at a rate of 5,000 operations per second, and our tool cannot keep up with this rate of change.
Question:
How can we modify or enhance our previous design so that it scales?
Expected Answers:
Multithread the read/process/write components.
Two approaches:
- Partition the reads, ensuring that multiple threads neither duplicate nor miss rows
- Have a single reader feed an internal queue from which multiple processor threads dequeue (see the sketch below)
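A sketch of the second option (single reader, bounded internal queue, several processor/writer threads), reusing the assumptions from the previous sketch, including a `sqlite3`-style source connection:

```python
import queue
import threading
import redis

BATCH_SIZE = 1000
NUM_WORKERS = 4
# Bounded queue applies backpressure on the reader if workers fall behind.
work_q: queue.Queue = queue.Queue(maxsize=100)

def reader(src_conn) -> None:
    """Single reader: pages through the table and enqueues batches."""
    last_id = 0
    while True:
        rows = src_conn.execute(
            "SELECT * FROM orders WHERE order_id > ? ORDER BY order_id LIMIT ?",
            (last_id, BATCH_SIZE),
        ).fetchall()
        if not rows:
            break
        work_q.put([dict(r) for r in rows])
        last_id = rows[-1]["order_id"]
    for _ in range(NUM_WORKERS):
        work_q.put(None)                        # poison pill: one per worker

def worker() -> None:
    """Processor/writer: dequeues batches and writes hashes to Redis."""
    dst = redis.Redis(decode_responses=True)    # one connection per thread
    while (batch := work_q.get()) is not None:
        pipe = dst.pipeline()
        for row in batch:
            pipe.hset(f"order:{row['order_id']}", mapping=row)
        pipe.execute()

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in workers:
    t.start()
# reader(src_conn) would run here on the main thread, then:
for t in workers:
    t.join()
```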
The multithreaded/multi-process design we previously put together does not handle failures of the source database or the target Redis database.
Question:
What changes can we make to our design so that we don’t lose data in case the source database or Redis is down?
Expected Answers:
Use a checkpointing mechanism that persists the last successfully processed order_id, so the tool can resume from that point after a restart (see the sketch below).
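One way to persist the checkpoint is in Redis itself, written atomically with each batch so the checkpoint can never run ahead of the data. The key name is illustrative:

```python
import redis

CHECKPOINT_KEY = "migration:last_order_id"      # illustrative key name

def load_checkpoint(dst: redis.Redis) -> int:
    """Resume point: last order_id whose batch was fully written."""
    value = dst.get(CHECKPOINT_KEY)
    return int(value) if value else 0

def write_batch(dst: redis.Redis, batch: list) -> None:
    # MULTI/EXEC transaction: the batch and the new checkpoint are
    # applied together, so a crash can't advance one without the other.
    pipe = dst.pipeline(transaction=True)
    for row in batch:
        pipe.hset(f"order:{row['order_id']}", mapping=row)
    pipe.set(CHECKPOINT_KEY, batch[-1]["order_id"])
    pipe.execute()
```

Because `HSET` simply overwrites, replaying a batch after a crash is harmless (at-least-once delivery). If Redis itself is the failure being guarded against, the checkpoint could instead live in a local file or in the source database.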
It turns out that the order_id column is not monotonically increasing; rather, it is a unique ID generated by an Order Management System.
Orders can also be updated or cancelled, and these changes need to be reflected in the target Redis database.
Question:
How can we listen to changes happening on the source database?
What could these change events look like?
Expected Answers:
Introduce a Change Data Capture (CDC) component for the source database.
Design a change-event listener and a queue from which the multiple worker threads dequeue.
A change event carries the row data and the type of operation that caused the change: INSERT, UPDATE, or DELETE (see the sketch below).
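A sketch of what such an event and its handler might look like. The field names and the `Op` enum are illustrative; a real CDC feed (e.g. Debezium tailing the database's transaction log) would define its own event envelope:

```python
from dataclasses import dataclass
from enum import Enum
import redis

class Op(Enum):
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"

@dataclass
class ChangeEvent:
    op: Op
    order_id: str      # opaque unique ID from the Order Management System
    row: dict          # full row image (empty for DELETE)

def apply_event(dst: redis.Redis, event: ChangeEvent) -> None:
    """Worker-side handler: mirror one change event into Redis."""
    key = f"order:{event.order_id}"
    if event.op is Op.DELETE:
        dst.delete(key)
    else:
        # HSET covers both INSERT and UPDATE: it creates the hash if
        # missing and overwrites the supplied fields if it exists.
        dst.hset(key, mapping=event.row)
```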