We’re building a migration tool that reads from a relational database and writes to Redis.
The goal of this exercise is to design such a system and identify solutions to key concerns like High Availability (HA) and Scalability.
We will start with a single orders table that has the following schema:
| Column | Type |
|---|---|
| order_id | … |
This table is constantly updated with new orders (order_id is always increasing).
Question:
What approach would you take to read rows from this table and write the data to Redis?
Tip | Redis hashes are a good match for relational database rows. You can insert a hash into Redis with the `HSET` command. |
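For illustration, here is a minimal redis-py sketch of storing one row as a hash. The `order:{order_id}` key scheme and the extra columns are assumptions for the example, not part of the exercise:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store one order row as a Redis hash. The "order:{order_id}" key scheme
# and the customer_id/status columns are illustrative assumptions.
row = {"order_id": 1001, "customer_id": 42, "status": "NEW"}
r.hset(f"order:{row['order_id']}", mapping=row)
```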
Expected Answers:
Read rows in batches (pages), using the last fetched order_id as the lower bound of the next page.
SQL query: `SELECT * FROM orders WHERE order_id BETWEEN ? AND ?`
Separate concerns into distinct reader, processor, and writer components.
Read, process, and write rows in batches (see the sketch below).
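Putting these together, a sketch of the batched loop, assuming a Python tool, `sqlite3` as a stand-in for the relational source, and the `order:{order_id}` key scheme from the tip above. It uses keyset pagination (`order_id > ? ... LIMIT ?`), which serves the same purpose as the `BETWEEN` query:

```python
import sqlite3
import redis

BATCH_SIZE = 1000

def migrate(db_path: str = "orders.db") -> None:
    src = sqlite3.connect(db_path)
    src.row_factory = sqlite3.Row               # rows behave like dicts
    dst = redis.Redis(decode_responses=True)

    last_id = 0                                 # order_id is always increasing
    while True:
        # Read: fetch the next page, keyed off the last order_id we saw
        rows = src.execute(
            "SELECT * FROM orders WHERE order_id > ? ORDER BY order_id LIMIT ?",
            (last_id, BATCH_SIZE),
        ).fetchall()
        if not rows:
            break
        # Process + write: push the whole batch in one round trip
        pipe = dst.pipeline()
        for row in rows:
            pipe.hset(f"order:{row['order_id']}", mapping=dict(row))
        pipe.execute()
        last_id = rows[-1]["order_id"]
```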
The orders table is updated at a rate of 5,000 operations per second, and our tool cannot keep up with this rate of change.
Question:
How can we modify or enhance our previous design so that it scales?
Expected Answers:
Multithread the read/process/write components.
Two approaches:
- Partition the reads, ensuring that multiple threads neither duplicate nor miss rows
- Have a single reader feed an internal queue from which multiple processor threads dequeue (see the sketch below)
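A sketch of the second option (single reader, bounded internal queue, several processor/writer threads), reusing the assumptions from the previous sketch, including a `sqlite3`-style source connection:

```python
import queue
import threading
import redis

BATCH_SIZE = 1000
NUM_WORKERS = 4
# Bounded queue applies backpressure on the reader if workers fall behind.
work_q: queue.Queue = queue.Queue(maxsize=100)

def reader(src_conn) -> None:
    """Single reader: pages through the table and enqueues batches."""
    last_id = 0
    while True:
        rows = src_conn.execute(
            "SELECT * FROM orders WHERE order_id > ? ORDER BY order_id LIMIT ?",
            (last_id, BATCH_SIZE),
        ).fetchall()
        if not rows:
            break
        work_q.put([dict(r) for r in rows])
        last_id = rows[-1]["order_id"]
    for _ in range(NUM_WORKERS):
        work_q.put(None)                        # poison pill: one per worker

def worker() -> None:
    """Processor/writer: dequeues batches and writes hashes to Redis."""
    dst = redis.Redis(decode_responses=True)    # one connection per thread
    while (batch := work_q.get()) is not None:
        pipe = dst.pipeline()
        for row in batch:
            pipe.hset(f"order:{row['order_id']}", mapping=row)
        pipe.execute()

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in workers:
    t.start()
# reader(src_conn) would run here on the main thread, then:
for t in workers:
    t.join()
```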
The multithreaded/multi-process design we previously put together does not handle failures of the source database or the target Redis database.
Question:
What changes can we make to our design so that we don’t lose data in case the source database or Redis is down?
Expected Answers:
Use a checkpointing mechanism that persists the last successfully processed order_id, so the tool can resume from that point after a restart (see the sketch below).
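One way to persist the checkpoint is in Redis itself, written atomically with each batch so the checkpoint can never run ahead of the data. The key name is illustrative:

```python
import redis

CHECKPOINT_KEY = "migration:last_order_id"      # illustrative key name

def load_checkpoint(dst: redis.Redis) -> int:
    """Resume point: last order_id whose batch was fully written."""
    value = dst.get(CHECKPOINT_KEY)
    return int(value) if value else 0

def write_batch(dst: redis.Redis, batch: list) -> None:
    # MULTI/EXEC transaction: the batch and the new checkpoint are
    # applied together, so a crash can't advance one without the other.
    pipe = dst.pipeline(transaction=True)
    for row in batch:
        pipe.hset(f"order:{row['order_id']}", mapping=row)
    pipe.set(CHECKPOINT_KEY, batch[-1]["order_id"])
    pipe.execute()
```

Because `HSET` simply overwrites, replaying a batch after a crash is harmless (at-least-once delivery). If Redis itself is the failure being guarded against, the checkpoint could instead live in a local file or in the source database.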
It turns out that the order_id column is not monotonically increasing; rather, it is a unique ID generated by an Order Management System.
Orders can also be updated or cancelled, and these changes need to be reflected in the target Redis database.
Question:
How can we listen to changes happening on the source database?
What could these change events look like?
Expected Answers:
Introduce a Change Data Capture (CDC) component for the source database.
Design a change-event listener and a queue from which the multiple worker threads dequeue.
A change event carries the row data and the type of operation that caused the change: INSERT, UPDATE, or DELETE (see the sketch below).
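A sketch of what such an event and its handler might look like. The field names and the `Op` enum are illustrative; a real CDC feed (e.g. Debezium tailing the database's transaction log) would define its own event envelope:

```python
from dataclasses import dataclass
from enum import Enum
import redis

class Op(Enum):
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"

@dataclass
class ChangeEvent:
    op: Op
    order_id: str      # opaque unique ID from the Order Management System
    row: dict          # full row image (empty for DELETE)

def apply_event(dst: redis.Redis, event: ChangeEvent) -> None:
    """Worker-side handler: mirror one change event into Redis."""
    key = f"order:{event.order_id}"
    if event.op is Op.DELETE:
        dst.delete(key)
    else:
        # HSET covers both INSERT and UPDATE: it creates the hash if
        # missing and overwrites the supplied fields if it exists.
        dst.hset(key, mapping=event.row)
```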