4-Phase Migrations (Dual Writing)

Case: moving objects from one DB table to another after a schema change has been decided, like Stripe subscriptions becoming a separate table from Customers.

  1. Dual writing to the existing and new tables to keep them in sync
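
A minimal Scala sketch of what phase 1 dual writing could look like (the `CustomersTable`/`SubscriptionsTable` traits and the `saveSubscription` helper are hypothetical, not Stripe's actual code): every write goes through one code path that updates both the old embedded location and the new table.

```scala
// Minimal dual-write sketch. The store traits and field names are hypothetical;
// the point is that a single write path updates both stores.
case class Subscription(id: String, customerId: String, plan: String)

trait CustomersTable {
  // Old model: subscriptions embedded on the customer record.
  def appendSubscription(customerId: String, sub: Subscription): Unit
}

trait SubscriptionsTable {
  // New model: subscriptions live in their own table.
  def upsert(sub: Subscription): Unit
}

class SubscriptionWriter(customers: CustomersTable, subscriptions: SubscriptionsTable) {
  // Phase 1: dual write — keep the old and new tables in sync on every write.
  def saveSubscription(sub: Subscription): Unit = {
    customers.appendSubscription(sub.customerId, sub) // existing path, still the source of truth
    subscriptions.upsert(sub)                         // new path, kept in sync
  }
}
```

Keeping both writes in one place makes it easier to later flip which table is the source of truth.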

To backfill the new table on the live database, Stripe used MapReduce over the snapshots of the database already available in their Hadoop cluster.

Scalding - a Scala library for writing MapReduce jobs. They used it to identify all the subscriptions that needed to be backfilled.
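
A hedged sketch of what such a backfill job could look like with Scalding's classic Fields API; the snapshot layout, field names, and the comma-separated stand-in for real parsing are assumptions for illustration, not Stripe's actual job.

```scala
import com.twitter.scalding._

class BackfillSubscriptionsJob(args: Args) extends Job(args) {
  // Scan a TSV snapshot of the Customers table (field names are hypothetical)
  // and emit one row per subscription embedded on each customer, ready to be
  // upserted into the new Subscriptions table.
  Tsv(args("customers-snapshot"), ('customerId, 'subscriptionIds))
    .flatMap('subscriptionIds -> 'subscriptionId) { ids: String =>
      ids.split(",").map(_.trim).filter(_.nonEmpty) // stand-in for real parsing
    }
    .project('customerId, 'subscriptionId)
    .write(Tsv(args("backfill-output")))
}
```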

  2. Changing all read paths in the codebase to read from the new table.

They used GitHub's Scientist library to verify whether it was safe to read from the new table on the read path:

i. Use Scientist to read from both the Subscriptions table and the Customers table.
ii. If the results don’t match, raise an error alerting their engineers to the inconsistency.
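
Scientist itself is a Ruby library, so the block below is only a Scala sketch of the same experiment pattern (all names are made up): serve the result from the old read path (the control), also run the new read path (the candidate), and report any mismatch instead of failing the request.

```scala
import scala.util.Try

// Scientist-style experiment: run the old (control) and new (candidate) read
// paths, always return the control result, and report any mismatch.
class ExperimentalReader[K, V](
    control: K => V,                        // old read path (via the Customers table)
    candidate: K => V,                      // new read path (via the Subscriptions table)
    reportMismatch: (K, V, Option[V]) => Unit
) {
  def read(key: K): V = {
    val controlResult = control(key)
    val candidateResult = Try(candidate(key)).toOption // candidate errors must not break reads
    if (!candidateResult.contains(controlResult))
      reportMismatch(key, controlResult, candidateResult)
    controlResult // the old path stays the source of truth until everything matches
  }
}
```

Once mismatch reports dry up, the new table can be promoted to the real read path.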

After everything matched up, they started reading from the new table.

  3. Changing all write paths in the codebase to write to the new table.

They couldn’t simply swap the new records in for the old ones: every piece of write logic needed to be considered carefully, because missing any case could lead to data inconsistency. Thankfully, more Scientist experiments can alert them to any potential inconsistencies along the way.

New writes: Subscriptions → Customers (basically the reverse of the flow before).
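
A sketch of that flipped write path in Scala (hypothetical names again): the Subscriptions table is written first and becomes the source of truth, while the embedded copy on the Customer is only kept in sync until it can be removed in the last phase.

```scala
case class Subscription(id: String, customerId: String, plan: String)

trait SubscriptionsStore { def upsert(sub: Subscription): Unit }                        // new source of truth
trait CustomersStore { def syncEmbedded(customerId: String, sub: Subscription): Unit }  // legacy copy

class SubscriptionWriterPhase3(newTable: SubscriptionsStore, legacy: CustomersStore) {
  // Phase 3: write the Subscriptions table first; the Customers copy is only
  // kept in sync until the old data model is deleted in phase 4.
  def save(sub: Subscription): Unit = {
    newTable.upsert(sub)
    legacy.syncEmbedded(sub.customerId, sub)
  }
}
```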

  4. Removing old data that relies on the outdated data model.

They first automatically emptied the old subscriptions array on the Customer record every time a subscription was loaded, and then ran a final Scalding job and migration to find any remaining objects to delete.
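
A sketch of that lazy cleanup under the same hypothetical names: each load through the new path clears the now-redundant embedded array, so the final Scalding sweep only has to deal with records that are never loaded.

```scala
case class Subscription(id: String, customerId: String, plan: String)

trait CustomerRecords {
  def embeddedSubscriptionsNonEmpty(customerId: String): Boolean // legacy array still populated?
  def clearEmbeddedSubscriptions(customerId: String): Unit
}

class SubscriptionLoader(customers: CustomerRecords, readNewTable: String => Option[Subscription]) {
  // Phase 4: every load opportunistically empties the legacy array on the
  // Customer record; a final batch job deletes whatever is left untouched.
  def load(subscriptionId: String): Option[Subscription] = {
    val sub = readNewTable(subscriptionId)
    sub.foreach { s =>
      if (customers.embeddedSubscriptionsNonEmpty(s.customerId))
        customers.clearEmbeddedSubscriptions(s.customerId)
    }
    sub
  }
}
```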

Source