Need for replication

Redundancy means duplication of critical data or services with the intention of increased reliability of the system. This same principle applies to services.

Creating redundancy in a system can remove single points of failure and provide backups if needed in a crisis.

Another important part: to create a shared-nothing architecture, where each node can operate independently of one another

Types/Schemas

  1. Single leader
  2. Multi-leader
  3. Leaderless

a) Sync - slower, bc it blocks all replicas until data is updated, but ensures consistency. b) Async

If you want/need a new replica

  1. Add new node
  2. Find snapshot of the data
  3. Copy on the fresh replica
  4. Give updates

Node outages

Follower failure: Catch-up recovery

On its local disk, each follower keeps a log of the data changes it has received from the leader. If a follower crashes and is restarted, or if the network between the leader and the follower is temporarily interrupted, the follower can recover quite easily: from its log.

Leader failure: Failover

One of the followers needs to be promoted to be the new leader, clients need to be reconfigured to send their writes to the new leader, and the other followers need to start consuming data changes from the new leader

Can happen

  • manually (an administrator is notified that the leader has failed and takes the necessary steps to make a new leader)
  • automatically

Replication lag - consistency models

Read-after-write consistency

Users should always see data that they submitted themselves.

Monotonic reads

After users have seen the data at one point in time, they shouldn’t later see the data from some earlier point in time.

Consistent prefix reads

Users should see the data in a state that makes causal sense: for example, seeing a question and its reply in the correct order.

Source