Need for replication
Redundancy means duplicating critical data or services in order to increase the reliability of the system.
Creating redundancy in a system can remove single points of failure and provide backups if needed in a crisis.
Another important goal is a shared-nothing architecture, where each node can operate independently of the others.
Types/Schemas
- Single leader
- Multi-leader
- Leaderless
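a) Synchronous - the leader waits until the replicas confirm the write before reporting success. Slower, because a single unresponsive replica blocks the write, but the replicas are guaranteed to be up to date.
b) Asynchronous - the leader reports success without waiting for the replicas. Faster, but replicas can lag behind, and recent writes may be lost if the leader fails.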
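A minimal in-memory sketch of the difference between the two modes, assuming a single leader. The Leader and Follower classes, the pending queue and the flush method are hypothetical names used only for illustration; a real system would ship changes over the network.

```python
class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Leader:
    def __init__(self, followers, synchronous):
        self.data = {}
        self.followers = followers
        self.synchronous = synchronous
        self.pending = []  # changes not yet shipped to followers (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Sync: success is reported only after every follower has applied the change.
            for follower in self.followers:
                follower.apply(key, value)
        else:
            # Async: success is reported immediately; the change is shipped later.
            self.pending.append((key, value))
        return "ok"

    def flush(self):
        # Stand-in for the background process that ships changes in async mode.
        for key, value in self.pending:
            for follower in self.followers:
                follower.apply(key, value)
        self.pending.clear()
```

With synchronous=False, a read from a follower between write and flush returns stale data, which is the replication lag discussed further down.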
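Setting up a new replica
- Add a new node
- Take a consistent snapshot of the leader's data
- Copy the snapshot to the new replica
- Apply all changes made since the snapshot was taken, until the replica has caught up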
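The steps above can be sketched as follows, assuming the leader keeps an ordered log of changes and the snapshot is tagged with its position in that log. The function name bootstrap_replica and the list-of-tuples log format are assumptions made for this example.

```python
def bootstrap_replica(leader_log, snapshot_data, snapshot_position):
    """Build a new replica from a snapshot plus the changes made after it."""
    # Copy the snapshot onto the fresh replica.
    replica = dict(snapshot_data)
    # Apply every change the leader logged after the snapshot was taken.
    for key, value in leader_log[snapshot_position:]:
        replica[key] = value
    # Return the replica state and the log position it has caught up to.
    return replica, len(leader_log)


leader_log = [("a", 1), ("b", 2), ("a", 3)]
snapshot = {"a": 1, "b": 2}  # taken after the first two log entries
replica, position = bootstrap_replica(leader_log, snapshot, 2)
```

Tagging the snapshot with an exact log position is what lets the replica know where to start replaying, so no change is skipped or applied from the wrong point.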
Node outages
Follower failure: Catch-up recovery
On its local disk, each follower keeps a log of the data changes it has received from the leader. If a follower crashes and is restarted, or if the network between the leader and the follower is temporarily interrupted, the follower can recover quite easily: from its log it knows the last change it processed before the fault, so it can reconnect to the leader and request all the changes made since then.
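A small sketch of catch-up recovery, assuming the follower's log is a prefix of the leader's log so that its length marks the last change it processed. The class and method names are hypothetical, and the log lives in memory here rather than on disk.

```python
class Follower:
    def __init__(self):
        self.log = []   # changes received from the leader (on local disk in practice)
        self.data = {}

    def catch_up(self, leader_log):
        """Apply every leader change that is missing from the local log."""
        # The local log length identifies the last change processed before the fault.
        missed = leader_log[len(self.log):]
        for key, value in missed:
            self.log.append((key, value))
            self.data[key] = value
        return len(missed)
```

Calling catch_up again after the follower is current applies nothing, so the recovery step is safe to repeat.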
Leader failure: Failover
One of the followers needs to be promoted to be the new leader, clients need to be reconfigured to send their writes to the new leader, and the other followers need to start consuming data changes from the new leader.
Failover can happen
- manually (an administrator is notified that the leader has failed and takes the necessary steps to make a new leader)
- automatically (the system detects the failure, typically through a timeout, and promotes a follower itself)
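A rough sketch of two pieces of automatic failover: detecting a dead leader with a heartbeat timeout, and picking the follower with the most up-to-date log as the new leader. The function names, the 30-second timeout and the dictionary of log positions are all assumptions for illustration; real systems usually elect the new leader through a consensus protocol.

```python
def leader_is_dead(last_heartbeat, now, timeout=30.0):
    """Treat the leader as failed if no heartbeat arrived within the timeout (seconds)."""
    return now - last_heartbeat > timeout


def choose_new_leader(follower_positions):
    """Pick the follower whose replication log position is highest."""
    # follower_positions maps a follower name to the last log position it applied.
    return max(follower_positions, key=follower_positions.get)
```

Choosing the most up-to-date follower minimizes the number of writes lost in the failover when replication is asynchronous.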
Replication lag - consistency models
Read-after-write consistency
Users should always see data that they submitted themselves.
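One possible way to provide this guarantee, sketched under the assumption that the application tracks when each user last wrote: for a short window after a write, serve that user's reads from the leader. The function name, the last_write_time mapping and the 60-second window are hypothetical.

```python
def choose_node_for_read(user_id, last_write_time, now, lag_window=60.0):
    """Route a read to the leader if the user wrote recently, else to a follower."""
    wrote_at = last_write_time.get(user_id)
    if wrote_at is not None and now - wrote_at < lag_window:
        # The user's own write may not have reached the followers yet.
        return "leader"
    return "follower"
```

The window should be at least as long as the expected replication lag, otherwise a follower might still serve a read that misses the user's own write.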
Monotonic reads
After users have seen the data at one point in time, they shouldn’t later see the data from some earlier point in time.
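One common way to get monotonic reads is to make each user always read from the same replica, so they never jump to a replica that is further behind. A minimal sketch, assuming replicas are identified by name and the user is pinned by hashing the user ID; the function name is hypothetical.

```python
import hashlib


def replica_for_user(user_id, replicas):
    """Map a user to a fixed replica so successive reads hit the same node."""
    # A stable hash is used because Python's built-in hash() of strings varies between runs.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]
```

If the chosen replica fails, the user's reads have to be rerouted to another replica, and the guarantee needs extra care at that point.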
Consistent prefix reads
Users should see the data in a state that makes causal sense: for example, seeing a question and its reply in the correct order.