Intro
Partitioning: each piece of data belongs to exactly one partition, which is what makes the system scalable.
Each partition can be viewed as a small database in its own right, though the system may also support operations that touch multiple partitions. The dataset is spread across many disks and the query load across many processors.
Past a certain scale, it is cheaper and more feasible to scale horizontally by adding more machines than to grow vertically with beefier servers.
A query that operates on a single partition executes independently on that partition, so throughput scales by adding more nodes, and large, complex queries can be parallelized across partitions.
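As a concrete illustration of that fan-out, here is a minimal scatter/gather sketch; the in-memory partitions and the helper names (query_partition, scatter_gather) are assumptions for illustration, not a real database API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "partitions": each holds a slice of the dataset.
partitions = [
    {"alice": 3, "bob": 7},
    {"carol": 1, "dave": 9},
]

def query_partition(partition, predicate):
    """Run the query independently on a single partition."""
    return [v for v in partition.values() if predicate(v)]

def scatter_gather(predicate):
    """Fan the query out to all partitions in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda p: query_partition(p, predicate), partitions)
    return [v for chunk in results for v in chunk]

print(scatter_gather(lambda v: v > 2))  # -> [3, 7, 9]
```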
Types
- Replication combined with partitioning (e.g., RAID)
- Partitioning by key range (see the sketch after this list)
  - Locality for range queries
  - Data must be kept sorted by key
  - Prone to hotspots, e.g., timestamp keys send all writes to the latest range
- Partitioning by hash of key
  - Can alleviate hotspots
  - Range queries are hard
- Secondary indexes
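A hedged sketch contrasting the two schemes above, assuming three nodes and made-up range boundaries; MD5 is used only as a deterministic stand-in for whatever hash function a real system picks.

```python
import bisect
import hashlib

NODES = ["node0", "node1", "node2"]

# Key-range partitioning: each node owns a contiguous, sorted key range,
# so a range scan touches only a few nodes (good locality), but a popular
# range becomes a hotspot.
BOUNDARIES = ["c", "m"]  # node0: keys < "c", node1: keys < "m", node2: the rest

def range_partition(key: str) -> str:
    return NODES[bisect.bisect(BOUNDARIES, key)]

# Hash partitioning: a deterministic hash (not Python's salted hash())
# spreads keys evenly, smoothing hotspots, but adjacent keys scatter,
# so a range query must be sent to every node.
def hash_partition(key: str) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

for key in ["apple", "cherry", "melon"]:
    print(key, "->", range_partition(key), "|", hash_partition(key))
```

Note how adjacent keys stay together under range partitioning but scatter under hashing, which is exactly why range queries are hard in the hashed scheme.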
Rebalancing
- The data distribution is not uniform, e.g., a particular ZIP code has so many places that they cannot fit into one database partition.
- There is too much load on a shard, e.g., too many requests hit the DB shard dedicated to user photos.
Either we have to create more DB shards or rebalance the existing ones. This means the partitioning scheme changes and all existing data moves to new locations. Doing this without incurring downtime is extremely difficult. A scheme like directory-based partitioning makes rebalancing a more palatable experience, at the cost of increasing the complexity of the system and creating a new single point of failure (i.e., the lookup service/database).
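A minimal sketch of that directory-based approach, assuming an in-process dict stands in for the lookup service; the names (directory, shards, move) are hypothetical.

```python
# Hypothetical directory (lookup) service: maps each key to its shard.
# In practice this is its own highly available store, and it is the new
# single point of failure mentioned above.
directory = {"user:1": "shard_a", "user:2": "shard_a", "user:3": "shard_b"}
shards = {"shard_a": {}, "shard_b": {}, "shard_c": {}}

def put(key, value):
    shards[directory[key]][key] = value

def move(key, new_shard):
    """Rebalance one key: copy the data, then flip the directory entry."""
    old_shard = directory[key]
    if key in shards[old_shard]:
        shards[new_shard][key] = shards[old_shard].pop(key)
    directory[key] = new_shard  # readers now go to the new location

put("user:1", {"photos": 42})
move("user:1", "shard_c")  # only this key's data moves; the rest stays put
print(directory["user:1"], shards["shard_c"])
```

Rebalancing then becomes a data copy plus a directory update rather than a scheme change, but every read and write now depends on the directory being reachable.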
Triggers
- Query volume grows
- Data volume grows
- A machine fails
Types of rebalancing
- Static load balancing, which uses fixed rules for distribution
- Dynamic load balancing, which adjusts based on real-time server conditions and traffic (both contrasted in the sketch below)
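A small sketch contrasting the two, with simulated load counters; static_route and dynamic_route are illustrative names, not a real load balancer's API.

```python
import hashlib
import random

NODES = ["node0", "node1", "node2"]
inflight = {n: 0 for n in NODES}  # simulated per-node load

def static_route(key: str) -> str:
    """Static: a fixed rule; the same key always lands on the same node,
    regardless of how busy that node currently is."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def dynamic_route() -> str:
    """Dynamic: inspect real-time load and pick the least-loaded node."""
    return min(inflight, key=inflight.get)

for _ in range(9):
    node = dynamic_route()
    inflight[node] += random.randint(1, 3)  # pretend each request costs this much
print(inflight)  # counts stay roughly even across nodes
```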