A Guide to Consistent Hashing
Consistent Hashing
# Basics
Typically used for key sharding for databases. A typical db using consistent hashing is Redis.
# Hashing
A hash function is a function that maps one piece of data—typically describing some kind of object, often of arbitrary size—to another piece of data, typically an integer, known as hash code, or simply hash.
# Hash Table
# Distributed Hashing
Rehashing problem, reminds of the rehashing process of redis(when the load factor of hash table is greater or less than threshold)
# Consistent Hashing
We need a distribution scheme that does not depend directly on the number of servers, so that, when adding or removing servers, the number of keys that need to be relocated is minimized. One such scheme—a clever, yet surprisingly simple one—is called consistent hashing, and was first described by Karger et al. at MIT in an academic paper from 1997 (according to Wikipedia).
Imagine we mapped the hash output range onto the edge of a circle. That means that the minimum possible hash value, zero, would correspond to an angle of zero, the maximum possible value (some big integer we’ll call INT_MAX) would correspond to an angle of 2𝝅 radians (or 360 degrees), and all other hash values would linearly fit somewhere in between. So, we could take a key, compute its hash, and find out where it lies on the circle’s edge.
For New Keys:
Each object key will belong in the server whose key is closest, in a counterclockwise direction (or clockwise, depending on the conventions used). In other words, to find out which server to ask for a given key, we need to locate the key on the circle and move in the ascending angle direction until we find a server.
# Slot Range Hashing
multiple slots to server
# Virtual Node
Add more virtual node along the ring, and each server has a relatively equal possibility.
# Requirements
# Practical
# Redis Cluster