# Requirements
# Functional Features
🔹 Globally unique - If IDs are not globally unique, there could be collisions.
🔹 Roughly sorted by time - So user IDs, post IDs can be sorted by time without fetching additional info.
🔹 Numerical values only - Naturally sortable by time.
🔹 64 bits - 2^32 = ~4 billion -> not enough IDs | 2^64 is big enough | 2^128 waste space
🔹 Highly scalable, low latency | Ability to generate a lot of IDs per second in low latency fashion is critical.
- Capable of generating ten million sequential UIDs per second
- UIDs must preserve some level of ordering information
- capable of adjusting generation rate based on consumption
# Non-Functional Features
- Availability
- Scalability
- Low Latency
# Functional Features
- Generate a globally unique id
- Chronological orderable
# Non-Functional Features
# Solutions
# UUID
pros: Independent
cons: Too Long, Not Int
字符串作为主键id,插入数据时会是在聚集索引中是随机插入,容易造成页分离。而且字符串的比较比数字类型的开销更大,字符串作为主键id查询效率会低于数字类型的主键。
The string is used as the primary key id. When inserting data, it will be inserted randomly in the clustered index, which is easy to cause page separation. Moreover, the comparison of strings is more expensive than that of numeric types, and the query efficiency of strings as the primary key id will be lower than that of the primary key of numeric types.
通常可以作为一些临时性唯一标识,例如用户登陆后,生成一个UUID作为登录的会话ID,作为key存储在Redis中,Value是用户相关的信息。
# Primary Key Auto Increment
pros: Easy, Monotonically Increasing
cons:
- Rely on autoincrement mechanism, cannot support large amount of id concurrency.
- Rely on database. Single Point Failover. Master-Slave Switching may introduce duplicated key, if master database haven’t received the latest update of id from master.
- Id is continuously, which is not recommended for business usage.
适用于并发量不高的业务。
# Multi Database Primary Key Auto Increment
Set the step size for different databases.
1 | |
pros: more efficient than single database
cons: hard to scale
# 数据库分段发号生成唯一id。(例如美团的Leaf框架中的segement模式)
使用数据库当发号器
biz_tag字段用于区分每种id应用的业务, max_id字段记录了当前已生成的最大的id, step字段代表每次可以获取id的数量
id生成项目每次使用下面这条语句从数据库获取step数量的id,并且更新max_id的值,将step数量的id存储在内存中,供业务方通过HTTP,RPC,Client等方式来获取。
1 | |
pros: depends on step, rather than the number of databases.
cons:
- Rely on database
- id is still continuous.
# Snowflake
snowflake是推特开源分布式ID生成算法,一共有64位,
第一位是0,是标志位,因为在二进制数中,第一位是0,代表是正数。
接下来41位是13位的毫秒时间戳,最大可以到2039年9月。
再接下来10个二进制位是workID,也就是服务器的id。
最后面12位是业务序列号
意味着每毫秒最大可以生成2的12次方个id,4096个,支持每个机器每毫秒生成4096个id,每秒可以生成409.6万的id
pros:
- efficient, 2^12=4096 ids in 1ms.
- relatively independent, only rely on local system time
- flexible, allocate bits according to business
cons:
- data skew.
- relay on clock. May generate duplicated ids if clock rewind clock back.
mongodb objectID snowflake-like id
通过“时间+机器码+pid+inc”共12个字节,通过4+3+2+3的方式最终标识成一个24长度的十六进制字符。
# Domain Model
# Single Worker
# Cluster
Generate Unique Ids per thread
Registration Service for per snowflake worker
# Practical
https://github.com/bwmarrin/snowflake
# Snowflake
Snowflake
Ace the System Design Interview — Distributed ID Generator