Introduction to ID Generation Methods
1. Introduction
IDs are a fundamental concept in computer systems, used to identify data entities and interactions. An ID generation scheme is the algorithm or method used to generate unique identifiers (IDs). These identifiers typically identify records in a database, messages in a message queue, nodes in a distributed system, and so on.
A good ID generation scheme produces unique IDs efficiently, without becoming a performance bottleneck when handling large amounts of data.
An ideal unique ID has the following characteristics:
- Uniqueness: Generated IDs are globally unique, or the probability of a collision within the expected volume of IDs is negligible.
- Sequentiality: Generated IDs follow a predictable order (usually increasing over time), which makes them efficient to insert into indexes and easy to sort.
- Availability: ID generation remains available under high concurrency.
- Autonomy: IDs can be generated independently in a distributed environment, without relying on central coordination.
- Security: It does not expose system and business information.
2. Common ID Generation Schemes
2.1. Database Auto-Increment ID
Database auto-increment ID is a common scheme for generating unique identifiers. It relies on the mechanism of the database being used and does not need further elaboration.
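For illustration, the following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `orders` table and its columns are hypothetical. It shows the key property of this scheme: the database, not the application, assigns the next value, so every new ID requires a round trip to that database.

```python
import sqlite3

# In-memory SQLite database; MySQL AUTO_INCREMENT and PostgreSQL IDENTITY
# columns follow the same idea.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT NOT NULL)"
)

ids = []
for item in ["book", "pen", "lamp"]:
    cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    # lastrowid is the ID the database assigned to the row just inserted
    ids.append(cur.lastrowid)

print(ids)  # [1, 2, 3]
```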
2.2. UUID
UUID, most commonly used in its version 4 (random) form, has sparked much discussion about whether it is better than a database auto-increment ID.
Arguments exist on both sides, but personally, I believe that despite its drawbacks, UUID is a better choice than a database auto-increment ID.
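As a quick illustration of what a version 4 UUID contains, the sketch below uses Python's standard `uuid` module. Apart from the fixed version nibble and the two variant bits, the value is random (122 bits), which is why consecutive v4 UUIDs have no ordering relationship to each other.

```python
import uuid

u = uuid.uuid4()
s = str(u)  # canonical 8-4-4-4-12 hex form

print(len(s))                      # 36: 32 hex digits + 4 hyphens
print(u.version)                   # 4
print(u.variant == uuid.RFC_4122)  # True
print(s[14])                       # '4': the version nibble starts the third group
print(s[19] in "89ab")             # True: variant bits 10xx start the fourth group
```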
2.3. Twitter Snowflake Algorithm
Snowflake is a distributed ID generation algorithm developed by Twitter. Its IDs are unique and roughly time-ordered. Each ID is a 64-bit integer: the first (sign) bit is unused and always 0, followed by 41 bits of millisecond timestamp relative to a custom epoch, 10 bits of node (machine) ID, and 12 bits of sequence number.
It has the following characteristics:
- Global uniqueness: As long as every node has a distinct node ID and its clock does not move backwards, IDs generated on different machines cannot collide.
- Timestamp ordering: Snowflake IDs embed a timestamp, so their numeric order roughly reflects the order in which they were generated.
- Readability: Snowflake IDs are plain 64-bit integers (at most 19 decimal digits), which are easier for humans to read and handle than 128-bit identifiers such as UUIDs.
- Distributed generation: Once each node has been assigned a unique node ID, it generates IDs independently, avoiding a single point of failure.
- High performance: Generation is very fast because it happens purely in memory, with no network round trip. Snowflake IDs are also only 64 bits long, shorter and more compact than many other unique identifiers.
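To make the bit layout concrete, here is a minimal single-process sketch in Python. It is not Twitter's implementation: the class and function names are hypothetical, node IDs are assumed to be assigned externally, and a clock that moves backwards simply raises an error. `EPOCH_MS` is the epoch value commonly quoted from Twitter's original code; any fixed instant in the past works.

```python
import threading
import time

EPOCH_MS = 1288834974657  # custom epoch (value commonly quoted from Twitter's code)
NODE_BITS = 10
SEQ_BITS = 12
MAX_NODE = (1 << NODE_BITS) - 1  # 1023
MAX_SEQ = (1 << SEQ_BITS) - 1    # 4095


class Snowflake:
    """Sign bit (0) | 41-bit ms timestamp | 10-bit node ID | 12-bit sequence."""

    def __init__(self, node_id: int) -> None:
        if not 0 <= node_id <= MAX_NODE:
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()  # one generator may be shared by many threads

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                # Reusing an old timestamp could produce duplicate IDs
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & MAX_SEQ
                if self.seq == 0:
                    # All 4096 sequence numbers used this millisecond: spin until the next one
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self.seq


def decode(sid: int) -> dict:
    """Split an ID back into its three fields."""
    return {
        "unix_ms": (sid >> (NODE_BITS + SEQ_BITS)) + EPOCH_MS,
        "node_id": (sid >> SEQ_BITS) & MAX_NODE,
        "seq": sid & MAX_SEQ,
    }


gen = Snowflake(node_id=7)
print(decode(gen.next_id())["node_id"])  # 7
```

Because the timestamp occupies the high bits, IDs from a single generator increase strictly, and IDs from different nodes are ordered only as well as their clocks agree.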
3. Newer Solutions
Newer solutions generally have the following characteristics:
- Generated locally, without relying on coordination between nodes
- Represented as strings
- Keeping the ID length as short as possible while ensuring uniqueness
- Time-ordered
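3.1. UUID v6, v7, and v8
The IETF has been revising the UUID specification (draft-ietf-uuidrev-rfc4122bis, since published as RFC 9562) to add new versions that reflect how UUIDs are actually used in practice.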
- v6 contains the same fields as v1, reordered so that the most significant timestamp bits come first, which makes the IDs sortable and friendlier to database index locality.
- v7 combines a 48-bit Unix timestamp in milliseconds with random bits.
- v8 is reserved for custom formats: only the version and variant fields are fixed, and all other bits are defined by the implementation.
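The v7 layout is simple enough to sketch by hand. The Python function below is an illustration under our own assumptions (the name `uuid7` is hypothetical, and all random bits come from `os.urandom`): it places a 48-bit millisecond Unix timestamp in the top bits, sets the version to 7 and the variant to the RFC value, and fills the remaining 74 bits randomly.

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Sketch of UUID version 7:
    48-bit unix_ts_ms | 4-bit ver | 12-bit rand_a | 2-bit var | 62-bit rand_b."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 of them used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the most significant bits
    value |= 0x7 << 76                            # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


u = uuid7()
print(u.version)  # 7
```

The specification also permits a counter in the random bits to guarantee ordering within the same millisecond; this sketch omits it, so two UUIDs created in the same millisecond may sort in either order.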
Two trends can be observed:
- Performance: Both v6 and v7 are sortable by creation time, addressing the most common database performance issue with UUIDs: random values scatter inserts across the index.
- Customizability: Application requirements vary greatly, and a single standard cannot solve every problem, which is why so many ID generation schemes have emerged. v8 acknowledges this by reserving a version for custom, implementation-defined layouts.
3.2. ULID
ULID (Universally Unique Lexicographically Sortable Identifier) is a timestamp-based unique identifier with the following characteristics:
- Global uniqueness: It is highly unlikely to generate duplicate ULIDs on different machines.
- Time ordering: ULIDs begin with a timestamp, so sorting ULID strings lexicographically orders them by generation time (to millisecond precision).
- Readability: ULIDs use Crockford's Base32 alphabet, which is case-insensitive and omits the easily confused letters I, L, O, and U, making them relatively easy for humans to read.
- Moderate length: ULIDs are 26 characters long, more compact than the 36-character canonical UUID string while carrying the same 128 bits.
The high 48 bits hold a Unix timestamp in milliseconds and the low 80 bits are random; the 128-bit value is encoded as a 26-character Crockford Base32 string.
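The ULID layout can be sketched in a few lines of Python. This is an illustrative implementation, not the reference library: `ulid` and `ulid_time_ms` are hypothetical names, and the spec's optional monotonic mode for IDs generated within the same millisecond is not implemented.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 symbols, no I, L, O, U


def ulid() -> str:
    """48-bit Unix ms timestamp + 80 random bits, as 26 Crockford Base32 chars."""
    ts = time.time_ns() // 1_000_000
    value = (ts << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits, enough for 128
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_time_ms(s: str) -> int:
    """Recover the timestamp: the first 10 characters encode 2 zero bits + 48 time bits."""
    value = 0
    for ch in s[:10]:
        value = value * 32 + CROCKFORD.index(ch)
    return value
```

Since the Crockford alphabet is in ascending ASCII order and every ULID has the same length, comparing two ULID strings compares the underlying 128-bit integers.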
3.3. Nano ID
Nano ID is a tiny, fast, URL-friendly random ID generator. Unlike Snowflake, it contains no timestamp: by default it produces 21 characters drawn from a 64-symbol URL-safe alphabet (A-Za-z0-9_-), giving 126 random bits, comparable to the 122 random bits of a UUID v4. Its short, URL-safe output makes it suitable for high-concurrency scenarios such as URL shortening services. However, because it is purely random, Nano ID is not suitable for scenarios that require sorting or time-related operations.
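Because Nano ID is purely random, a sketch needs only a cryptographically secure random source and an alphabet. The Python function below is illustrative (the name `nanoid` is hypothetical); it assumes a 64-symbol URL-safe alphabet containing the same symbols as the official library's default, though in a different order.

```python
import secrets
import string

# 64 URL-safe symbols: A-Z, a-z, 0-9, '_' and '-'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_-"


def nanoid(size: int = 21) -> str:
    """Purely random ID: each CSPRNG byte is masked to 6 bits (0-63)."""
    # With exactly 64 = 2**6 symbols, masking with 63 maps bytes to symbols without bias.
    return "".join(ALPHABET[b & 63] for b in secrets.token_bytes(size))
```

Each character carries 6 bits, so the default 21 characters hold 21 * 6 = 126 random bits; shorter sizes trade collision resistance for length.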
3.4. KSUID
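KSUID (K-Sortable Unique Identifier) originates from Segment, a customer data platform (CDP) vendor.
The high 32 bits hold a timestamp in seconds and the low 128 bits are random, 160 bits (20 bytes) in total, encoded as a 27-character string using base62.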
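A minimal Python sketch of this layout follows, assuming the details documented in the segmentio/ksuid repository as we understand them: a 32-bit timestamp in seconds since the custom epoch 1400000000 (May 2014), a 128-bit random payload, and the base62 alphabet `0-9A-Za-z`. The function name `ksuid` is hypothetical.

```python
import os
import time

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
KSUID_EPOCH = 1_400_000_000  # custom epoch in Unix seconds (assumed from segmentio/ksuid)


def ksuid() -> str:
    """32-bit seconds timestamp + 128-bit random payload = 160 bits, base62, 27 chars."""
    ts = int(time.time()) - KSUID_EPOCH
    value = (ts << 128) | int.from_bytes(os.urandom(16), "big")
    chars = []
    while value:
        value, rem = divmod(value, 62)
        chars.append(BASE62[rem])
    # Left-pad with '0' so every KSUID has exactly 27 characters (62**27 > 2**160)
    return "".join(reversed(chars)).rjust(27, "0")
```

Storing time in seconds rather than milliseconds frees bits for the larger random payload, at the cost of coarser ordering: KSUIDs created within the same second are not ordered relative to each other.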
3.5. TSID
TSIDs are time-ordered 64-bit integers, typically encoded as a 13-character string using Crockford's Base32.
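The sketch below assumes the layout described by the tsid-creator project: a 42-bit millisecond timestamp since 2020-01-01T00:00:00Z followed by a 22-bit component. In that library the 22 bits are split between a node ID and a counter; here they are simply random, so treat this as an illustration only. `tsid` and `tsid_to_str` are hypothetical names.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
TSID_EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z (assumed tsid-creator default)


def tsid() -> int:
    """42-bit ms timestamp (since TSID_EPOCH_MS) + 22-bit random component."""
    ms = time.time_ns() // 1_000_000 - TSID_EPOCH_MS
    rand = int.from_bytes(os.urandom(3), "big") & ((1 << 22) - 1)
    return (ms << 22) | rand


def tsid_to_str(value: int) -> str:
    """Encode a 64-bit TSID as 13 Crockford Base32 characters (13 * 5 = 65 bits)."""
    chars = []
    for _ in range(13):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

Because a TSID is an ordinary 64-bit integer, it can be stored in a BIGINT column and converted to the 13-character string only for display or URLs.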
3.6. Cuid2
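Cuid2 hashes a combination of inputs (timestamp, counter, host fingerprint, and random entropy) with SHA-3 and encodes the result as a Base36 string of configurable length (24 characters by default). It emphasizes security and collision resistance at the cost of slower generation, and its IDs are not time-ordered.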
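To illustrate the hash-then-encode idea, here is a heavily simplified Python sketch. It is not the Cuid2 algorithm: the exact inputs, the use of `hashlib.sha3_512`, and the helper names (`cuid2_like`, `_fingerprint`, `to_base36`) are our own assumptions, loosely modeled on Cuid2's published design of hashing several entropy sources and Base36-encoding the result.

```python
import hashlib
import os
import secrets
import string
import time
from itertools import count

BASE36 = string.digits + string.ascii_lowercase
_counter = count(secrets.randbelow(2**32))  # starts at a random value
_fingerprint = hashlib.sha3_512(os.urandom(32)).hexdigest()  # hypothetical stand-in for a host fingerprint


def to_base36(n: int) -> str:
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(BASE36[r])
    return "".join(reversed(out)) or "0"


def cuid2_like(length: int = 24) -> str:
    """Hash several entropy sources with SHA3-512 and Base36-encode the digest.

    The first character is a random letter, so the ID never starts with a digit.
    `length` should stay well below ~98, the size of a Base36-encoded 512-bit digest.
    """
    material = f"{time.time_ns()}:{next(_counter)}:{_fingerprint}:{secrets.token_hex(16)}"
    digest = hashlib.sha3_512(material.encode()).digest()
    body = to_base36(int.from_bytes(digest, "big"))
    # Skip the leading Base36 digit, whose distribution is biased
    return secrets.choice(string.ascii_lowercase) + body[1:length]
```

Hashing hides every input, including the timestamp, which is why such IDs leak nothing about creation time but also cannot be sorted by it.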
4. How to Choose the Right ID Generation Scheme
When it comes to choosing an appropriate ID generation scheme, there are several factors to consider. Here are some guidelines to help you make the right decision:
4.1. Understand Your Requirements
Before selecting an ID generation scheme, it is crucial to understand your specific requirements. Consider factors such as uniqueness, scalability, performance, and security.
4.2. Evaluate Different Approaches
There are various ID generation approaches available, including database auto-increment IDs, UUIDs, Snowflake-style IDs, and newer string-based schemes such as ULID and KSUID. Research and evaluate each approach based on your requirements.
4.3. Consider Uniqueness
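Ensuring the uniqueness of generated IDs is essential to avoid conflicts. A single database sequence guarantees uniqueness but ties every insert to that database, which works well for smaller systems; for larger or distributed applications, consider schemes that each node can generate independently, such as UUIDs, Snowflake, or ULID.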
4.4. Evaluate Scalability
Consider the scalability of the ID generation scheme. Will it be able to handle a growing number of records without performance degradation? Ensure that the chosen scheme can accommodate future growth.
4.5. Assess Performance
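Performance is another critical aspect to consider. Time-ordered IDs (auto-increment, Snowflake, ULID, UUID v7) tend to insert efficiently into B-tree indexes, while fully random IDs such as UUID v4 scatter writes across the index. Schemes that require a database round trip are also slower to generate than those computed in memory. Evaluate the performance implications of each approach and choose accordingly.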
4.6. Think About Security
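If security is a concern, choose an ID generation scheme that does not expose sensitive information. Sequential IDs reveal record counts and creation order and are easy to guess; random IDs such as UUID v4 or Nano ID do not. Note that time-ordered schemes (Snowflake, ULID, UUID v7) embed their creation timestamp.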
4.7. Test and Benchmark
Finally, before implementing any ID generation scheme, conduct thorough testing and benchmarking. This will help you identify any potential issues and ensure that the chosen scheme meets your performance expectations.
By following these guidelines, you can select an appropriate ID generation scheme that suits your specific needs.
Solution | Bit Length | String Length (chars) | Time-Ordered |
---|---|---|---|
Auto-increment ID | 64 | 1-20 | Yes |
UUID | 128 | 36 | Depends on version (v6, v7: yes) |
Snowflake | 64 | 1-19 | Yes |
ULID | 128 | 26 | Yes |
Nano ID | 126 (random) | 21 | No |
KSUID | 160 | 27 | Yes |
TSID | 64 | 13 | Yes |
Cuid2 | Variable | Configurable (24 by default) | No |
5. References
https://www.jitao.tech/posts/database-ids/
https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/
https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake
https://github.com/ulid/spec
https://github.com/ai/nanoid
https://github.com/segmentio/ksuid
https://github.com/f4b6a3/tsid-creator
https://github.com/paralleldrive/cuid2