You don't need a UUID

Hi, my name is Andrey Bogomolov, I am an Android developer in the Wildberries app Performance team.

One day, while working on the code, I noticed the use of UUIDs in the UI and wondered about its impact on performance. Tests showed that the standard Java implementation of UUIDs can become a bottleneck on high DAU screens and in frequently used components.

In this article, we will look at different approaches to generating unique identifiers, compare their performance, and write our own optimized solution for mobile applications.

Content

Analysis of popular ID generation methods

UUIDs occupy a special place among ID creation methods due to their versatility, scalability, and wide application in industry. These characteristics make UUIDs an ideal starting point for analysis.

UUIDs come in several versions (1 to 8), each with its own characteristics, making them suitable for different scenarios. A brief structure is shown in the table below.

Structure of different versions of UUID

Structure of different versions of UUID

UUID version 1 is created based on time and MAC address, but may reveal information about the device. UUID version 2 similar to the first version, but adds domain information, which is useful in enterprise systems but limited in other areas. UUID version 6 Improves version 1 by providing more logical time sorting, but may still expose system data. UUID version 7 uses timestamps and random data, offering a balance between uniqueness, privacy and ordering, without depending on the MAC address.

UUID versions 3 and 5 use hashing to create identifiers. Version 3 uses MD5, which, while it removes hardware dependency, is vulnerable to collisions, making it less secure. Version 5, with SHA-1-based hashing, is more resistant to collisions and is suitable for security-sensitive tasks.

UUID version 4 It is generated randomly, which makes it popular due to its simplicity and high level of uniqueness without being tied to system data. UUID version 8 Allows you to store arbitrary data, providing flexibility in use.

UUID version 4 has become the most widely used type in Android development due to its standard implementation in Java (UUID.randomUUID()). Versions 4 and 7 seem to be the most preferable due to their simplicity, security, and lack of risk of disclosing system information. UUID has many alternatives created by large companies, but it remains the optimal choice in mobile development, as it is a standard and balanced for common tasks.

Performance comparison

To analyze the performance of ID generation methods, the library was used com.github.f4b6a3 in Java. The standard implementation of UUID version 4 in Java was compared (UUID.randomUUID()), as well as UUID versions 6 and 7. The goal was to evaluate the impact of using a secure random number generator on the generation speed. In addition, the test included GUIDs of similar versions with an unsecured random number generator and an incremental generation method using AtomicLong.

The main difference between protected (SecureRandom) and unprotected (Random) random number generators are based on their level of predictability and security. A secure generator uses cryptographically strong algorithms, ensuring unpredictability of numbers, while an unsecure generator is predictable and less secure. The use of a secure generator in UUID versions 4 and 7 complies with the specification RFC 9562which recommends a cryptographically strong generator (CSPRNG) for maximum security and uniqueness.

Particular attention was paid to the time it takes to convert IDs to strings, as this is the standard way to work with IDs in mobile development. While ID objects are more efficient in terms of performance and memory, string representations are easier to use. Typically, a UUID is immediately converted to a string upon creation to make it easier to work with.

The testing was conducted on a OnePlus 10 Pro device with Android 14. Microbenchmark, Kotlin version 2.0.20-RC was used for measurements. The test included 10 runs, each including many iterations (their number is determined by Microbenchmark itself). The code, test launch script, and measurements are available at the link at the end of the article.

ID generation time comparison

ID generation time comparison

The results showed that UUID versions 4 and 7 take longer to generate due to the use of a secure generator. UUID version 6 is faster because it does not rely on random number generation. GUIDs with an unsecure generator also showed higher speed.

However, even GUIDs were found to be 20 or more times slower when generating string IDs compared to incremental generation via AtomicLong. This may be acceptable in most scenarios, such as sending data to analytics or standard application functionality. However, in the context of screens with many active users, where every millisecond counts, slowdowns can degrade the user experience and impact the business. Likewise, even small speedups of hundreds of nanoseconds in common components can significantly improve application performance.

Developing your own ID generation method

Objectives and requirements

During the research process, it became necessary to create our own method for generating identifiers. The first step was to define the key requirements.

The first requirement is uniqueness at the device level, which is sufficient in most cases. The identifier will be used for local entities, such as pagination or UI components that are not sent to the server. Global uniqueness would complicate the task and would not provide a significant performance gain compared to UUID.

The second requirement is to maintain uniqueness when the application is restarted. The application can be restarted for various reasons: lack of memory, forced termination by the user or the system. In this case, the identifier can be saved in a local database or in SavedState for UI models and should not conflict with new IDs that will be generated after a restart.

The third requirement is to support generating IDs directly in string format, without converting from an object. In mobile development, IDs are often passed and stored as strings, so generating string IDs directly will speed up the process.

The fourth requirement is staticity. This will avoid conflicts between different instances of generators and simplify their use.

Approaches to Ensuring ID Uniqueness

In the process of studying various methods for obtaining unique identifiers, several approaches were considered.

Incremental counter guarantees uniqueness by incrementing the ID by one on each request. However, to maintain uniqueness after restarting the application, the counter state must be saved and restored, which may be unreliable due to possible failures.

Usage MAC addresses device adds uniqueness tied to a specific device. However, if the ID is used only on the device itself, this approach loses its meaning. Moreover, the MAC address can reveal confidential information and lead to collisions due to substitution or manufacturer errors.

Random Number Generation can also ensure uniqueness. Protected generation (SecureRandom) uses cryptographically strong algorithms such as CSPRNG, which guarantee security and minimize the risk of collisions due to high entropy. However, this takes more time. Unprotected generation (Random) works faster, but is less reliable, since it uses algorithms with less entropy, which increases the risk of predictability and collisions.

Usage device time as a source of unique values ​​yields monotonically increasing values, but this method is subject to the risk of interference and possible repetitions. Limited time precision and the possibility of multiple IDs being created at the same time also reduce the reliability of this approach.

Designing your own ID generation method

To generate unique identifiers, an approach was chosen that combines several methods to achieve the required uniqueness. This principle is used in UUID and has proven its effectiveness. A simple option for generating identifiers only through Random was discarded because it does not provide sufficient collision protection. More robust methods (e.g., SecureRandom), minimize this risk, but are slower. Using only random numbers essentially replicates the idea of ​​UUIDv4.

The generation is based on the device time, which allows to avoid ID duplication when restarting the process – each new timestamp almost always provides a monotonous growth of values. To reduce the risks associated with possible changes in the system time, the timestamp is supplemented with a random number similar to the “clock sequence” in UUID. This random number is generated once at startup, so it can be created in any way without affecting the speed of formation of each ID. An incremental counter was also added to prevent collisions during simultaneous ID generation.

To optimize the process, a timestamp and a random number are generated once when the application is launched. This data is used as the ID prefix. The timestamp is reduced to 40 bits (about 30 years), and the “clock sequence” takes 24 bits. This way, the prefix fits into one Long (64 bits) Each time a new ID is requested, only the counter is incremented.

Structure of the new ID

Structure of the new ID

As a result, the new ID, called LocalID, consists of:

Within a single application launch, collisions are excluded thanks to the counter. After a restart, the probability of coincidences is minimal, since the timestamp at launch will almost always be unique. Even if the timestamps match, 24 bits of the random number (about 16 million variants) will exclude duplication.

LocalID example:

LocalID generation code (method toCompactString() will be further in the article):

private val counter = AtomicLong(INITIAL_COUNTER_VALUE)
private val mostSigBits = generateMSB()
private val prefix = mostSigBits.toCompactString() + "-"


private fun generateMSB(): Long {
    val random24Bits = SecureRandom().nextLong() and 0xFFFFFFL
    val currentTimeMillis = System.currentTimeMillis() and 0xFFFFFFFFFFL
    return (currentTimeMillis shl 24) or random24Bits
}


fun newId() = LocalId(
    mostSigBits = mostSigBits,
    leastSigBits = counter.getAndIncrement(),
)


fun newIdString(): String {
    return prefix + counter.getAndIncrement().toCompactString()
}

Final Performance Comparison

As can be seen from the graphs below, the LocalID generation method showed higher speed compared to UUID and GUID. Its performance was close to incremental generation via AtomicLong.

Comparison of ID generation time with LocalID

Comparison of ID generation time with LocalID

LocalID makes only 3 allocations when generating a string, while different versions of UUID require more than 20:

LocalID was also compared with UUID.randomUUID() on different data volumes. LocalID outperforms UUID.randomUUID() in terms of time and allocations by approximately 10 times.

Comparison of LocalId generation time with UUID.randomUUID() with different number of iterations

Comparison of LocalId generation time with UUID.randomUUID() with different number of iterations

For HashMap It is important to distribute IDs evenly across buckets to minimize collisions and speed up data access. For UUIDs, the coefficient of variation of distribution across 1024 buckets with 1 million objects is 3.13% for rows and 3.17% for objects. For LocalIDs, these figures are lower: 2.15% for rows and 0.05% for objects, indicating a more even distribution of LocalIDs.

Distribution by buckets

Distribution by buckets

The bucket index was determined using the algorithm from HashMap:

val hash = key.hashCode() xor (key.hashCode() ushr 16)
val bucketIndex = (bucketCount - 1) and hash

The code for calculating the hashcode of a LocalID object is similar to that used in UUID:

(mostSigBits xor leastSigBits).hashCode()

IDs generated in a single run have a perfect distribution due to the fact that IDs differ by 1. Objects from different runs approach a uniform distribution.

The LocalID string generation code uses a special character set that provides a good distribution. Although it can be improved, it will require significant effort. The more characters in the set, the shorter the string representation and the faster the hash code is calculated, which helps improve the efficiency of the work. HashMap.

private const val CHARS = "BJmkQCdLHlPobfKZsiuRqDwtzINTWOYA"
private const val MASK = 0x1FL
private const val BASE32_MAX_LENGTH = 13


fun Long.toCompactString(): String {
    val result = CharArray(BASE32_MAX_LENGTH)
    var number = this
    var index = BASE32_MAX_LENGTH - 1


    do {
        result[index--] = CHARS[(number and MASK).toInt()]
        number = number ushr 5
    } while (number != 0L)


    return String(result, index + 1, BASE32_MAX_LENGTH - index - 1)
}

Overall generation speed hashcode() LocalID is comparable to UUIDv4 for both object ID and string ID, with a slight advantage for string LocalID due to its shorter length.

For simple tasks, it is worth using UUIDv4. It is better to use your own ID only in critical parts of the application. If possible, use identifiers provided by the backend. If the data is stored in a database, it is preferable to use incremental identifiers from it. If the database is not used, and the data is needed only for other layers of the application, it is worth using LocalID. UUID is suitable for situations when data is sent to the server, for example, for analytics or to ensure idempotency of requests. At the same time, you can coordinate your own version of UUID with the backend to improve performance by pre-generating some parts of the UUID.

Optimization paths and further research

Further topics to explore include:

Links

Conclusion

The study examined different methods for generating unique identifiers with an emphasis on their use in mobile development. Based on the uniqueness and performance requirements, the LocalID method was created, which uses a combination of time, a random number, and an incremental counter. LocalID showed significant speed advantages over UUID and GUID, making it an effective solution for mobile applications.

If you have any questions or suggestions, I would appreciate your comments.

Special thanks to colleagues from Wildberries for constructive criticism of the first implementation of LocalID, which led to its improvements.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *