Methods for obfuscating sequential IDs

Auto-incrementing integer IDs are often used as unique row identifiers in many databases, whether specified explicitly by the user or implicitly by the database engine. These sequential IDs may be a source of information leakage, allowing third-parties to infer statistics about or simply enumerate your data.

This is not always a bad thing, and may in fact be intuitive in many contexts involving naturally ordered data. For example, I think it makes sense for my CI builds to be numbered sequentially (after all, they are already tied to a unique commit identifier).

Nevertheless, sometimes you just do not want to expose sequential IDs (even if only because larger numbers or short strings look nicer). While not a replacement for proper access control and verification, one option is to switch to a non-sequential ID generation algorithm such as UUIDs. Alternatively, and the topic of this post, sequential IDs can be obfuscated to appear non-sequential.

Obfuscation methods

Hashids

One library for doing this which has been in fashion recently is Hashids, which maps integers to short strings like yr8 or 3kTMd, inspired by the IDs used by YouTube and URL shorteners.

Hashids - generate short unique ids from integers

Generate short unique ids from integers. Use in url shortening or as unique ids.

https://hashids.org/

Hashids has some nice features such as trying not to generate strings which contain common English curse words, but it

At the same time, Hashids makes no guarantees of security. There exists at least one algorithm to reverse obfuscated hashids that performs better than brute force, presented in a blog post in which the author claims that anyone using Hashids should consider it fully-reversible:

Cryptanalysis of hashids

Carnage's tech talk

http://carnage.github.io/2015/08/cryptanalysis-of-hashids

Hashids does have some nice features such as trying not to generate strings which contain common English curse words, and a wide range of implementations across many programming languages. Depending on your threat model, it could be a convenient way to prettify your integer IDs.

Block ciphers

Another way to obfuscate IDs is through symmetric encryption. An ID is encrypted to obtain a corresponding opaque identifier be exposed, and decrypted before use. This can be done using a block cipher since IDs usually have a fixed length.

The Perl Crypt::Skip32 package has existed since 2007 and mentions scrambling database IDs as one of its main use cases. It adapts the Skipjack cipher to 32-bit block sizes, and thus works with 32-bit integer database IDs.

Crypt::Skip32 - 32-bit block cipher based on Skipjack - metacpan.org

32-bit block cipher based on Skipjack

https://metacpan.org/pod/Crypt::Skip32

Nowadays it's not uncommon to see 64-bit database IDs as well, and common block ciphers with a 64-bit block size like triple DES or Blowfish can be directly used to obfuscate these without modification.

Security wise, it seems to me that such an obfuscation method is directly tied to the strength of the encryption used, and reversing it should entail breaking the underlying cipher. However, besides the key size and algorithm itself, the security of a block cipher is also a function of the block size.

While small block sizes are more efficient for encrypting similarly sized IDs, they are also more vulnerable to collision or birthday attacks, as demonstrated against some widely-used block ciphers:

Sweet32: Birthday attacks on 64-bit block ciphers in TLS and OpenVPN

Sweet32: Birthday attacks on 64-bit block ciphers in TLS and OpenVPN

https://sweet32.info/

Fortunately though, the attack above takes advantage of specific block cipher modes of operation, specifically how CBC mode collisions reveal the exclusive-or of two plaintext blocks, and does not apply to obfuscating individual IDs that fit in a single block, which is effectively ECB mode instead.

Others

Some other constructions for a reversible ID obfuscation function besides block ciphers:

Optimus, mapping integers to other integers based on Knuth's integer hash:

Jens Segers - Id transformation with Optimus

This unbelievable small and fast algorithm will generate random-like integers with the ability to convert them back to the original value.

https://jenssegers.com/67/id-transformation-with-optimus

A custom combination of various methods including xor, bit shuffling and base conversion, described in this StackOverflow answer.

algorithm - Obfuscating an ID - Stack Overflow

I'm looking for a way to encrypt/obfuscate an integer ID into another integer.

https://stackoverflow.com/a/8555047

The hasty pudding cipher, technically designed as a block cipher but more of an academic curiosity, which has a customisable block size.

Such bespoke solutions may have the advantage of being a better fit for their specific use cases.

Presents: gift wrap your IDs

Just for fun, I created my own library for obfuscating integer IDs based on block ciphers.

yi-jiayu/presents

Like hashids, but based on block ciphers

https://github.com/yi-jiayu/presents

Similar to Hashids, it converts positive integers to short strings. It first encrypts the integer with a block cipher, then converts the encrypted integer to a string by changing its base and encoding it into a custom alphabet.

Presents is implemented in Go and can use any 64-bit block cipher implementing the standard library cipher.Block interface. The character set for the output string can be customised, and optionally further shuffled with a specific random seed as well. It converts an integer like 1213486160 to 90NyXHLckhA, although it doesn't do anything fancy to prevent curse words from being generated.

The name comes from the PRESENT block cipher, which is currently the default block cipher used by Presents. We had to implement PRESENT in school in our security class and I wrote versions of it in both Go and Rust, inspiring me to write this library as well. However, I do want to change the default block cipher to a more widely-used one like triple DES or Blowfish.