Auto-incrementing integer IDs are often used as unique row identifiers in many databases, whether specified explicitly by the user or implicitly by the database engine. These sequential IDs may be a source of information leakage, allowing third-parties to infer statistics about or simply enumerate your data.
This is not always a bad thing, and may in fact be intuitive in many contexts involving naturally ordered data. For example, I think it makes sense for my CI builds to be numbered sequentially (after all, they are already tied to a unique commit identifier).
Nevertheless, sometimes you just do not want to expose sequential IDs (even if only because larger numbers or short strings look nicer). While not a replacement for proper access control and verification, one option is to switch to a non-sequential ID generation algorithm such as UUIDs. Alternatively, and the topic of this post, sequential IDs can be obfuscated to appear non-sequential.
Obfuscation methods
Hashids
One library for doing this which has been in fashion recently is Hashids, which maps
integers to short strings like yr8
or 3kTMd
, inspired by the IDs used by YouTube and URL shorteners.
Hashids - generate short unique ids from integersGenerate short unique ids from integers. Use in url shortening or as unique ids.https://hashids.org/
Hashids has some nice features such as trying not to generate strings which contain common English curse words, but it
At the same time, Hashids makes no guarantees of security. There exists at least one algorithm to reverse obfuscated hashids that performs better than brute force, presented in a blog post in which the author claims that anyone using Hashids should consider it fully-reversible:
Cryptanalysis of hashidsCarnage's tech talkhttp://carnage.github.io/2015/08/cryptanalysis-of-hashids
Hashids does have some nice features such as trying not to generate strings which contain common English curse words, and a wide range of implementations across many programming languages. Depending on your threat model, it could be a convenient way to prettify your integer IDs.
Block ciphers
Another way to obfuscate IDs is through symmetric encryption. An ID is encrypted to obtain a corresponding opaque identifier be exposed, and decrypted before use. This can be done using a block cipher since IDs usually have a fixed length.
The Perl Crypt::Skip32
package has existed since 2007 and mentions scrambling database IDs as one of its main
use cases. It adapts the Skipjack cipher to 32-bit block sizes,
and thus works with 32-bit integer database IDs.
Crypt::Skip32 - 32-bit block cipher based on Skipjack - metacpan.org32-bit block cipher based on Skipjackhttps://metacpan.org/pod/Crypt::Skip32
Nowadays it's not uncommon to see 64-bit database IDs as well, and common block ciphers with a 64-bit block size like triple DES or Blowfish can be directly used to obfuscate these without modification.
Security wise, it seems to me that such an obfuscation method is directly tied to the strength of the encryption used, and reversing it should entail breaking the underlying cipher. However, besides the key size and algorithm itself, the security of a block cipher is also a function of the block size.
While small block sizes are more efficient for encrypting similarly sized IDs, they are also more vulnerable to collision or birthday attacks, as demonstrated against some widely-used block ciphers:
Sweet32: Birthday attacks on 64-bit block ciphers in TLS and OpenVPNSweet32: Birthday attacks on 64-bit block ciphers in TLS and OpenVPNhttps://sweet32.info/
Fortunately though, the attack above takes advantage of specific block cipher modes of operation, specifically how CBC mode collisions reveal the exclusive-or of two plaintext blocks, and does not apply to obfuscating individual IDs that fit in a single block, which is effectively ECB mode instead.
Others
Some other constructions for a reversible ID obfuscation function besides block ciphers:
- Optimus, mapping integers to other integers based on Knuth's integer hash:
Jens Segers - Id transformation with OptimusThis unbelievable small and fast algorithm will generate random-like integers with the ability to convert them back to the original value.https://jenssegers.com/67/id-transformation-with-optimus
- A custom combination of various methods including xor, bit shuffling and base conversion, described in this StackOverflow answer.
algorithm - Obfuscating an ID - Stack OverflowI'm looking for a way to encrypt/obfuscate an integer ID into another integer.https://stackoverflow.com/a/8555047
- The hasty pudding cipher, technically designed as a block cipher but more of an academic curiosity, which has a customisable block size.
Such bespoke solutions may have the advantage of being a better fit for their specific use cases.
Presents: gift wrap your IDs
Just for fun, I created my own library for obfuscating integer IDs based on block ciphers.
yi-jiayu/presentsLike hashids, but based on block ciphershttps://github.com/yi-jiayu/presents
Similar to Hashids, it converts positive integers to short strings. It first encrypts the integer with a block cipher, then converts the encrypted integer to a string by changing its base and encoding it into a custom alphabet.
Presents is implemented in Go and can use any 64-bit block cipher implementing the standard
library cipher.Block
interface. The character set for the output
string can be customised, and optionally further shuffled with a specific random seed as well. It converts an integer
like 1213486160
to 90NyXHLckhA
, although it doesn't do anything fancy to prevent curse words from being generated.
The name comes from the PRESENT block cipher, which is currently the default block cipher used by Presents. We had to implement PRESENT in school in our security class and I wrote versions of it in both Go and Rust, inspiring me to write this library as well. However, I do want to change the default block cipher to a more widely-used one like triple DES or Blowfish.