The top search result on Encryption vs Encoding vs Hashing is wrong

Posted by stefan@boesen.me

It's always interesting to Google introductory subjects in a field you know about and read what pops up. Recently I did just that on Encryption vs Encoding vs Hashing, which is a common interview question I used when I hired at Amazon and Anvil.

I was pretty shocked to find the top result was in part wrong: https://danielmiessler.com/study/encoding-encryption-hashing-obfuscation/

While I did a short tour of duty on the Amazon cryptography team, my focus is more on web security; that said, I know when something I read about crypto is wrong. Daniel knows his stuff (I met Daniel in ~2016 when we both worked at the security consulting firm IOActive, and he's sharp), so I'm sure it's just poorly phrased, but the first sentence in the hashing section is downright wrong:

Hashing serves the purpose of ensuring integrity, i.e. making it so that if something is changed you can know that it’s changed.


Hashing's primary purpose is not integrity. Don't believe me? Look at Wikipedia:

A hash function is any function that can be used to map data of arbitrary size to fixed-size values.

Here's another relevant section talking about use cases of hashes:

Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval, and require an amount of storage space only fractionally greater than the total space required for the data or records themselves.
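To make the storage-and-retrieval use case concrete, here is a minimal sketch of a non-cryptographic hash picking a hash-table bucket. FNV-1a is my choice for illustration (Wikipedia's text doesn't name an algorithm), and the `bucket_index`, `buckets`, and `lookup` names are hypothetical:

```python
FNV_OFFSET_BASIS = 2166136261  # 32-bit FNV-1a starting value
FNV_PRIME = 16777619           # 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Map bytes of any length to a 32-bit integer (FNV-1a)."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def bucket_index(key: str, bucket_count: int) -> int:
    """Pick the hash-table bucket a key belongs to."""
    return fnv1a_32(key.encode("utf-8")) % bucket_count


# Hypothetical toy table: 8 buckets, each a list of (key, value) pairs.
buckets = [[] for _ in range(8)]
for key, value in [("alice", 1), ("bob", 2), ("carol", 3)]:
    buckets[bucket_index(key, len(buckets))].append((key, value))


def lookup(key: str):
    """Scan only the single bucket the key hashes to."""
    for stored_key, value in buckets[bucket_index(key, len(buckets))]:
        if stored_key == key:
            return value
    return None


print(lookup("bob"))
```

Nothing here is about integrity or secrecy: the hash only needs to be fast and to spread keys reasonably evenly across buckets.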

Interestingly, there's no mention of integrity or security use. That's because hash functions don't inherently have security properties; those belong to Cryptographic Hashes, a subset of hashes with specific properties (namely, they're hard to reverse, and it's hard to find an input that produces a specific hash value or two inputs that produce the same hash value).
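For a feel of what a cryptographic hash looks like in practice, here is a small sketch using SHA-256 from Python's standard `hashlib`. SHA-256 is my pick as an example; the inputs and the `differing_bits` helper are made up for illustration:

```python
import hashlib

digest_a = hashlib.sha256(b"hello world").digest()
digest_b = hashlib.sha256(b"hello worle").digest()  # last character changed


def differing_bits(x: bytes, y: bytes) -> int:
    """Count how many bit positions differ between two equal-length digests."""
    return sum(bin(p ^ q).count("1") for p, q in zip(x, y))


# The digest is always 256 bits, no matter how large the input is.
print(len(digest_a) * 8)

# A tiny change to the input produces an unrelated-looking digest.
print(digest_a.hex())
print(digest_b.hex())
print(differing_bits(digest_a, digest_b), "of 256 bits differ")
```

This only demonstrates the visible behavior. The actual security properties (being hard to reverse, and hard to find inputs that collide) are claims about what an attacker cannot feasibly compute, so a script can't show them directly.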

These are what Daniel describes as properties of all hashes, which isn't true. CRC32 is a very fast hashing algorithm that has more collisions than a modern cryptographic algorithm and has next to no security value. That doesn't mean it's useless, just not useful for security. If you want an "error-detecting code" to catch accidental corruption, it does the job well (and it is still a hashing algorithm despite not meeting the described criteria).
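Here's a rough sketch of both halves of that claim, using `zlib.crc32` from the Python standard library. The messages and the `xor_bytes` helper are invented for the example:

```python
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


# 1. Useful: a single flipped bit (say, from line noise) changes the checksum.
original = b"pay alice 100 dollars"
damaged = bytearray(original)
damaged[4] ^= 0x01
damaged = bytes(damaged)
detects_corruption = zlib.crc32(original) != zlib.crc32(damaged)
print("corruption detected:", detects_corruption)

# 2. Not secure: CRC32 is affine over XOR, so for equal-length messages
#    the checksum of a ^ b ^ c can be computed from the three checksums
#    alone. A cryptographic hash must not have exploitable structure like this.
a, b, c = b"message1", b"another!", b"third-1."
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))
print("checksum predicted without hashing:", predicted == actual)
```

That predictability is exactly why CRC32 is fine against accidents and worthless against an attacker who can choose the input.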

Key Derivation Functions are, in turn, a subset of Cryptographic Hashes; modern encryption tools use them to turn a human password or passphrase into an encryption key.
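As a sketch of what that looks like, here is PBKDF2 via the standard library's `hashlib.pbkdf2_hmac`. PBKDF2 is just one KDF I'm using as an example; the `derive_key` name, the passphrase, and the iteration count are assumptions, not a recommendation for any particular tool:

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch a human passphrase into a 32-byte key (PBKDF2-HMAC-SHA256).

    The iteration count is an assumed value; tune it for your hardware.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # random, stored next to the ciphertext; not a secret
key = derive_key("correct horse battery staple", salt)
print(len(key), "byte key:", key.hex())
```

The same passphrase and salt always give the same key, which is what lets you decrypt later, and the deliberately slow iteration loop is what makes guessing passphrases expensive.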

I might expand on this later, but I was surprised and wanted to offer a resource with a little better explanation here for newcomers that might not catch the difference at first. After all, I've filed plenty of security bugs myself where people used a hashing algorithm meant for integrity instead of a password hashing algorithm, so even professionals make these mistakes!
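For the curious, the bug usually looks something like the first function below. This is a hedged sketch with made-up function names, and PBKDF2 stands in for whichever password hashing algorithm you'd actually pick:

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def weak_password_hash(password: str) -> bytes:
    """The mistake: a fast, unsalted hash designed for integrity checks."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def hash_password(password: str, iterations: int = ITERATIONS):
    """Better: a random per-user salt plus a deliberately slow function."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


# Two users with the same password: the weak hashes are identical (and
# cheap to brute-force), while the salted slow hashes differ per user.
print(weak_password_hash("hunter2") == weak_password_hash("hunter2"))
salt_1, stored_1 = hash_password("hunter2")
salt_2, stored_2 = hash_password("hunter2")
print(stored_1 == stored_2)
print(verify_password("hunter2", salt_1, stored_1))
```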