Introduction to Hash Functions

Kristian · February 1, 2018, 8:20pm

Security is a central consideration when harnessing the power of the internet, especially when designing and implementing applications of blockchain technology. As blockchain becomes an ever more prevalent part of digital infrastructure, it seems important to have a basic understanding of the security measures that protect the value of assets and information stored on independent nodes around the world.

So let’s discuss some of it now:

Any good digital security mechanism relies heavily on hash functions to verify the integrity of information as it travels around the internet. (Hash functions do not encrypt or decrypt anything; they are one-way, as described below.) There are many different hash functions. Bitcoin, along with many other modern internet applications, uses SHA-256, designed by the NSA and first published by NIST in 2001, to hash the data on its blockchain. SHA-256 is one of six hash functions in the SHA-2 family, the successor to SHA-1, which has since been shown to be insufficiently secure for the modern world. The two families share the same fundamental structure, but SHA-2 is more secure because it produces longer digests (224 to 512 bits, versus SHA-1's 160) and uses a more robust compression function.
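Python's standard `hashlib` module ships SHA-1 and several SHA-2 variants, so their sizes are easy to compare. The sketch below uses a helper named `digest_and_block_bits`, which exists only for this illustration. It prints each algorithm's digest size and internal block size. Note that SHA-256 processes input in 512-bit blocks, the same as SHA-1, while SHA-384 and SHA-512 use 1024-bit blocks.

```python
import hashlib


def digest_and_block_bits(name: str) -> tuple[int, int]:
    """Return (digest size, block size) in bits for a hashlib algorithm."""
    h = hashlib.new(name)
    return h.digest_size * 8, h.block_size * 8


# SHA-1 plus the SHA-2 variants that hashlib guarantees on every platform
# (SHA-512/224 and SHA-512/256 depend on the underlying OpenSSL build).
for name in ["sha1", "sha224", "sha256", "sha384", "sha512"]:
    digest_bits, block_bits = digest_and_block_bits(name)
    print(f"{name:>7}: digest {digest_bits:>3} bits, block {block_bits:>4} bits")
```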

But how does a hash function work?

Simply put, a hash function takes in data of any length and runs it through an algorithm that outputs a hash value, often called the hash digest. The digest is a fixed-length string of letters and numbers that acts as a practically unique fingerprint of the input. In the case of SHA-256, the digest is 256 bits, usually written as 64 hexadecimal characters (0-9 and a-f), each encoding 4 bits.
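A quick way to see the fixed output length is to hash inputs of very different sizes with Python's built-in `hashlib`. This is a minimal sketch, and the helper name `sha256_hex` is just for illustration. Every input, from an empty string to a megabyte of data, comes back as 64 hex characters.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different lengths all produce 64-character (256-bit) digests.
for message in [b"", b"a", b"The quick brown fox", b"x" * 1_000_000]:
    digest = sha256_hex(message)
    print(f"input {len(message):>7} bytes -> {len(digest)} hex chars: {digest[:16]}...")
```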

A good hash function has five defining properties:

It is relatively easy to compute: a computer can determine the hash digest of a typical input in a tiny fraction of a second.
It is practically irreversible: you cannot reasonably take the hash digest and determine what the original message was.
It is consistent: the same input will always generate the same output.
It amplifies small differences in the input (the "avalanche effect"): two inputs that differ by 1 character or by 10 produce completely different outputs.
It is collision resistant: it is practically impossible to find two different inputs that give the same output.
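The consistency and avalanche properties can be observed directly. The sketch below treats each SHA-256 digest as a 256-bit integer and counts how many bits differ between two outputs. The helper names `sha256_bits` and `bits_changed` are illustrative only. For a well-behaved hash, a one-character change in the input should flip roughly half of the 256 output bits.

```python
import hashlib


def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of `data` as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def bits_changed(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


# Consistency: hashing the same input twice gives the same digest.
assert hashlib.sha256(b"hello").digest() == hashlib.sha256(b"hello").digest()

# Avalanche effect: a single changed character flips about half of the bits.
print(bits_changed(b"hello", b"hellp"))
print(bits_changed(b"hello", b"jello"))
```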

The relative strength of a hash function depends on how well it adheres to each of the five properties laid out above. I have used the words “practically” and “reasonably” several times because any hash function can, in principle, be broken by guess and check (brute force) given enough time and computing power. For SHA-256, though, a generic search for a collision would need on the order of 2^128 hash evaluations, and finding an input that matches a given digest would need on the order of 2^256. Both are far beyond any realistic budget of money or time, which by all practical standards is “secure enough.”
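To make "guess and check" concrete, here is a toy sketch that deliberately weakens SHA-256 by keeping only the first few bytes of its digest. It then brute-forces a collision using the birthday approach: it remembers every digest seen and stops at the first repeat. The helpers `truncated_hash` and `find_collision` are hypothetical names for this illustration. With a 3-byte (24-bit) digest, a collision typically turns up after a few thousand attempts. Each extra byte multiplies the expected work by about 16, which is why the same search against the full 256-bit output (about 2^128 attempts) is hopeless.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """First `n_bytes` of the SHA-256 digest: a deliberately weakened toy hash."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int) -> tuple[bytes, bytes, int]:
    """Guess-and-check search for two inputs whose truncated hashes match.

    Remembers every digest seen so far and stops at the first repeat.
    Returns the two colliding inputs and the number of attempts made.
    """
    seen: dict[bytes, bytes] = {}
    for i in count():
        message = str(i).encode()  # candidate inputs: b"0", b"1", b"2", ...
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message, i + 1
        seen[digest] = message


a, b, attempts = find_collision(3)
print(a, b, attempts)
```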

A hash function that satisfies the above principles sufficiently, such as SHA-256, is used across the web to protect the integrity of information. In the case of blockchain technology, and specifically the Bitcoin network, hash functions are used to identify transactions, derive addresses from public keys, link block headers together, and drive the proof-of-work mining process. In a larger digital context, hash functions are used to create and verify digital signatures, to quickly check whether any historical data has been tampered with, and to generate pseudorandom numbers.
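The "check whether historical data has been tampered with" idea can be sketched as a simple hash chain, where each entry's hash also covers the previous entry's hash. This is a simplified, hypothetical ledger and not Bitcoin's actual block or transaction format. Only the double SHA-256 step mirrors what Bitcoin does when hashing block headers. The function names and the all-zero starting value are assumptions made for the illustration.

```python
import hashlib

GENESIS = b"\x00" * 32  # hypothetical all-zero predecessor for the first entry


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, the construction Bitcoin uses for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_chain(records: list[str]) -> list[tuple[str, bytes]]:
    """Link records so each entry's hash covers the previous entry's hash."""
    chain = []
    prev = GENESIS
    for record in records:
        h = double_sha256(prev + record.encode())
        chain.append((record, h))
        prev = h
    return chain


def verify_chain(chain: list[tuple[str, bytes]]) -> bool:
    """Recompute every link and compare it with the stored hash."""
    prev = GENESIS
    for record, stored in chain:
        if double_sha256(prev + record.encode()) != stored:
            return False
        prev = stored
    return True


chain = build_chain(["alice pays bob 5", "bob pays carol 2"])
print(verify_chain(chain))  # True
chain[0] = ("alice pays bob 500", chain[0][1])  # tamper with an old record
print(verify_chain(chain))  # False
```

Because each stored hash feeds into the next one, rewriting an old record without detection would require recomputing every later hash as well.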

If you’re interested in learning more about cryptography, I recommend heading over to Wikipedia with a cup of tea and reading anything you can find on the topic. I typically start on one general page such as “cryptographic hash functions” and keep clicking more specific links until I’ve had enough.

And as always, leave any questions or comments below.