# Hash Functions Code Tutorial

2021 June 10 Twitter See all posts

## Hash Functions in Python

Python has built-in hash functions. We will use them to learn about how hash functions work.

By default, Python's hash function adds a random number to hash inputs to prevent attackers from taking advantage of known collisions. Hashing the same phrase will not return the same value if you call the program at different times. When calling the python script in the command line, add PYTHONHASHSEED=0 before the command to disable this feature. Example: "PYTHONHASHSEED=0 python3 /users/austinvernon/desktop/scripts/python_hash_function.py"

## General Hash Functions

Verify basic features of hash functions.

1. Hashing a given input always results in the same hash value.

2. Hash values are fast to compute.

3. Different inputs have different hash values.

``````import time

start = time.time()
print(hash("susan"))
print(hash("charlie"))
print(hash("susan"))
end = time.time()
print("elapsed time: ",end-start," seconds")
``````

Output:

``````    ">-5124793730868099537"
">-7631605457190692542"
">-5124793730868099537"
">elapsed time: 6.99x10^-5" seconds
``````

The hash values for "susan" are the same, while "charlie" is different. The elapsed time to run all three hashes was a fraction of a second.

### Data Retrieval

Python dictionaries use hash tables to store values. We will compare a list lookup (linear search) vs. a dictionary lookup (hash table).

The following script has a short list and dictionary:

``````    import time

dictionary = {"charlie":3, "susan":4, "beth":5 }
list = ["sally", "bob", "joe", "charlie", "susan", "beth"]

#lookups
start = time.time()
print(dictionary["beth"])
end = time.time()
print("Dictionary Lookup Time: ",end-start, " seconds")

start = time.time()
print(list.index("beth"))
end = time.time()
print("List Lookup Time: ",end-start, " seconds")
``````

When I run the script several times, list lookup is faster. The overhead of hashing the input overwhelms the speed advantages of a hash table lookup for short data sets.

Let's try it again with a longer list. I added code to generate the new list (remember to indent the loop body if copy/paste breaks it):

``````    import time
import random
import string

#set variables
dictionary = {}
list = []
i=0

#loop to create random list and dictionary
while i < 10000:  #change list size here
letters = string.ascii_lowercase
phrase = ''.join(random.choice(letters) for j in range(10))
list.append(phrase)
dictionary[phrase] = i
i += 1

#append our targets to end of list and dictionary
list.append("beth")
dictionary["beth"] = i

#lookups
start = time.time()
print(dictionary["beth"])
end = time.time()
print("Dictionary Lookup Time: ",end-start, " seconds")

start = time.time()
print(list.index("beth"))
end = time.time()
print("List Lookup Time: ",end-start, " seconds")
``````

I increased the list size by 10x incrementally until I found a crossover at 10,000. As the list size increases, the lookup time for the dictionary remains the same while the list lookup time increases. Put in different x values for "i < x" to change the list size and explore the relationship.

### Integrity

The basic idea of checking one file or phrase is inherent to general hash functions, as shown in the "General Hash Functions" section.

### Merkle Trees

We will hash a set of data blocks into a Merkle Tree, then change one block to see how it propagates through the tree. For reference, the Merkle Tree structure: Source: Wikipedia.com

The first tree:

``````    #Pair 1
#data block 1
hash_value_00 = hash('bob')
print("Hash Value 00: ",hash_value_00)
#data block 2
hash_value_01 = hash('alice')
print("Hash Value 01: ",hash_value_01)

#Pair 2
#data block 3
hash_value_10 = hash('sam')
print("Hash Value 10: ",hash_value_10)
#data block 4
hash_value_11 = hash('charlie')
print("Hash Value 11: ",hash_value_11)

#Level2 Hash Values
hash_value_0 = hash(str(hash_value_00) + str(hash_value_01))
print("Hash Value 0: ",hash_value_0)
hash_value_1 = hash(str(hash_value_10) + str(hash_value_11))
print("Hash Value 1: ",hash_value_1)

#root hash/top hash/ Merkle Root
top_hash_value = hash(str(hash_value_1) + str(hash_value_0))

print("Top Hash Value: ",top_hash_value)
``````

Output:

``````        Hash Value 00:  -4208795585541223758
Hash Value 01:  8512114763159807471
Hash Value 10:  1019084245346364989
Hash Value 11:  -7631605457190692542
Hash Value 0:  -4765654072729701387
Hash Value 1:  -4212900652853811053
Top Hash Value:  -8281923323636112442
``````

Let's use the same code, except change the string in "data block 2" from "alice" to "alicia".

Output:

``````        Hash Value 00:  -4208795585541223758
Hash Value 01:  -4059417228946341628
Hash Value 10:  1019084245346364989
Hash Value 11:  -7631605457190692542
Hash Value 0:  2860753682179309060
Hash Value 1:  -4212900652853811053
Top Hash Value:  3683710842423746868
``````

To find the bad data block given the top hash value, you request the next layer of the hash tree. Since only "hash value 0" differs, you can skip checking the "hash value 1" branch of the tree. Next, request the two hash values that formed "hash value 0". You will realize that "hash value 01" is the culprit. The Merkle Tree makes this hunt faster because you can skip parts of the tree. Searching a Merkle Tree is a variation of the binary search topic in computer science. Like using hash tables, the larger the original data set, the more advantage Merkle Trees have.

## Crytographic Hash Functions

One of the most commonly used cryptographic hash functions is SHA-256, part of the SHA-2 family of hash functions. Python has a library for using SHA-2 functions, but it is not as user-friendly as calling the simple "hash()" function.

The SHA256 function requires inputs to be encoded since it does not directly hash plaintext. Outputs are converted back into a human-readable format.

``````    import hashlib

input = "sally sells seashells"

# encoding then hashing
result = hashlib.sha256(input.encode())

# To print, we have to convert the sha256 object to hexadecimal
print(result.hexdigest())
``````

Result:

``````      >bc1161cc7e1107e890299ebe4df1495c47d2eef68f46d8d304f68c16b785881b
``````

Notice the hash value is much longer and contains characters other than numbers. The range of possible values helps prevent collisions.

### Proof of Work

Let's execute a proof-of-work exercise similar to what Bitcoin uses. Bitcoin uses SHA-256 as its hash function. The idea is to find a hash value with a certain number of leading zeros. The input to the hash function is the hash value of all the previous blocks plus a nonce. A nonce is a random number used only once. Here is the code (remember to indent the loop body if copy/paste breaks it):

``````    import hashlib

transaction = "Alice sends 10 AliceCoins to Bob"

transaction_hash = hashlib.sha256(transaction.encode())

print("transaction hash: ",transaction_hash.hexdigest())

#loop to hash the transaction hash with nonces, using i as nonce
i = 0
while i < 11:
pow_input = transaction_hash.hexdigest() + str(i)
pow_hash = hashlib.sha256(pow_input.encode())
print("PoW Hash ",i, ": ",pow_hash.hexdigest())
i += 1
``````

Result:

``````    transaction hash:  641018871210776bb36695ab958f1b3cb4a6a51666b68c4a8d3cf8b397299e52
PoW Hash  0 :  8370f1ce2d2e164b64d12bfa735c09314a04e8d5a55f1a5228dc8b911da86a84
PoW Hash  1 :  dd154ce63969648d33479c3a2ffc00b2c1e43a55e71d250d327abf2afb6a1045
PoW Hash  2 :  9ae7d85bed099e77c55743e598c0cc56a0bf712262a4c57e857cd4b2f08f9ab8