Invertible Hash Function Meet-In-The-Middle Attack
Cracking Invertible Hash Functions
Generally when trying to “crack” a hash you can only work in one direction: from the start of the string to the end.
But what if the hash function is invertible, such that given hash_of_string = hasher(string, seed)
, you can calculate seed = inv_hasher(string, hash_of_string)
?
Then you could also work in reverse and perform a meet-in-the-middle attack.
Jenkins-One-At-A-Time (joaat)
joaat is a non-cryptographic 32-bit hash function:
uint32_t joaat(const uint8_t* key, size_t length, uint32_t hash)
{
size_t i = 0;
while (i != length) {
hash += key[i++];
hash += hash << 10;
hash ^= hash >> 6;
}
hash += hash << 3;
hash ^= hash >> 11;
hash += hash << 15;
return hash;
}
While it may not look like it at first, all of the operations performed are invertible:
hash += key[i++]
This one is pretty obvious: hash -= key[--i]
hash += hash « n
hash += hash << n
is equivalent to hash *= (1 << n) + 1
.
x = (1 << n) + 1
is an odd number, and all odd numbers have a modular multiplicative inverse mod 232. The inverse can be calculated in Python using y = pow(x, -1, 2**32)
.
n | x=2n+1 | y=x-1 (mod 232) |
---|---|---|
3 | 0x9 | 0x38E38E39 |
10 | 0x401 | 0xC00FFC01 |
15 | 0x8001 | 0x3FFF8001 |
Alternatively, we can avoid using multiplication, instead using a series of subtractions and additions to cancel out the multiple of the hash.
If we first perform hash -= hash << n
, we get hash = hash + (hash << n) - ((hash + (hash << n)) << n)
.
This simplifies to hash -= (hash << (2*n))
.
From here we can then perform a series of hash += (hash << n)
operations, doubling n
each time until n >= 32
.
For n = 3
, we get:
hash -= (hash << 3);
hash += (hash << 6);
hash += (hash << 12);
hash += (hash << 24);
Similarly, for n = 10
, we get:
hash -= (hash << 10);
hash += (hash << 20);
This optionally expands/simplifies to hash += (hash << 20) - (hash << 10) - (hash << 30)
And, for n = 15
we get:
hash -= (hash << 15);
hash += (hash << 30);
This optionally expands/simplifies to hash += (hash << 30) - (hash << 15)
hash ^= hash » n
When performing hash ^= hash >> n
, the upper n
bits remain unchanged.
If we perform the same operation again, we get hash = (hash ^ (hash >> n)) ^ ((hash ^ (hash >> n)) >> n)
.
This simplifies to hash ^= (hash >> (2*n))
.
From here we can just apply the same formula again, until n >= 32
.
For n = 6
, we get:
hash ^= (hash >> 6);
hash ^= (hash >> 12);
hash ^= (hash >> 24);
This optionally expands/simplifies to hash ^= (hash >> 6) ^ (hash >> 12) ^ (hash >> 18) ^ (hash >> 24) ^ (hash >> 30)
Similarly, for n = 11
, we get:
hash ^= (hash >> 11);
hash ^= (hash >> 22);
This optionally expands/simplifies to hash ^= (hash >> 11) ^ (hash >> 22)
For some more information on inverting joaat, see this SO post.
Inverse Hash Function
Combining all this knowledge, we can create the inverse hash function:
uint32_t inv_joaat(const uint8_t* key, size_t length, uint32_t hash)
{
size_t i = length;
// invert hash += hash << 15;
hash *= 0x3FFF8001;
// invert hash ^= hash >> 11;
hash ^= (hash >> 11) ^ (hash >> 22);
// invert hash += hash << 3;
hash *= 0x38E38E39;
while (i != 0) {
// invert hash ^= hash >> 6;
hash ^= (hash >> 6) ^ (hash >> 12) ^ (hash >> 18) ^ (hash >> 24) ^ (hash >> 30);
// invert hash += hash << 10;
hash *= 0xC00FFC01;
// invert hash += key[i++];
hash -= key[--i];
}
return hash;
}
Using these functions, we can calculate hash_of_key = joaat(key, length, seed)
and seed = inv_joaat(key, length, hash_of_key)
Partial Hashing
Currently we can’t hash the string in stages, but that’s easy to fix by separating the final part of the hash into a separate function:
uint32_t joaat_loop(const uint8_t* key, size_t length, uint32_t hash)
{
size_t i = 0;
while (i != length) {
hash += key[i++];
hash += hash << 10;
hash ^= hash >> 6;
}
return hash;
}
uint32_t joaat_final(uint32_t hash)
{
hash += hash << 3;
hash ^= hash >> 11;
hash += hash << 15;
return hash;
}
uint32_t joaat(const uint8_t* key, size_t length, uint32_t hash)
{
return joaat_final(joaat_loop(key, length, hash));
}
uint32_t inv_joaat_loop(const uint8_t* key, size_t length, uint32_t hash)
{
size_t i = length;
while (i != 0) {
hash ^= (hash >> 6) ^ (hash >> 12) ^ (hash >> 18) ^ (hash >> 24) ^ (hash >> 30);
hash *= 0xC00FFC01;
hash -= key[--i];
}
return hash;
}
uint32_t inv_joaat_final(uint32_t hash)
{
hash *= 0x3FFF8001;
hash ^= (hash >> 11) ^ (hash >> 22);
hash *= 0x38E38E39;
return hash;
}
uint32_t inv_joaat(const uint8_t* key, size_t length, uint32_t hash)
{
return inv_joaat_loop(key, length, inv_joaat_final(hash));
}
Any target hashes can simply have inv_joaat_final
applied as a pre-processing step.
With these changes we can now hash strings in stages. For example:
hash_of_foobar = hasher("foobar", seed)
can be split into:
hash_of_foo = hasher("foo", seed)
and hash_of_foobar = hasher("bar", hash_of_foo)
And the same works in reverse.
seed = inv_hasher("foobar", 6, hash_of_foobar)
can be split into:
hash_of_foo = inv_hasher("bar", hash_of_foobar)
and seed = inv_hasher("foo", hash_of_foo)
Observe that we can now compute hash_of_foo
from both directions.
Meet-In-The-Middle Attack
Now that we can hash in both directions, it’s time to perform a meet-in-the-middle attack by pre-computing a table of possible suffixes (or prefixes).
For example, say we have seed = 0x00000000
and target_hash = 0xDA5F1F08
(using joaat).
First, apply target_hash = inv_joaat_final(target_hash) = 0xB2FAEB40
.
Next, we are going to try and find any matches with the following format: (foo|bar|baz)(spam|ham|eggs)(alpha|beta|gamma)[0-9]
.
Instead of using a naive search, we will compute the hashes of (foo|bar|baz)(spam|ham|eggs)
, and inverse hashes of (alpha|beta|gamma)[0-9]
, then try and match them together.
Prefix Computation
First, compute the hashes of (foo|bar|baz)
:
uint32_t hashes_0[3] = {
0x9290584E, // = hasher("foo", seed) | foo
0x9B85A7ED, // = hasher("bar", seed) | bar
0x9B85C665, // = hasher("baz", seed) | baz
};
Next, compute the hashes of (spam|ham|eggs)
for each of hashes in hashes_0
:
uint32_t hashes_1[3 * 3] = {
0x326FF91C, // = hasher("spam", hashes_0[0]) | foospam
0xA2D2706D, // = hasher("spam", hashes_0[1]) | barspam
0x5E82F68E, // = hasher("spam", hashes_0[2]) | bazspam
0x50AD347B, // = hasher("ham", hashes_0[0]) | fooham
0xFFCD395B, // = hasher("ham", hashes_0[1]) | barham
0x4390BB7E, // = hasher("ham", hashes_0[2]) | bazham
0x9B900B39, // = hasher("eggs", hashes_0[0]) | fooeggs
0x76AB7F74, // = hasher("eggs", hashes_0[1]) | bareggs
0x61E13087, // = hasher("eggs", hashes_0[2]) | bazeggs
};
Suffix Computation
First, compute the inverse hashes of [0-9]
:
uint32_t hashes_3[10] = {
0xBCFCCF1D, // = inv_hasher("0", target_hash) | 0
0xBCFCCF1C, // = inv_hasher("1", target_hash) | 1
0xBCFCCF1B, // = inv_hasher("2", target_hash) | 2
0xBCFCCF1A, // = inv_hasher("3", target_hash) | 3
0xBCFCCF19, // = inv_hasher("4", target_hash) | 4
0xBCFCCF18, // = inv_hasher("5", target_hash) | 5
0xBCFCCF17, // = inv_hasher("6", target_hash) | 6
0xBCFCCF16, // = inv_hasher("7", target_hash) | 7
0xBCFCCF15, // = inv_hasher("8", target_hash) | 8
0xBCFCCF14, // = inv_hasher("9", target_hash) | 9
};
Next, compute the inverse hashes of (alpha|beta|gamma)
for each of the hashes in hashes_3
:
uint32_t hashes_2[3 * 10] = {
0x44B7CA4F, // = inv_hasher("alpha", hashes_3[0]) | alpha0
0xD5DC41E6, // = inv_hasher("alpha", hashes_3[1]) | alpha1
0xF0724F33, // = inv_hasher("alpha", hashes_3[2]) | alpha2
0x92478EFC, // = inv_hasher("alpha", hashes_3[3]) | alpha3
0x441F4B0B, // = inv_hasher("alpha", hashes_3[4]) | alpha4
0x1E505441, // = inv_hasher("alpha", hashes_3[5]) | alpha5
0x5E82F68E, // = inv_hasher("alpha", hashes_3[6]) | alpha6
0x10DDE129, // = inv_hasher("alpha", hashes_3[7]) | alpha7
0xDC9517D6, // = inv_hasher("alpha", hashes_3[8]) | alpha8
0x68A87890, // = inv_hasher("alpha", hashes_3[9]) | alpha9
0x02645373, // = inv_hasher("beta", hashes_3[0]) | beta0
0x79F25E08, // = inv_hasher("beta", hashes_3[1]) | beta1
0x3BF5E6E1, // = inv_hasher("beta", hashes_3[2]) | beta2
0xAD889649, // = inv_hasher("beta", hashes_3[3]) | beta3
0x78AC4C24, // = inv_hasher("beta", hashes_3[4]) | beta4
0x679BF0A5, // = inv_hasher("beta", hashes_3[5]) | beta5
0x98FFAF13, // = inv_hasher("beta", hashes_3[6]) | beta6
0x329BD300, // = inv_hasher("beta", hashes_3[7]) | beta7
0x69A818A3, // = inv_hasher("beta", hashes_3[8]) | beta8
0x0CA5BAB8, // = inv_hasher("beta", hashes_3[9]) | beta9
0x7E61C3A2, // = inv_hasher("gamma", hashes_3[0]) | gamma0
0x58CCF7F6, // = inv_hasher("gamma", hashes_3[1]) | gamma1
0x2E9CABEE, // = inv_hasher("gamma", hashes_3[2]) | gamma2
0x8454125A, // = inv_hasher("gamma", hashes_3[3]) | gamma3
0x236DF238, // = inv_hasher("gamma", hashes_3[4]) | gamma4
0x81606B74, // = inv_hasher("gamma", hashes_3[5]) | gamma5
0x8FBDD74C, // = inv_hasher("gamma", hashes_3[6]) | gamma6
0x2F4F9E57, // = inv_hasher("gamma", hashes_3[7]) | gamma7
0x71FE275A, // = inv_hasher("gamma", hashes_3[8]) | gamma8
0xD3A812AF, // = inv_hasher("gamma", hashes_3[9]) | gamma9
};
Finding Matches
Now, if we compare the tables hashes_1
and hashes_2
, there is one match: 0x5E82F68E
. This corresponds to a prefix of bazspam
and a suffix of alpha6
. Join those together, and indeed joaat("bazspamalpha6") = 0xDA5F1F08
, our target hash!
Instead of the 270 (3*3*3*10
) hashes or 12600 ((3+3+3)*(4+3+3)*(5+4+5)*10
) characters of a naive search, it only took us
39 ((3*3)+(3*10)
) hashes or 230 (((3+3+3)*(4+3+3))+((5+4+5)*10)
) characters, in exchange for just 9 table lookups and some extra RAM.
Note, In this case we computed the full table of hashes for both the prefixes and suffixes. Algorithmically speaking, you would only benefit from pre-computing either one of them (and then using that to construct some data structure with faster than O(n)
lookups), with the speedup being linear to the size of pre-computed table.
Also note, this can easily be applied to finding matches for multiple hashes at once.
You can view C code for the above example here.
Conclusion
Since the speedup is linear to the number of pre-computed hashes, a meet-in-the-middle attack can be many orders of magnitude faster than a naive search, given enough RAM.
When used with a 32-bit hash (such as joaat) with the intent of finding the original string, the bottleneck very quickly becomes dealing with the amount of hash collisions, not the speed at which valid strings are found.
You can find a full implementation of this technique for joaat here.