Cracking Invertible Hash Functions

Generally when trying to “crack” a hash you can only work in one direction: from the start of the string to the end.
But what if the hash function is invertible, such that given hash_of_string = hasher(string, seed), you can calculate seed = inv_hasher(string, hash_of_string)?
Then you could also work in reverse and perform a meet-in-the-middle attack.

Jenkins-One-At-A-Time (joaat)

joaat is a non-cryptographic 32-bit hash function:

uint32_t joaat(const uint8_t* key, size_t length, uint32_t hash)
{
    size_t i = 0;

    while (i != length) {
        hash += key[i++];
        hash += hash << 10;
        hash ^= hash >> 6;
    }

    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;

    return hash;
}

While it may not look like it at first, all of the operations performed are invertible:

hash += key[i++]

This one is pretty obvious: hash -= key[--i]

hash += hash « n

hash += hash << n is equivalent to hash *= (1 << n) + 1.
x = (1 << n) + 1 is an odd number, and all odd numbers have a modular multiplicative inverse mod 232. The inverse can be calculated in Python using y = pow(x, -1, 2**32).

n x=2n+1 y=x-1 (mod 232)
3 0x9 0x38E38E39
10 0x401 0xC00FFC01
15 0x8001 0x3FFF8001

Alternatively, we can avoid using multiplication, instead using a series of subtractions and additions to cancel out the multiple of the hash.
If we first perform hash -= hash << n, we get hash = hash + (hash << n) - ((hash + (hash << n)) << n).
This simplifies to hash -= (hash << (2*n)).
From here we can then perform a series of hash += (hash << n) operations, doubling n each time until n >= 32.

For n = 3, we get:

hash -= (hash << 3);
hash += (hash << 6);
hash += (hash << 12);
hash += (hash << 24);

Similarly, for n = 10, we get:

hash -= (hash << 10);
hash += (hash << 20);

This optionally expands/simplifies to hash += (hash << 20) - (hash << 10) - (hash << 30)

And, for n = 15 we get:

hash -= (hash << 15);
hash += (hash << 30);

This optionally expands/simplifies to hash += (hash << 30) - (hash << 15)

hash ^= hash » n

When performing hash ^= hash >> n, the upper n bits remain unchanged.

If we perform the same operation again, we get hash = (hash ^ (hash >> n)) ^ ((hash ^ (hash >> n)) >> n).
This simplifies to hash ^= (hash >> (2*n)).
From here we can just apply the same formula again, until n >= 32.

For n = 6, we get:

hash ^= (hash >> 6);
hash ^= (hash >> 12);
hash ^= (hash >> 24);

This optionally expands/simplifies to hash ^= (hash >> 6) ^ (hash >> 12) ^ (hash >> 18) ^ (hash >> 24) ^ (hash >> 30)

Similarly, for n = 11, we get:

hash ^= (hash >> 11);
hash ^= (hash >> 22);

This optionally expands/simplifies to hash ^= (hash >> 11) ^ (hash >> 22)

For some more information on inverting joaat, see this SO post.

Inverse Hash Function

Combining all this knowledge, we can create the inverse hash function:

uint32_t inv_joaat(const uint8_t* key, size_t length, uint32_t hash)
{
    size_t i = length;

    // invert hash += hash << 15;
    hash *= 0x3FFF8001;

    // invert hash ^= hash >> 11;
    hash ^= (hash >> 11) ^ (hash >> 22);

    // invert hash += hash << 3;
    hash *= 0x38E38E39;

    while (i != 0) {
        // invert hash ^= hash >> 6;
        hash ^= (hash >> 6) ^ (hash >> 12) ^ (hash >> 18) ^ (hash >> 24) ^ (hash >> 30);

        // invert hash += hash << 10;
        hash *= 0xC00FFC01;

        // invert hash += key[i++];
        hash -= key[--i];
    }

    return hash;
}

Using these functions, we can calculate hash_of_key = joaat(key, length, seed) and seed = inv_joaat(key, length, hash_of_key)

Partial Hashing

Currently we can’t hash the string in stages, but that’s easy to fix by separating the final part of the hash into a separate function:

uint32_t joaat_loop(const uint8_t* key, size_t length, uint32_t hash)
{
    size_t i = 0;

    while (i != length) {
        hash += key[i++];
        hash += hash << 10;
        hash ^= hash >> 6;
    }

    return hash;
}

uint32_t joaat_final(uint32_t hash)
{
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;

    return hash;
}

uint32_t joaat(const uint8_t* key, size_t length, uint32_t hash)
{
    return joaat_final(joaat_loop(key, length, hash));
}

uint32_t inv_joaat_loop(const uint8_t* key, size_t length, uint32_t hash)
{
    size_t i = length;

    while (i != 0) {
        hash ^= (hash >> 6) ^ (hash >> 12) ^ (hash >> 18) ^ (hash >> 24) ^ (hash >> 30);
        hash *= 0xC00FFC01;
        hash -= key[--i];
    }

    return hash;
}

uint32_t inv_joaat_final(uint32_t hash)
{
    hash *= 0x3FFF8001;
    hash ^= (hash >> 11) ^ (hash >> 22);
    hash *= 0x38E38E39;
    return hash;
}

uint32_t inv_joaat(const uint8_t* key, size_t length, uint32_t hash)
{
    return inv_joaat_loop(key, length, inv_joaat_final(hash));
}

Any target hashes can simply have inv_joaat_final applied as a pre-processing step.

With these changes we can now hash strings in stages. For example:
hash_of_foobar = hasher("foobar", seed) can be split into:
hash_of_foo = hasher("foo", seed) and hash_of_foobar = hasher("bar", hash_of_foo)

And the same works in reverse.
seed = inv_hasher("foobar", 6, hash_of_foobar) can be split into:
hash_of_foo = inv_hasher("bar", hash_of_foobar) and seed = inv_hasher("foo", hash_of_foo)

Observe that we can now compute hash_of_foo from both directions.

Meet-In-The-Middle Attack

Now that we can hash in both directions, it’s time to perform a meet-in-the-middle attack by pre-computing a table of possible suffixes (or prefixes).

For example, say we have seed = 0x00000000 and target_hash = 0xDA5F1F08 (using joaat).

First, apply target_hash = inv_joaat_final(target_hash) = 0xB2FAEB40.

Next, we are going to try and find any matches with the following format: (foo|bar|baz)(spam|ham|eggs)(alpha|beta|gamma)[0-9].
Instead of using a naive search, we will compute the hashes of (foo|bar|baz)(spam|ham|eggs), and inverse hashes of (alpha|beta|gamma)[0-9], then try and match them together.

Prefix Computation

First, compute the hashes of (foo|bar|baz):

uint32_t hashes_0[3] = {
    0x9290584E, // = hasher("foo", seed) | foo
    0x9B85A7ED, // = hasher("bar", seed) | bar
    0x9B85C665, // = hasher("baz", seed) | baz
};

Next, compute the hashes of (spam|ham|eggs) for each of hashes in hashes_0:

uint32_t hashes_1[3 * 3] = {
    0x326FF91C, // = hasher("spam", hashes_0[0]) | foospam
    0xA2D2706D, // = hasher("spam", hashes_0[1]) | barspam
    0x5E82F68E, // = hasher("spam", hashes_0[2]) | bazspam
    0x50AD347B, // = hasher("ham",  hashes_0[0]) | fooham
    0xFFCD395B, // = hasher("ham",  hashes_0[1]) | barham
    0x4390BB7E, // = hasher("ham",  hashes_0[2]) | bazham
    0x9B900B39, // = hasher("eggs", hashes_0[0]) | fooeggs
    0x76AB7F74, // = hasher("eggs", hashes_0[1]) | bareggs
    0x61E13087, // = hasher("eggs", hashes_0[2]) | bazeggs
};

Suffix Computation

First, compute the inverse hashes of [0-9]:

uint32_t hashes_3[10] = {
    0xBCFCCF1D, // = inv_hasher("0", target_hash) | 0
    0xBCFCCF1C, // = inv_hasher("1", target_hash) | 1
    0xBCFCCF1B, // = inv_hasher("2", target_hash) | 2
    0xBCFCCF1A, // = inv_hasher("3", target_hash) | 3
    0xBCFCCF19, // = inv_hasher("4", target_hash) | 4
    0xBCFCCF18, // = inv_hasher("5", target_hash) | 5
    0xBCFCCF17, // = inv_hasher("6", target_hash) | 6
    0xBCFCCF16, // = inv_hasher("7", target_hash) | 7
    0xBCFCCF15, // = inv_hasher("8", target_hash) | 8
    0xBCFCCF14, // = inv_hasher("9", target_hash) | 9
};

Next, compute the inverse hashes of (alpha|beta|gamma) for each of the hashes in hashes_3:

uint32_t hashes_2[3 * 10] = {
    0x44B7CA4F, // = inv_hasher("alpha", hashes_3[0]) | alpha0
    0xD5DC41E6, // = inv_hasher("alpha", hashes_3[1]) | alpha1
    0xF0724F33, // = inv_hasher("alpha", hashes_3[2]) | alpha2
    0x92478EFC, // = inv_hasher("alpha", hashes_3[3]) | alpha3
    0x441F4B0B, // = inv_hasher("alpha", hashes_3[4]) | alpha4
    0x1E505441, // = inv_hasher("alpha", hashes_3[5]) | alpha5
    0x5E82F68E, // = inv_hasher("alpha", hashes_3[6]) | alpha6
    0x10DDE129, // = inv_hasher("alpha", hashes_3[7]) | alpha7
    0xDC9517D6, // = inv_hasher("alpha", hashes_3[8]) | alpha8
    0x68A87890, // = inv_hasher("alpha", hashes_3[9]) | alpha9
    0x02645373, // = inv_hasher("beta",  hashes_3[0]) | beta0
    0x79F25E08, // = inv_hasher("beta",  hashes_3[1]) | beta1
    0x3BF5E6E1, // = inv_hasher("beta",  hashes_3[2]) | beta2
    0xAD889649, // = inv_hasher("beta",  hashes_3[3]) | beta3
    0x78AC4C24, // = inv_hasher("beta",  hashes_3[4]) | beta4
    0x679BF0A5, // = inv_hasher("beta",  hashes_3[5]) | beta5
    0x98FFAF13, // = inv_hasher("beta",  hashes_3[6]) | beta6
    0x329BD300, // = inv_hasher("beta",  hashes_3[7]) | beta7
    0x69A818A3, // = inv_hasher("beta",  hashes_3[8]) | beta8
    0x0CA5BAB8, // = inv_hasher("beta",  hashes_3[9]) | beta9
    0x7E61C3A2, // = inv_hasher("gamma", hashes_3[0]) | gamma0
    0x58CCF7F6, // = inv_hasher("gamma", hashes_3[1]) | gamma1
    0x2E9CABEE, // = inv_hasher("gamma", hashes_3[2]) | gamma2
    0x8454125A, // = inv_hasher("gamma", hashes_3[3]) | gamma3
    0x236DF238, // = inv_hasher("gamma", hashes_3[4]) | gamma4
    0x81606B74, // = inv_hasher("gamma", hashes_3[5]) | gamma5
    0x8FBDD74C, // = inv_hasher("gamma", hashes_3[6]) | gamma6
    0x2F4F9E57, // = inv_hasher("gamma", hashes_3[7]) | gamma7
    0x71FE275A, // = inv_hasher("gamma", hashes_3[8]) | gamma8
    0xD3A812AF, // = inv_hasher("gamma", hashes_3[9]) | gamma9
};

Finding Matches

Now, if we compare the tables hashes_1 and hashes_2, there is one match: 0x5E82F68E. This corresponds to a prefix of bazspam and a suffix of alpha6. Join those together, and indeed joaat("bazspamalpha6") = 0xDA5F1F08, our target hash!

Instead of the 270 (3*3*3*10) hashes or 12600 ((3+3+3)*(4+3+3)*(5+4+5)*10) characters of a naive search, it only took us 39 ((3*3)+(3*10)) hashes or 230 (((3+3+3)*(4+3+3))+((5+4+5)*10)) characters, in exchange for just 9 table lookups and some extra RAM.

Note, In this case we computed the full table of hashes for both the prefixes and suffixes. Algorithmically speaking, you would only benefit from pre-computing either one of them (and then using that to construct some data structure with faster than O(n) lookups), with the speedup being linear to the size of pre-computed table.

Also note, this can easily be applied to finding matches for multiple hashes at once.

You can view C code for the above example here.

Conclusion

Since the speedup is linear to the number of pre-computed hashes, a meet-in-the-middle attack can be many orders of magnitude faster than a naive search, given enough RAM.
When used with a 32-bit hash (such as joaat) with the intent of finding the original string, the bottleneck very quickly becomes dealing with the amount of hash collisions, not the speed at which valid strings are found.

You can find a full implementation of this technique for joaat here.