md5 is a hash function, and hash functions are designed to have two properties:
1) they are hiding. You (theoretically) can't reverse the function by any method other than brute-force.
2) they are binding. You (theoretically) can't find any other input that hashes to the same output by any method other than brute-force.
Any tool that "decrypts" md5 hashes most likely does so with a precomputed table, usually called a rainbow table: a giant list of many possible inputs and the hashes they generate. (Strictly speaking, a rainbow table is a space-saving variant of a plain lookup table, but the idea is the same.) If you look at the spreadsheet and find a hash that is also in your table, voila, you know what it came from. To make such tables harder to use, any security-conscious site will "salt" each password before hashing it, by adding a random string as a prefix. The point is for the salt to be different for each password you are hashing, so a standard (unsalted) table won't work, and further, no single table will work for every password.
(md5 itself has been shown to be vulnerable to collision attacks, which is why I said "theoretically")
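To make the table and salt idea concrete, here is a minimal sketch in Python. It uses a plain dictionary lookup rather than a real rainbow table, and the guess list, the helper names (`md5_hex`, `crack_unsalted`, `hash_with_salt`) and the 16-byte salt length are all assumptions made up for illustration. md5 appears only because it is what this thread is about.

```python
import hashlib
import os

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()

# A tiny precomputed table: hash -> the guess that produced it.
# (Hypothetical guess list; real tables hold many millions of entries.)
GUESSES = ["123456", "password", "letmein", "qwerty"]
LOOKUP = {md5_hex(g.encode("utf-8")): g for g in GUESSES}

def crack_unsalted(leaked_hash: str):
    """Look a leaked hash up in the table; None if it is not there."""
    return LOOKUP.get(leaked_hash)

def hash_with_salt(password: str, salt=None):
    """Prefix the password with a random salt, then hash.

    The salt is stored next to the hash; it is not a secret.
    """
    if salt is None:
        salt = os.urandom(16)  # assumed salt length
    return salt, md5_hex(salt + password.encode("utf-8"))

# Unsalted: the table finds the password immediately.
leaked = md5_hex(b"letmein")
print(crack_unsalted(leaked))

# Salted: same password, but the table no longer matches, and two
# users with the same password get different stored hashes.
salt_a, hash_a = hash_with_salt("letmein")
salt_b, hash_b = hash_with_salt("letmein")
print(crack_unsalted(hash_a))
print(hash_a == hash_b)
```

An attacker who steals both the salt and the hash can still brute-force one password at a time; what the salt takes away is the ability to reuse one precomputed table across every account.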
Okay thank you for the explanation. Let me try and apply my rudimentary knowledge here...
So a hash function is used to encrypt data by translating it with a certain rule set--I've learned about a simple key%b type function before. But with md5 this hashing function isn't the same each time a new code is created? How is the system able to decode it then? _Something_ out there has to know how to translate that back into a readable string right?
And collision is when different strings end up with the same encrypted code (except if you use a hash chain structure). So how is this used in an attack?
Sorry for all the questions. I know I could probably google this but I always learn better through instruction. Thanks!
"_Something_ out there has to know how to translate that back into a readable string right?"
Wrong. That's exactly your misunderstanding - MD5 is not an encryption function, but a hashing function.
The way it works is, given some string, it will output a new, random-looking string. It's infeasible to go backwards, i.e. given the output of running MD5, you can't work out the input except by guessing inputs and hashing them until one matches.
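If it helps to see it, here is a small sketch using Python's standard `hashlib` module (the `md5_hex` helper name is just something I made up for the example). Note that the function is the same every time: the same input always gives the same output, and the output is always the same length no matter how long the input is.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Hash a string with MD5 and return the digest as hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Same input, same output, every time.
print(md5_hex("hello"))  # 5d41402abc4b2a76b9719d911017c592
print(md5_hex("hello") == md5_hex("hello"))  # True

# A one-character change gives a completely different digest.
print(md5_hex("hello") == md5_hex("hellp"))  # False

# The digest is always 32 hex characters (128 bits), whatever the input size.
print(len(md5_hex("a" * 10_000)))  # 32
```

There is no matching "un-md5" call in the library, because no such operation exists.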
In a nutshell, the way password authentication works is this: when you sign up to a site, a hash of your password is saved. At this point no one, not even the site itself, can tell what your password was.
When you want to log in, you send the password over to the site, they hash it again, and compare the output with the saved hash. If you put in the same password, the hash will come out the same. And it's very, very hard to find a different string which isn't your password which will get you the same hash output.
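The sign-up and login steps above can be sketched roughly like this. Everything here is an assumption for illustration: the in-memory `USERS` dictionary, the function names, and the choice of salted PBKDF2-SHA256 with 100,000 iterations from Python's standard library, which I swapped in instead of bare MD5 because it is a more reasonable thing to store passwords with. It is not what any particular site actually does.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "user database": username -> (salt, stored_hash).
USERS = {}

def _hash_password(password: str, salt: bytes) -> bytes:
    # Salted PBKDF2-SHA256; the iteration count is an assumed value.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def sign_up(username: str, password: str) -> None:
    """Store only the salt and the hash, never the password itself."""
    salt = os.urandom(16)
    USERS[username] = (salt, _hash_password(password, salt))

def log_in(username: str, password: str) -> bool:
    """Re-run the same hashing steps and compare with the stored hash."""
    if username not in USERS:
        return False
    salt, stored = USERS[username]
    attempt = _hash_password(password, salt)
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(stored, attempt)

sign_up("alice", "correct horse")
print(log_in("alice", "correct horse"))  # True
print(log_in("alice", "wrong guess"))    # False
```

Notice that `log_in` never turns the stored hash back into a password. It only checks whether hashing the attempt lands on the same value.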
>How is the system able to decode it then? _Something_ out there has to know how to translate that back into a readable string right?
Wrong. Password hashes are meant to be one-way, and are chosen specifically so that getting the plaintext (readable string) back from the hash (stored gibberish) is very hard. When you create an account, the plaintext of your password is hashed and the hash is stored. When you subsequently want to log in, the system only needs to run the exact same hashing steps on what you typed and see if they produce a hash identical to the one stored.
Things are done this way specifically so that if a compromise such as the one at Gawker happens, it is harder for the attacker to get people's actual passwords.
This is the primary difference between encryption, where you want to be able to recover the plaintext, and hashing, where you want to make it very hard to recover the plaintext.