Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
Write a function to compute the edit distance/Hamming distance between two strings.
For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
Solve each block as if it was single-character XOR. You already have code to do this. For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.
This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.
Here is the idea:
First we have to find out the lenght of the key. This is the purpose of the steps using the "edit distance".
When we have the key length, we are able to group together ciphertext bytes that share the same key byte.
This result in "blocks" each being a "single-character XOR" which we know how to break since challenge 3.
The most complicated part is probably then the technique to find the key lenght. Why it works is not obvious (at least not to me?), but how it's done is given step by step so we can just follow these steps.
There's a file 6.txt below. It's been base64'd after being encrypted with repeating-key XOR.
HUIfTQsPAh9PE048GmllH0kcDk4TAQsHThsBFkU2AB4BSWQgVB0dQzNTTmVS BgBHVBwNRU0HBAxTEjwMHghJGgkRTxRMIRpHKwAFHUdZEQQJAGQmB1MANxYG DBoXQR0BUlQwXwAgEwoFR08SSAhFTmU+Fgk4RQYFCBpGB08fWXh+amI2DB0P QQ1IBlUaGwAdQnQEHgFJGgkRAlJ6f0kASDoAGhNJGk9FSA8dDVMEOgFSGQEL QRMGAEwxX1NiFQYHCQdUCxdBFBZJeTM1CxsBBQ9GB08dTnhOSCdSBAcMRVhI CEEATyBUCHQLHRlJAgAOFlwAUjBpZR9JAgJUAAELB04CEFMBJhAVTQIHAh9P G054MGk2UgoBCVQGBwlTTgIQUwg7EAYFSQ8PEE87ADpfRyscSWQzT1QCEFMa TwUWEXQMBk0PAg4DQ1JMPU4ALwtJDQhOFw0VVB1PDhxFXigLTRkBEgcKVVN4 Tk9iBgELR1MdDAAAFwoFHww6Ql5NLgFBIg4cSTRWQWI1Bk9HKn47CE8BGwFT QjcEBx4MThUcDgYHKxpUKhdJGQZZVCFFVwcDBVMHMUV4LAcKQR0JUlk3TwAm HQdJEwATARNFTg5JFwQ5C15NHQYEGk94dzBDADsdHE4UVBUaDE5JTwgHRTkA Umc6AUETCgYAN1xGYlUKDxJTEUgsAA0ABwcXOwlSGQELQQcbE0c9GioWGgwc AgcHSAtPTgsAABY9C1VNCAINGxgXRHgwaWUfSQcJABkRRU8ZAUkDDTUWF01j OgkRTxVJKlZJJwFJHQYADUgRSAsWSR8KIgBSAAxOABoLUlQwW1RiGxpOCEtU YiROCk8gUwY1C1IJCAACEU8QRSxORTBSHQYGTlQJC1lOBAAXRTpCUh0FDxhU ZXhzLFtHJ1JbTkoNVDEAQU4bARZFOwsXTRAPRlQYE042WwAuGxoaAk5UHAoA ZCYdVBZ0ChQLSQMYVAcXQTwaUy1SBQsTAAAAAAAMCggHRSQJExRJGgkGAAdH MBoqER1JJ0dDFQZFRhsBAlMMIEUHHUkPDxBPH0EzXwArBkkdCFUaDEVHAQAN U29lSEBAWk44G09fDXhxTi0RAk4ITlQbCk0LTx4cCjBFeCsGHEETAB1EeFZV IRlFTi4AGAEORU4CEFMXPBwfCBpOAAAdHUMxVVUxUmM9ElARGgZBAg4PAQQz DB4EGhoIFwoKUDFbTCsWBg0OTwEbRSonSARTBDpFFwsPCwIATxNOPBpUKhMd Th5PAUgGQQBPCxYRdG87TQoPD1QbE0s9GkFiFAUXR0cdGgkADwENUwg1DhdN AQsTVBgXVHYaKkg7TgNHTB0DAAA9DgQACjpFX0BJPQAZHB1OeE5PYjYMAg5M FQBFKjoHDAEAcxZSAwZOBREBC0k2HQxiKwYbR0MVBkVUHBZJBwp0DRMDDk5r NhoGACFVVWUeBU4MRREYRVQcFgAdQnQRHU0OCxVUAgsAK05ZLhdJZChWERpF QQALSRwTMRdeTRkcABcbG0M9Gk0jGQwdR1ARGgNFDRtJeSchEVIDBhpBHQlS WTdPBzAXSQ9HTBsJA0UcQUl5bw0KB0oFAkETCgYANlVXKhcbC0sAGgdFUAIO ChZJdAsdTR0HDBFDUk43GkcrAAUdRyonBwpOTkJEUyo8RR8USSkOEENSSDdX RSAdDRdLAA0HEAAeHQYRBDYJC00MDxVUZSFQOV1IJwYdB0dXHRwNAA9PGgMK OwtTTSoBDBFPHU54W04mUhoPHgAdHEQAZGU/OjV6RSQMBwcNGA5SaTtfADsX GUJHWREYSQAnSARTBjsIGwNOTgkVHRYANFNLJ1IIThVIHQYKAGQmBwcKLAwR DB0HDxNPAU94Q083UhoaBkcTDRcAAgYCFkU1RQUEBwFBfjwdAChPTikBSR0T TwRIEVIXBgcURTULFk0OBxMYTwFUN0oAIQAQBwkHVGIzQQAGBR8EdCwRCEkH ElQcF0w0U05lUggAAwANBxAAHgoGAwkxRRMfDE4DARYbTn8aKmUxCBsURVQf DVlOGwEWRTIXFwwCHUEVHRcAMlVDKRsHSUdMHQMAAC0dCAkcdCIeGAxOazkA BEk2HQAjHA1OAFIbBxNJAEhJBxctDBwKSRoOVBwbTj8aQS4dBwlHKjUECQAa BxscEDMNUhkBC0ETBxdULFUAJQAGARFJGk9FVAYGGlMNMRcXTRoBDxNPeG43 TQA7HRxJFUVUCQhBFAoNUwctRQYFDE43PT9SUDdJUydcSWRtcwANFVAHAU5T FjtFGgwbCkEYBhlFeFsABRcbAwZOVCYEWgdPYyARNRcGAQwKQRYWUlQwXwAg ExoLFAAcARFUBwFOUwImCgcDDU5rIAcXUj0dU2IcBk4TUh0YFUkASEkcC3QI GwMMQkE9SB8AMk9TNlIOCxNUHQZCAAoAHh1FXjYCDBsFABkOBkk7FgALVQRO D0EaDwxOSU8dGgI8EVIBAAUEVA5SRjlUQTYbCk5teRsdRVQcDhkDADBFHwhJ AQ8XClJBNl4AC1IdBghVEwARABoHCAdFXjwdGEkDCBMHBgAwW1YnUgAaRyon B0VTGgoZUwE7EhxNCAAFVAMXTjwaTSdSEAESUlQNBFJOZU5LXHQMHE0EF0EA Bh9FeRp5LQdFTkAZREgMU04CEFMcMQQAQ0lkay0ABwcqXwA1FwgFAk4dBkIA CA4aB0l0PD1MSQ8PEE87ADtbTmIGDAILAB0cRSo3ABwBRTYKFhROHUETCgZU MVQHYhoGGksABwdJAB0ASTpFNwQcTRoDBBgDUkksGioRHUkKCE5THEVCC08E EgF0BBwJSQoOGkgGADpfADETDU5tBzcJEFMLTx0bAHQJCx8ADRJUDRdMN1RH YgYGTi5jMURFeQEaSRAEOkURDAUCQRkKUmQ5XgBIKwYbQFIRSBVJGgwBGgtz RRNNDwcVWE8BT3hJVCcCSQwGQx9IBE4KTwwdASEXF01jIgQATwZIPRpXKwYK BkdEGwsRTxxDSToGMUlSCQZOFRwKUkQ5VEMnUh0BR0MBGgAAZDwGUwY7CBdN HB5BFwMdUz0aQSwWSQoITlMcRUILTxoCEDUXF01jNw4BTwVBNlRBYhAIGhNM EUgIRU5CRFMkOhwGBAQLTVQOHFkvUkUwF0lkbXkbHUVUBgAcFA0gRQYFCBpB PU8FQSsaVycTAkJHYhsRSQAXABxUFzFFFggICkEDHR1OPxoqER1JDQhNEUgK TkJPDAUAJhwQAg0XQRUBFgArU04lUh0GDlNUGwpOCU9jeTY1HFJARE4xGA4L ACxSQTZSDxsJSw1ICFUdBgpTNjUcXk0OAUEDBxtUPRpCLQtFTgBPVB8NSRoK SREKLUUVAklkERgOCwAsUkE2Ug8bCUsNSAhVHQYKUyI7RQUFABoEVA0dWXQa Ry1SHgYOVBFIB08XQ0kUCnRvPgwQTgUbGBwAOVREYhAGAQBJEUgETgpPGR8E LUUGBQgaQRIaHEshGk03AQANR1QdBAkAFwAcUwE9AFxNY2QxGA4LACxSQTZS DxsJSw1ICFUdBgpTJjsIF00GAE1ULB1NPRpPLF5JAgJUVAUAAAYKCAFFXjUe DBBOFRwOBgA+T04pC0kDElMdC0VXBgYdFkU2CgtNEAEUVBwTWXhTVG5SGg8e AB0cRSo+AwgKRSANExlJCBQaBAsANU9TKxFJL0dMHRwRTAtPBRwQMAAATQcB FlRlIkw5QwA2GggaR0YBBg5ZTgIcAAw3SVIaAQcVEU8QTyEaYy0fDE4ITlhI Jk8DCkkcC3hFMQIEC0EbAVIqCFZBO1IdBgZUVA4QTgUWSR4QJwwRTWM=
Decrypt it.
Here's how:
import base64
def get_english_score(input_bytes):
"""Compares each input byte to a character frequency
chart and returns the score of a message based on the
relative frequency the characters occur in the English
language.
"""
# From https://en.wikipedia.org/wiki/Letter_frequency
# with the exception of ' ', which I estimated.
character_frequencies = {
'a': .08167, 'b': .01492, 'c': .02782, 'd': .04253,
'e': .12702, 'f': .02228, 'g': .02015, 'h': .06094,
'i': .06094, 'j': .00153, 'k': .00772, 'l': .04025,
'm': .02406, 'n': .06749, 'o': .07507, 'p': .01929,
'q': .00095, 'r': .05987, 's': .06327, 't': .09056,
'u': .02758, 'v': .00978, 'w': .02360, 'x': .00150,
'y': .01974, 'z': .00074, ' ': .13000
}
return sum([character_frequencies.get(chr(byte), 0) for byte in input_bytes.lower()])
def single_char_xor(input_bytes, char_value):
"""Returns the result of each byte being XOR'd with a single value.
"""
output_bytes = b''
for byte in input_bytes:
output_bytes += bytes([byte ^ char_value])
return output_bytes
def bruteforce_single_char_xor(ciphertext):
"""Performs a singlechar xor for each possible value(0,255), and
assigns a score based on character frequency. Returns the result
with the highest score.
"""
potential_messages = []
for key_value in range(256):
message = single_char_xor(ciphertext, key_value)
score = get_english_score(message)
data = {
'message': message,
'score': score,
'key': key_value
}
potential_messages.append(data)
return sorted(potential_messages, key=lambda x: x['score'], reverse=True)[0]
def break_repeating_key_xor(ciphertext):
"""Attempts to break repeating-key XOR encryption.
"""
average_distances = []
# Take the keysize from suggested range
for keysize in range(2, 41):
# Initialize list to store Hamming distances for this keysize
distances = []
# Break the ciphertext into chunks the length of the keysize
chunks = [ciphertext[i:i + keysize] for i in range(0, len(ciphertext), keysize)]
while True:
try:
# Take the two chunks at the beginning of the list and
# get the Hamming distance
chunk_1 = chunks[0]
chunk_2 = chunks[1]
distance = calculate_hamming_distance(chunk_1, chunk_2)
# Normalize this result by dividing by KEYSIZE
distances.append(distance / keysize)
# Remove these chunks so when the loop starts over, the
# Hamming distance for the next two chunks can be calculated
del chunks[0]
del chunks[1]
# When an exception occurs (indicating all chunks have
# been processed) break out of the loop.
except Exception as e:
break
result = {
'key': keysize,
'avg distance': sum(distances) / len(distances)
}
average_distances.append(result)
possible_key_lengths = sorted(average_distances, key=lambda x: x['avg distance'])[0]
possible_plaintext = []
# Will populate with a single character as each transposed
# block has been single-byte XOR brute forced
key = b''
possible_key_length = possible_key_lengths['key']
for i in range(possible_key_length):
# Creates an block made up of each nth byte, where n
# is the keysize
block = b''
for j in range(i, len(ciphertext), possible_key_length):
block += bytes([ciphertext[j]])
key += bytes([bruteforce_single_char_xor(block)['key']])
possible_plaintext.append((repeating_key_xor(ciphertext, key), key))
return max(possible_plaintext, key=lambda x: get_english_score(x[0]))
def repeating_key_xor(message_bytes, key):
"""Returns message XOR'd with a key. If the message, is longer
than the key, the key will repeat.
"""
output_bytes = b''
index = 0
for byte in message_bytes:
output_bytes += bytes([byte ^ key[index]])
if (index + 1) == len(key):
index = 0
else:
index += 1
return output_bytes
def calculate_hamming_distance(input_bytes_1, input_bytes_2):
"""Finds and returns the Hamming distance (number of differing
bits) between two byte-strings
"""
hamming_distance = 0
for b1, b2 in zip(input_bytes_1, input_bytes_2):
difference = b1 ^ b2
# Count the number of differences ('1's) and add to the hamming distance
hamming_distance += sum([1 for bit in bin(difference) if bit == '1'])
return hamming_distance
def main():
with open('6.txt') as input_file:
ciphertext = base64.b64decode(input_file.read())
result, key = break_repeating_key_xor(ciphertext)
print("Key: {}\nMessage: {}".format(key, result))
if __name__ == '__main__':
main()
After break repeating key xor we are:
key: b'Terminator X: Bring the noise' message: I'm back and I'm ringin' the bell A rockin' on the mike while the fly girls yell In ecstasy in the back of me Well that's my DJ Deshay cuttin' all them Z's Hittin' hard and the girlies goin' crazy Vanilla's on the mike, man I'm not lazy. I'm lettin' my drug kick in It controls my mouth and I begin To just let it flow, let my concepts go My posse's to the side yellin', Go Vanilla Go! Smooth 'cause that's the way I will be And if you don't give a damn, then Why you starin' at me So get off 'cause I control the stage There's no dissin' allowed I'm in my own phase The girlies sa y they love me and that is ok And I can dance better than any kid n' play Stage 2 -- Yea the one ya' wanna listen to It's off my head so let the beat play through So I can funk it up and make it sound good 1-2-3 Yo -- Knock on some wood For good luck, I like my rhymes atrocious Supercalafragilisticexpialidocious I'm an effect and that you can bet I can take a fly girl and make her wet. I'm like Samson -- Samson to Delilah There's no denyin', You can try to hang But you'll keep tryin' to get my style Over and over, practice makes perfect But not if you're a loafer. You'll get nowhere, no place, no time, no girls Soon -- Oh my God, homebody, you probably eat Spaghetti with a spoon! Come on and say it! VIP. Vanilla Ice yep, yep, I'm comin' hard like a rhino Intoxicating so you stagger like a wino So punks stop trying and girl stop cryin' Vanilla Ice is sellin' and you people are buyin' 'Cause why the freaks are jockin' like Crazy Glue Movin' and groovin' trying to sing along All through the ghetto groovin' this here song Now you're amazed by the VIP posse. Steppin' so hard like a German Nazi Startled by the bases hittin' ground There's no trippin' on mine, I'm just gettin' down Sparkamatic, I'm hangin' tight like a fanatic You trapped me once and I thought that You might have it So step down and lend me your ear '89 in my time! You, '90 is my year. You're weakenin' fast, YO! and I can tell it Your body's gettin' hot, so, so I can smell it So don't be mad and don't be sad 'Cause the lyrics belong to ICE, You can call me Dad You're pitchin' a fit, so step back and endure Let the witch doctor, Ice, do the dance to cure So come up close and don't be square You wanna battle me -- Anytime, anywhere You thought that I was weak, Boy, you're dead wrong So come on, everybody and sing this song Say -- Play that funky music Say, go white boy, go white boy go play that funky music Go white boy, go white boy, go Lay down and boogie and play that funky music till you die. Play that funky music Come on, Come on, let me hear Play that funky music white boy you say it, say it Play that funky music A little louder now Play that funky music, white boy Come on, Come on, Come on Play that funky music

Aucun commentaire:
Enregistrer un commentaire