Cryptography

In this notebook, we are going to use Mathematica to implement some simple encryption schemes, and to analyze and decode messages.

Additive Ciphers

One of the simplest codes is an additive cipher, obtained by replacing each letter in a message by the letter exactly n places to the right of it in the alphabet, for some fixed number n. When the end of the alphabet is reached, the count continues from the beginning of the alphabet. For example, the additive cipher corresponding to n = 5 would replace "A" by "F" and "W" by "C". To implement this cipher, we can use Mathematica's built-in string manipulation functions. Since a Mathematica string must be surrounded by double quotes, we can define the string consisting of the letters in the alphabet by:

To encrypt a message, we want to apply a list of replacement rules to a message string. Continuing with the example of the additive cipher for n = 5, we want to use the replacement rules "A" -> "F", "B" -> "G", and so on. It would be tedious to type all these replacement rules, so we will automate the process instead. To create the replacement rule for a given letter, say the ninth letter of the alphabet, we type

The preceding command makes a replacement rule using the 9th letter of the alphabet and the (9+5)th letter of the alphabet. Since there are only 26 letters in the alphabet, we use the Mod command to make sure that numbers larger than 26 wrap around to the beginning of the alphabet.

Now that we have a command that does one letter at a time, we can use the Table command to do the whole alphabet at once.

To proceed, we need a message to encrypt. Here's a simple one.

In order to use the replacement rules, we must put the message in upper case.

Now we encrypt the message by using StringReplace together with the replacement rules in encrypt5.

Finally, we should eliminate the spaces in the message to make it harder to decipher.

We can decrypt the message by applying the opposite set of replacement rules as follows:

Breaking the code

Let's define some functions that will automate the process of encryption and decryption.

Here is a quote that has been encoded using an additive cipher.

How can we decode it? One approach would be to use brute force. Since there are only 26 different additive ciphers, we could simply apply our user-defined function additiveDecode to the message using each of the numbers 1 through 26 as keys and look for an output that makes sense. Here is a more sophisticated approach. It is known that the letter "E" occurs more often than any other letter in the English language. One estimate is that the letter "E" comprises 12.7% of the letters in a typical English text. The next most frequent letter is "T", at 9.1%. If we can figure out which letter in the ciphertext corresponds to the letter "E", then we can decipher the message. Let's analyze the message to see which letter occurs most often.
To do this, we first convert the message to a list (as opposed to a string) and then count the occurrences of each letter.

We see that the most frequent letter in the ciphertext is "R", so we can be reasonably sure that "E" has been encoded as "R". Thus the shift in the additive cipher is 13, the distance between "E "and "R" in the alphabet. Thus if you apply the function additiveDecode[cipherquote, 13], you will see the decoded message.

An additive cipher is so easy to break that no self-respecting cryptanalyst would use it. In 1586, the French diplomat Blaise de Vigenere introduced a more complicated cipher. His idea was to use one additive cipher to encode the first letter of a message, a different additive cipher to encode the second letter, and so on. The main problem is to keep track of which additive cipher to use for each letter. In Vigenere ciphers, this is done by repeatedly writing a keyword above the message, where each letter of the keyword determines the additive cipher to use for the corresponding letter of the message. In spite of the complications introduced by the keyword, a Vigenere cipher can also be attacked by statistical methods, which start by attempting to estimate the length of the keyword.

Public Key Cryptography

Modern methods in cryptography evade elementary statistical analysis by encoding blocks of letters as numbers. They also rely on powerful techniques from the theory of prime numbers and factorization. As our final example, we will discuss the RSA encryption scheme. This method has another interesting aspect: you can transmit part of the key publicly, since the key used to encrypt messages is different from the key used to decrypt them.

This encryption system requires us to choose two large prime numbers. For practical use, the primes should have at least 100 decimal digits; finding such primes is an adventure of its own. For our example, however, we'll let Mathematica generate some smaller prime numbers.

For the method to work, the individual prime numbers need to be kept secret. The product, however, is part of the public key.

Notice that the product has approximately twice as many digits as the individual factors. In order for someone to break the code, they would have to factor this product. The strength of the code relies on the fact that it is still extremely difficult (even with powerful computers) to factor integers with more than 200 digits. Consequently, it is also difficult to compute a number theoretic function called the Euler &phgr; function; to compute it, you need to know the factorization. In our case, we started by choosing p and q ourselves, so we know the value &phgr;(pq). Here it is:

The Euler &phgr; function is built into Mathematica.

The fact that Mathematica can factor pq to compute this value tells us that our example is not secure.

Now we are ready to pick the remainder of the public key. We can choose almost any random integer less than &phgr;. (The number we choose cannot share a common factor with &phgr;.)

Now we can compute the secret part of the key, which will be used to decrypt messages.

The magic that makes everything work is a theorem of Leonhard Euler. This theorem tells us that any number A, when raised to the e*d power, leaves a remainder equal to A upon division by pq. Here are some examples to verify that statement.

Here are the functions that encode and decode messages.

Now let's take a message (in the form of an integer) and encrypt it.

As you can see, the encrypted message gives little clue of the repeating pattern in the original integer. However, it can easily be decrypted.

One step remains before we can use this method to encrypt a message written in standard English. We have to convert the letters into integers that can be encrypted by this method. But that's quite easy on a computer; all those letters are stored as numbers anyway.

For an excellent introduction to cryptography, see A. Beutelspacher, Cryptography, Mathematical Association of America, 1994.