CDR: A cool idea that didn't hold up under cryptanalysis.

Wed Sep 20 17:49:34 PDT 2000

Recently, I've been making and breaking pencil-and-paper 
ciphers specifically to acquaint myself with the art of 
cryptanalysis.  

I've developed methods for solving vignere, general 
transpositions, playfair, etc...  just basically bringing 
myself up to speed with the general background of the field. 

And then, with my brain sharpened by the classics, I've been 
returning to things I formerly thought were "really neat 
ideas" in crypto but couldn't formulate a good reason for 
why they would be secure, and demolishing them.  

On most of them, the attacks have turned out to be embarassingly 
simple;  On this one, although I found an attack where known
plaintexts or chosen plaintexts could demolish it, to a 
cryptanalyst guessing cribs it's pretty resistant (it requires 
depressingly long cribs of a depressingly specific form unlikely 
to be found at random).

Anyway, any fans of pencil-and-paper systems out there, enjoy!

				Bear

	The Overlapping Additive Nomenclator

(Pencil-and-paper cipher; believed secure against amateur
  pencil-and-paper cryptanalysts but insecure against 
  computer-assisted or really good pencil-and-paper 
  cryptanalysts -- falls under the heading of "a cool 
  idea that didn't hold up under cryptanalysis")

This is a pencil-and-paper cipher, moderately difficult to 
use and extremely difficult to solve by pencil-and-paper 
cryptanalysis. 

I specifically do not assert that it is computationally 
secure.  The cryptanalysis explaining why not follows 
at the bottom of this paper. 

In this cipher, the key is a list of numbers, each number 
corresponding to one cleartext symbol. The following property 
must hold true of these numbers.

For some number N, (N=2 in the following example)
One of the two following conditions must hold:
   Either:
	All numbers used must differ from one another 
	in first N digits.  If the first difference 
	is in the Nth digit, then the difference in 
	that digit must be at least two.
   Or:
	All numbers used must differ from one another 
        in either one of the last two digits.

(In the example, both conditions are true). 

For some number D, (D=3 in the following example), 
Each code number must be at least N x D in length.  
The cipher provides sufficient confusion to "cover up" 
runs of identical plaintext characters of length D 
or smaller -- runs larger than length D will start 
showing a discernible pattern in the ciphertext. 

For a key on a small alphabet, consider the following:

A 129754
B 148943
D 208531
E 259146
F 268953

Now, if we are to encrypt the message "DEADBEEF" we would 
proceed as follows: 

First, the key number is written lined up beneath each 
plaintext letter, with N columns of digits for each column 
of letters; with a leading-difference system, the numbers 
are written starting in the column of the corresponding 
letter:  With a trailing-difference system, the numbers 
are written ending in the column of the corresponding letter. 

D E A D B E E F         leading difference
     D E A D B E E F    trailing difference
208531  148943
  259146  259146 
    129754  259146
      208531  268953

Next, the numbers in each column are added, with any carries 
spilling over to the next column;

208531  148943
  259146  259146
    129754  259146
      208531  268953
--------------------
21113564544660643553

The sum is the ciphertext. 

Decryption proceeds as follows:  

If we have chosen "Leading difference" as our condition, then 
we look at the first six digits of the sum and determine which 
code number is closest to it but not over.  We subtract 
that value from the whole, recovering a new sum.  We write the 
plaintext symbol corresponding to the number we have just 
subtracted in place of the leading two digits, as the new sum 
will be two digits shorter than the old.  Repeating the process 
until completion, we get the transformation in the left column.

If on the other hand we have chosen "trailing difference" as our 
criterion, then we look at the trailing two digits, pick the 
cleartext letter corresponding to these digits, and subtract it
from the result.  The difference will end in two zeros, which 
we remove and replace by the cleartext symbol whose code value we 
have just subtracted.  Repeating the process until completion, 
we obtain the transformation in the right column. 

21113564544660643553           21113564544660643553
208531                                       268953
--------------------           --------------------
D 260464544660643553           211135645446603746 F
  259146                                   259146
--------------------           --------------------
D E 1318544660643553           2111356454463446 E F
    129754                               259146
--------------------           --------------------
D E A 21004660643553           21113564542043 E E F
      208531                           148943
--------------------           --------------------
D E A D 151560643553           211135643931 B E E F
        148943                       208531
--------------------           --------------------
D E A D B 2617643553           2111354354 D B E E F
          259146                   129754
--------------------           --------------------
D E A D B E 26183553           21112246 A D B E E F
            259146               259146
--------------------           --------------------
D E A D B E E 268953           208531 E A D B E E F
              268953           208531
--------------------           --------------------
D E A D B E E F                     D E A D B E E F

In a real system, we could have chosen "leading difference" or 
"trailing difference" and made the key such that it would have 
been unusable in the other direction. 

With either mode, we can use code numbers of varying numbers 
of digits; the requirement of a constant number of digits 
holds only for systems such as the above which may be solved 
in either direction.  With a leading-difference system, the 
correct method is that the numbers be lined up with their 
leading digits under the letter before the addition is done; 
with a trailing-difference system, it is important that the 
numbers be lined up with their trailing digits under the 
letter before the addition is done. 

Since the "trailing difference" system need not worry about 
a carry digit altering the key, more symbols may be included 
in its alphabet.  

I have used N=2 here as the range of sensitivity, thinking 
of the roman alphabet plus perhaps seventy code signs.  If 
you prefer a nomenclator-like alphabet of nine thousand code 
signs, the same technique may be used; you will need to use 
N=4 instead.  

I have also used a "digit overlap" or D of three here, meaning 
that each digit of the ciphertext is the result of adding 
three digits in a sum, plus a possible carry digit.  Better 
security may be had by increasing the digit overlap (the 
minimum length of each numeric element becomes N times the 
digit overlap) but real computational security cannot be 
achieved by this method; it will frustrate only pencil-and-
paper cipherers.

Finally, since the ciphertext is "open at its ends", it is best 
for security to use a predetermined number of NULL symbols 
at both ends of the cipher; Null code numbers, unlike the actual 
key numbers, need not differ from legitimate key numbers in 
the first or last few digits; they need only to differ in the
first or last few digits from the code numbers assigned to 
other null symbols.   Leading and trailing NULL symbols make 
this cipher quite difficult of solution; if you keep track 
of your supply of leading and trailing nulls and do not reuse 
them, the cipher is reasonably secure against pen-and-paper 
cryptanalysis.

A successful method for pen-and-paper analysis, however, does 
exist.  If a known plaintext enciphered with some key is captured, 
a cryptanalyst may set up linear equations constraining the 
values of the code numbers.   If a set of D+1 consecutive symbols 
is used more than D times in the message under the same key, 
then the linear equations may be solved and will generally 
find the code number corresponding to at least one symbol. 

Once at least a few such code numbers are known, a cryptanalyst 
may start looking for "cribs" consisting of a seqence of length 
D consisting of known symbols, one unknown code symbol, and another 
sequence of D known symbols.  Each time such a crib is found, the 
cryptanalyst will learn another key code number.

A cryptanalyst may also look for "cribs" consisting of a sequence 
of length D consisting of known symbols, two unknown symbols, 
and another sequence of length D of known symbols; each time 
this happens, he will obtain a linear equation constraining the 
values of two symbols.  If more than one linear equation is found 
in this way constraining the value of any two symbols, the key 
for those symbols can be found.  Then of course, the other 
symbols captured in the linear equations with them from other 
cribs may also be solved for. 

What security this cipher has is as a result of the abstracted 
and difficult form that a crib must take, and the difficulty of 
finding and identifying suitable cribs.  

*sigh.*  Oh well, it's better than a standard nomenclator or 
a pigpen cipher.  Polyphony or polyalphabetic code substitution 
could of course be applied to good effect, but that's just 
obfuscation.  

			Bear