SciAm: The Mystery of the Voynich Manuscript

R. A. Hettinga rah at shipwright.com
Sun Jun 27 19:14:24 PDT 2004


<http://www.sciam.com/print_version.cfm?articleID=0000E3AA-70E1-10CF-AD1983414B7F0000>

Scientific American:

   June 21, 2004

The Mystery of the Voynich Manuscript

New analysis of a famously cryptic medieval document suggests that it
contains nothing but gibberish

By Gordon Rugg


 In 1912 Wilfrid Voynich, an American rare-book dealer, made the find of a
lifetime in the library of a Jesuit college near Rome: a manuscript some
230 pages long, written in an unusual script and richly illustrated with
bizarre images of plants, heavenly spheres and bathing women. Voynich
immediately recognized the importance of his new acquisition. Although it
superficially resembled the handbook of a medieval alchemist or herbalist,
the manuscript appeared to be written entirely in code. Features in the
illustrations, such as hairstyles, suggested that the book was produced
sometime between 1470 and 1500, and a 17th-century letter accompanying the
manuscript stated that it had been purchased by Rudolph II, the Holy Roman
Emperor, in 1586. During the 1600s, at least two scholars apparently tried
to decipher the manuscript, and then it disappeared for nearly 250 years
until Voynich unearthed it.

 Voynich asked the leading cryptographers of his day to decode the odd
script, which did not match that of any known language. But despite 90
years of effort by some of the world's best code breakers, no one has been
able to decipher Voynichese, as the script has become known. The nature and
origin of the manuscript remain a mystery. The failure of the code-breaking
attempts has raised the suspicion that there may not be any cipher to
crack. Voynichese may contain no message at all, and the manuscript may
simply be an elaborate hoax.

 Critics of this hypothesis have argued that Voynichese is too complex to
be nonsense. How could a medieval hoaxer produce 230 pages of script with
so many subtle regularities in the structure and distribution of the words?
But I have recently discovered that one can replicate many of the
remarkable features of Voynichese using a simple coding tool that was
available in the 16th century. The text generated by this technique looks
much like Voynichese, but it is merely gibberish, with no hidden message.
This finding does not prove that the Voynich manuscript is a hoax, but it
does bolster the long-held theory that an English adventurer named Edward
Kelley may have concocted the document to defraud Rudolph II. (The emperor
reportedly paid a sum of 600 ducats--equivalent to about $50,000 today--for
the manuscript.)

 Perhaps more important, I believe that the methods used in this analysis
of the Voynich mystery can be applied to difficult questions in other
areas. Tackling this hoary puzzle requires expertise in several fields,
including cryptography, linguistics and medieval history. As a researcher
into expert reasoning--the study of the processes used to solve complex
problems--I saw my work on the Voynich manuscript as an informal test of an
approach that could be used to identify new ways of tackling long-standing
scientific questions. The key step is determining the strengths and
weaknesses of the expertise in the relevant fields.

 Baby God's Eye?
 The first purported decryption of the Voynich manuscript came in 1921.
William R. Newbold, a professor of philosophy at the University of
Pennsylvania, claimed that each character in the Voynich script contained
tiny pen strokes that could be seen only under magnification and that these
strokes formed an ancient Greek shorthand. Based on his reading of the
code, Newbold declared that the Voynich manuscript had been written by
13th-century philosopher-scientist Roger Bacon and described discoveries
such as the invention of the microscope. Within a decade, however, critics
debunked Newbold's solution by showing that the alleged microscopic
features of the letters were actually natural cracks in the ink.

 Newbold's attempt was just the start of a string of failures. In the 1940s
amateur code breakers Joseph M. Feely and Leonell C. Strong used
substitution ciphers that assigned Roman letters to the characters in
Voynichese, but the purported translations made little sense. At the end of
World War II the U.S. military cryptographers who cracked the Japanese
Imperial Navy's codes passed some spare time tackling
ciphertexts--encrypted texts--from antiquity. The team deciphered every one
except the Voynich manuscript.

 In 1978 amateur philologist John Stojko claimed that the text was written
in Ukrainian with the vowels removed, but his translation--which included
sentences such as "Emptiness is that what Baby God's Eye is fighting
for"--jibed with neither the manuscript's illustrations nor Ukrainian
history. In 1987 a physician named Leo Levitov asserted that the document
had been produced by the Cathars, a heretical sect that flourished in
medieval France, and was written in a pidgin composed of words from various
languages. Levitov's translation, though, was at odds with the Cathars'
well-documented theology.

 Furthermore, all these schemes used mechanisms that allowed the same
Voynichese word to be translated one way in one part of the manuscript and
a different way in another part. For example, one step in Newbold's
solution involved the deciphering of anagrams, which is notoriously
imprecise: the anagram ADER, for instance, can be interpreted as READ, DARE
or DEAR. Most scholars agree that all the attempted decodings of the
Voynich manuscript are tainted by an unacceptable degree of ambiguity.
Moreover, none of these methods could encode plaintext--that is, a readable
message--into a ciphertext with the striking properties of Voynichese.
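
The ambiguity of anagram-based decoding can be illustrated with a minimal
Python sketch. The word list LEXICON is a hypothetical stand-in for a real
dictionary, and the function name is an assumption made for this example.

```python
from itertools import permutations

# Hypothetical mini-lexicon; a real decipherment would face a far larger one.
LEXICON = {"READ", "DARE", "DEAR", "RED", "EAR"}

def anagram_readings(letters, lexicon=LEXICON):
    """Return every lexicon word that uses exactly the given letters."""
    candidates = {"".join(p) for p in permutations(letters)}
    return sorted(candidates & lexicon)

# ADER has three equally valid readings, so the "plaintext" is a free choice.
print(anagram_readings("ADER"))  # ['DARE', 'DEAR', 'READ']
```

Every additional anagram step multiplies the number of possible readings,
which is why such a decoding cannot be checked against the text.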

 If the manuscript is not a code, could it be an unidentified language?
Even though we cannot decipher the text, we know that it shows an
extraordinary amount of regularity. For instance, the most common words
often occur two or more times in a row. To represent the words, I will use
the European Voynich Alphabet (EVA), a convention for transliterating the
characters of Voynichese into Roman letters. An example from folio 78R of
the manuscript reads: qokedy qokedy dal qokedy qokedy. This degree of
repetition is not found in any known language. Conversely, Voynichese
contains very few phrases where two or three different words regularly
occur together. These characteristics make it unlikely that Voynichese is a
human language--it is simply too different from all other languages.
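
As a small sketch of how this kind of repetition could be measured, the
Python snippet below counts places where a word is immediately followed by
an identical word. The function name is an assumption for illustration; the
sample line is the folio 78R example quoted above.

```python
def adjacent_repeats(words):
    """Count positions where a word is immediately followed by itself."""
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

sample = "qokedy qokedy dal qokedy qokedy".split()
print(adjacent_repeats(sample))  # 2
```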

 The third possibility is that the manuscript was a hoax devised for
monetary gain or that it is some mad alchemist's meaningless ramblings. The
linguistic complexity of the manuscript seems to argue against this theory.
In addition to the repetition of words, there are numerous regularities in
the internal structure of the words. The common syllable qo, for instance,
occurs only at the start of words. The syllable chek may appear at the
start of a word, but if it occurs in the same word as qo, then qo always
comes before chek. The common syllable dy usually appears at the end of a
word and occasionally at the start but never in the middle.

 A simple "pick and mix" hoax that combines the syllables at random could
not produce a text with so many regularities. Voynichese is also much more
complex than anything found in pathological speech caused by brain damage
or psychological disorders. Even if a mad alchemist did construct a grammar
for an invented language and then spent years writing a script that
employed this grammar, the resulting text would not share the various
statistical features of the Voynich manuscript. For example, the word
lengths of Voynichese form a binomial distribution--that is, the most
common words have five or six characters, and the occurrence of words with
greater or fewer characters falls off steeply from that peak in a symmetric
bell curve. This kind of distribution is extremely unusual in a human
language. In almost all human languages, the distribution of word lengths
is broader and asymmetric, with a higher occurrence of relatively long
words. It is very unlikely that the binomial distribution of Voynichese
could have been a deliberate part of a hoax, because this statistical
concept was not invented until centuries after the manuscript was written.
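
To make the shape of such a distribution concrete, here is a minimal Python
sketch that prints a binomial distribution as a text histogram. The
parameters n = 11 and p = 0.5 are assumptions chosen only so that the peak
falls at five and six; they are not fitted to the manuscript.

```python
from math import comb

def binomial_pmf(n, p):
    """Probability of each outcome k = 0..n for a binomial distribution."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Illustrative parameters only (assumed, not measured from the manuscript).
pmf = binomial_pmf(11, 0.5)

# One row per word length; the bar length is the probability in percent.
for k, prob in enumerate(pmf):
    print(f"{k:2d} {'#' * round(prob * 100)}")
```

The histogram is symmetric around its peak, unlike the broader, lopsided
word-length distributions the article attributes to human languages.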

 Expert Reasoning
 In summary, the Voynich manuscript appeared to be either an extremely
unusual code, a strange unknown language or a sophisticated hoax, and there
was no obvious way to resolve the impasse. It so happened that my colleague
Joanne Hyde and I were looking for just such a puzzle a few years ago. We
had been developing a method for critically reevaluating the expertise and
reasoning used in the investigation of difficult research problems. As a
preliminary test, I applied this method to the research on the Voynich
manuscript. I started by determining the types of expertise that had
previously been applied to the problem.

 The assessment that the features of Voynichese are inconsistent with any
human language was based on substantial relevant expertise from
linguistics. This conclusion appeared sound, so I proceeded to the hoax
hypothesis. Most people who have studied the Voynich manuscript agreed that
Voynichese was too complex to be a hoax. I found, however, that this
assessment was based on opinion rather than firm evidence. There is no body
of expertise on how to mimic a long medieval ciphertext, because there are
hardly any examples of such texts, let alone hoaxes of this genre.

 Several researchers, such as Jorge Stolfi of the University of Campinas in
Brazil, had wondered whether the Voynich manuscript was produced using
random text-generation tables. These tables have cells that contain
characters or syllables; the user selects a sequence of cells--perhaps by
throwing dice--and combines them to form a word. This technique could
generate some of the regularities within Voynichese words. Under Stolfi's
method, the table's first column could contain prefix syllables, such as
qo, that occur only at the start of words; the second column could contain
midfixes (syllables appearing in the middle of words) such as chek, and the
third column could contain suffix syllables such as y. Choosing a syllable
from each column in sequence would produce words with the characteristic
structure of Voynichese. Some of the cells might be empty, so that one
could create words lacking a prefix, midfix or suffix.
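
A three-column table of this kind can be sketched in a few lines of Python.
Only qo, chek, y and dy come from the text; the other syllables, the function
names and the use of a seeded pseudorandom generator in place of dice are
assumptions made for illustration.

```python
import random

# Hypothetical syllables in EVA transliteration; "" marks an empty cell.
PREFIXES = ["qo", "o", "", "ch"]
MIDFIXES = ["chek", "ke", "te", ""]
SUFFIXES = ["y", "dy", "al", ""]

def table_word(rng):
    """Pick one cell from each column in order and join the three picks."""
    return rng.choice(PREFIXES) + rng.choice(MIDFIXES) + rng.choice(SUFFIXES)

def table_text(n_words, seed=0):
    """Generate n_words nonempty words from the three-column table."""
    rng = random.Random(seed)
    words = []
    while len(words) < n_words:
        word = table_word(rng)
        if word:  # skip the draw in which all three cells are empty
            words.append(word)
    return words

print(" ".join(table_text(8)))
```

Because qo appears only in the prefix column, it can occur only at the start
of a generated word, which mirrors one of the regularities described above.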

 Other features of Voynichese, however, are not so easily reproduced. For
instance, some characters are individually common but rarely occur next to
each other. The characters transcribed as a, e and l are common, as is the
combination al, but the combination el is very rare. This effect cannot be
produced by randomly mixing characters from a table, so Stolfi and others
rejected this approach. The key term here, though, is "randomly." To modern
researchers, randomness is an invaluable concept. Yet the concept was
developed long after the manuscript was created. A medieval hoaxer probably
would have used a different way of combining syllables that might not have
been random in the strict statistical sense. I began to wonder whether some
of the features of Voynichese might be side effects of a long-obsolete
device.

 The Cardan Grille
 It looked as if the hoax hypothesis deserved further investigation. My
next step was to attempt to produce a hoax document to see what side
effects emerged. The first question was, Which techniques to use? The
answer depended on the date when the manuscript was produced. Having worked
in archaeology, a field in which dating artifacts is an important concern,
I was wary of the general consensus among Voynich researchers that the
manuscript was created before 1500. It was illustrated in the style of the
late 1400s, but this attribute did not conclusively pin down the date of
its origin; artistic works are often produced in the style of an earlier
period, either innocently or to make the document look older. I therefore
searched for a coding technique that was available during the widest
possible range of origin dates--between 1470 and 1608.

 A promising possibility was the Cardan grille, which was introduced by
Italian mathematician Girolamo Cardano in 1550. It consists of a card with
slots cut in it. When the grille is laid over an apparently innocuous text
produced with another copy of the same card, the slots reveal the words of
the hidden message. I realized that a Cardan grille with three slots could
be used to select permutations of prefixes, midfixes and suffixes from a
table to generate Voynichese-style words.

 A typical page of the Voynich manuscript contains about 10 to 40 lines,
each consisting of about eight to 12 words. Using the three-syllable model
of Voynichese, a single table of 36 columns and 40 rows (that is, 12
three-syllable words per row) would contain enough syllables to produce an
entire manuscript page with a single grille.
The first column would list prefixes, the second midfixes and the third
suffixes; the following columns would repeat that pattern. You can align
the grille to the upper left corner of the table to create the first word
of Voynichese and then move it three columns to the right to make the next
word. Or you can move the grille to a column farther to the right or to a
lower row. By successively positioning the grille over different parts of
the table, you can create hundreds of Voynichese words. And the same table
could then be used with a different grille to make the words of the next
page.
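
The procedure can be sketched in Python as follows. The table size comes from
the text; the syllable pools, the row offsets of the grille, the function
names and the wrap-around at the bottom of the table are all assumptions made
for this illustration, since the article does not give the actual layouts.

```python
import random

ROWS, COLS = 40, 36  # table size given in the article

# Hypothetical syllable pools; "" marks an empty cell.
POOLS = [
    ["qo", "o", "ch", "", ""],      # prefix columns (0, 3, 6, ...)
    ["ke", "chek", "te", "l", ""],  # midfix columns (1, 4, 7, ...)
    ["y", "dy", "al", "in", ""],    # suffix columns (2, 5, 8, ...)
]

def build_table(seed=0):
    """Fill the table so that columns repeat prefix, midfix, suffix."""
    rng = random.Random(seed)
    return [[rng.choice(POOLS[c % 3]) for c in range(COLS)] for _ in range(ROWS)]

def read_word(table, row, col, grille):
    """Read one word through a three-slot grille placed at (row, col).

    grille holds the row offset of each slot; rows wrap around at the
    bottom of the table (a simplification assumed here).
    """
    return "".join(table[(row + off) % ROWS][col + i] for i, off in enumerate(grille))

def generate_line(table, row, grille):
    """Slide the grille three columns at a time across one table row."""
    words = [read_word(table, row, col, grille) for col in range(0, COLS, 3)]
    return [w for w in words if w]  # drop placements that hit only empty cells

table = build_table()
grille = (0, 2, 1)  # hypothetical layout: each slot sits on a different row
for row in range(3):
    print(" ".join(generate_line(table, row, grille)))
```

Swapping in a different grille tuple while keeping the same table corresponds
to reusing one table with a new grille for the next page.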

 I drew up three tables by hand, which took two or three hours per table.
Each grille took two or three minutes to cut out. (I made about 10.) After
that, I could generate text as fast as I could transcribe it. In all, I
produced between 1,000 and 2,000 words this way.

 I found that this method could easily reproduce most of the features of
Voynichese. For example, you can ensure that some characters never occur
together by carefully designing the tables and grilles. If successive
grille slots are always on different rows, then the syllables in
horizontally adjacent cells in the table will never occur together, even
though they may be very common individually. The binomial distribution of
word lengths can be generated by mixing short, medium-length and long
syllables in the table. Another characteristic of Voynichese--that the
first words in a line tend to be longer than later ones--can be reproduced
simply by putting most of the longer syllables on the left side of the
table.

 The Cardan grille method therefore appears to be a mechanism by which the
Voynich manuscript could have been created. My reconstructions suggest that
one person could have produced the manuscript, including the illustrations,
in just three or four months. But a crucial question remains: Does the
manuscript contain only meaningless gibberish or a coded message?

 I found two ways to employ the grilles and tables to encode and decode
plaintext. The first was a substitution cipher that converted plaintext
characters to midfix syllables that are then embedded within meaningless
prefixes and suffixes using the method described above. The second encoding
technique assigned a number to each plaintext character and then used these
numbers to specify the placement of the Cardan grille on the table. Both
techniques, however, produce scripts with much less repetition of words
than Voynichese. This finding indicates that if the Cardan grille was
indeed used to make the Voynich manuscript, the author was probably
creating cleverly designed nonsense rather than a ciphertext. I found no
evidence that the manuscript contains a coded message.
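
The first of these encodings can be sketched in Python under some loud
assumptions: the midfix syllables, the padding prefixes and suffixes, and the
function names below are all invented for illustration and are not the
tables used in the study. The midfixes avoid the letters q, o, d and y so
that the padding can be stripped unambiguously.

```python
import random
from itertools import product
from string import ascii_lowercase

# Hypothetical padding syllables; they carry no information.
PREFIXES = ["qo", "o", ""]
SUFFIXES = ["y", "dy", ""]

# Hypothetical midfixes: 7 onsets x 4 nuclei = 28 distinct syllables.
ONSETS = ["k", "t", "l", "r", "s", "ch", "sh"]
NUCLEI = ["e", "a", "ee", "ai"]
MIDFIXES = [onset + nucleus for onset, nucleus in product(ONSETS, NUCLEI)]

# zip stops at the 26th letter, so two midfixes stay unused.
ENCODE = dict(zip(ascii_lowercase, MIDFIXES))
DECODE = {midfix: letter for letter, midfix in ENCODE.items()}

def encode(plaintext, seed=0):
    """Turn each letter into one word: padding + midfix + padding.

    Characters outside a-z (spaces, punctuation) are dropped.
    """
    rng = random.Random(seed)
    return [rng.choice(PREFIXES) + ENCODE[ch] + rng.choice(SUFFIXES)
            for ch in plaintext.lower() if ch in ENCODE]

def decode(words):
    """Strip the padding letters and look up the remaining midfix."""
    return "".join(DECODE[w.lstrip("qo").rstrip("dy")] for w in words)

cipher = encode("kelley")
print(" ".join(cipher))
print(decode(cipher))  # kelley
```

In a scheme like this, a word repeats only when the plaintext letter and both
padding choices coincide, which is consistent with the observation above that
such encodings yield less word repetition than Voynichese.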

 This absence of evidence does not prove that the manuscript was a hoax,
but my work shows that the construction of a hoax as complex as the Voynich
manuscript was indeed feasible. This explanation dovetails with several
intriguing historical facts: Elizabethan scholar John Dee and his
disreputable associate Edward Kelley visited the court of Rudolph II during
the 1580s. Kelley was a notorious forger, mystic and alchemist who was
familiar with Cardan grilles. Some experts on the Voynich manuscript have
long suspected that Kelley was the author.

 My undergraduate student Laura Aylward is currently investigating whether
more complex statistical features of the manuscript can be reproduced using
the Cardan grille technique. Answering this question will require producing
large amounts of text using different table and grille layouts, so we are
writing software to automate the method.

 This study yielded valuable insights into the process of reexamining
difficult problems to determine whether any possible solutions have been
overlooked. A good example of such a problem is the question of what causes
Alzheimer's disease. We plan to examine whether our approach could be used
to reevaluate previous research into this brain disorder. Our questions
will include: Have the investigators neglected any field of relevant
expertise? Have the key assumptions been tested sufficiently? And are there
subtle misunderstandings between the different disciplines that are
involved in this work? If we can use this process to help Alzheimer's
researchers find promising new directions, then a medieval manuscript that
looks like an alchemist's handbook may actually prove to be a boon to
modern medicine.


GORDON RUGG became interested in the Voynich manuscript about four years
ago. At first he viewed it as merely an intriguing puzzle, but later he saw
it as a test case for reexamining complex problems. He earned his Ph.D. in
psychology at the University of Reading in 1987. Now a senior lecturer in
the School of Computing and Mathematics at Keele University in England,
Rugg is editor in chief of Expert Systems: The International Journal of
Knowledge Engineering and Neural Networks. His research interests include
the nature of expertise and the modeling of information, knowledge and
beliefs.

© 1996-2004 Scientific American, Inc. All rights reserved.
 Reproduction in whole or in part without permission is prohibited.
-- 
-----------------
R. A. Hettinga <mailto: rah at ibuc.com>
The Internet Bearer Underwriting Corporation <http://www.ibuc.com/>
44 Farquhar Street, Boston, MA 02131 USA
"... however it may deserve respect for its usefulness and antiquity,
[predicting the end of the world] has not been found agreeable to
experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'




