Phonotactic Reconstruction of Encrypted VoIP Conversations:
Eugen Leitl
eugen at leitl.org
Fri May 27 06:45:24 PDT 2011
http://www.cs.unc.edu/~amw/resources/hooktonfoniks.pdf
Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on fon-iks
Andrew M. White*, Austin R. Matthews*+, Kevin Z. Snow*, Fabian Monrose*
*Department of Computer Science  +Department of Linguistics
University of North Carolina at Chapel Hill
Chapel Hill, North Carolina
{ amw, kzsnow, fabian } @cs.unc.edu, armatthe at email.unc.edu
Abstract
In this work, we unveil new privacy threats against Voice-over-IP (VoIP)
communications. Although prior work has shown that the interaction of
variable bit-rate codecs and length-preserving stream ciphers leaks
information, we show that the threat is more serious than previously
thought. In particular, we derive approximate transcripts of encrypted VoIP
conversations by segmenting an observed packet stream into
subsequences representing individual phonemes and classifying those
subsequences by the phonemes they encode. Drawing on insights from the
computational linguistics and speech recognition communities, we apply
novel techniques for unmasking parts of the conversation. We believe our
ability to do so underscores the importance of designing secure (yet
efficient) ways to protect the confidentiality of VoIP conversations.
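
The leak described in the abstract can be illustrated with a minimal Python sketch. It assumes a toy variable bit-rate codec whose frame size depends on the kind of sound being encoded, and a toy XOR stream cipher. The frame sizes and the function names (encode_frame, encrypt, observed_lengths) are hypothetical and chosen only for illustration; they are not taken from the paper or from any real codec.

```python
import os

# Hypothetical frame sizes in bytes per sound class (made-up values,
# not measurements from any real codec).
FRAME_SIZE = {"silence": 6, "fricative": 20, "vowel": 38}

def encode_frame(sound):
    """Toy variable bit-rate codec: payload length depends on the sound."""
    return bytes(FRAME_SIZE[sound])  # zero-filled placeholder payload

def encrypt(frame):
    """Toy length-preserving stream cipher: XOR with a random keystream."""
    keystream = os.urandom(len(frame))
    return bytes(a ^ b for a, b in zip(frame, keystream))

def observed_lengths(sounds):
    """What an eavesdropper sees: only the ciphertext lengths."""
    return [len(encrypt(encode_frame(s))) for s in sounds]

# The payload bytes are hidden, but the length sequence still tracks
# the sequence of sound classes.
print(observed_lengths(["vowel", "silence", "fricative"]))
```

Under these assumptions the ciphertext hides the payload but not its length, so the sequence of packet sizes still carries information about what was spoken.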
...
VII. CONCLUSION
In this paper, we explore the ability of an adversary to reconstruct parts
of encrypted VoIP conversations. Specifically, we propose an
approach for outputting a hypothesized transcript of a
conversation, based on segmenting the sequence of observed packet
sizes into subsequences corresponding to the likely phonemes they
encode. These phoneme sequences are then mapped to candidate words,
after which we incorporate word and part-of-speech based language models to
choose the best candidates using contextual information from the
hypothesized sentence as a whole. Our results show that the quality of the
recovered transcripts is far better in many cases than one would expect.
While the generalized performance is not as strong as we would have liked,
we believe the results still raise cause for concern: in particular,
one would hope that such recovery would not be at all possible since VoIP
audio is encrypted precisely to prevent such breaches of privacy. It is our
belief that with advances in computational linguistics, reconstructions of
the type presented here will only improve. Our hope is that this work
stimulates discussion within the broader community on ways to design
more secure, yet efficient, techniques for preserving the confidentiality of
VoIP conversations.
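
The stages named in the conclusion (segment the packet sizes, classify each segment as a phoneme, map phoneme sequences to words) can be sketched as a toy Python pipeline. This is only an illustration under strong assumptions: the segmentation rule (split on a large jump in size), the nearest-mean classifier, the PHONEME_MEANS table and the LEXICON are all hypothetical stand-ins, not the models used in the paper, and the language-model rescoring step is left out entirely.

```python
from statistics import mean

# Hypothetical mean packet size per phoneme (made-up values).
PHONEME_MEANS = {"k": 22.0, "ae": 40.0, "t": 18.0, "s": 28.0}

# Hypothetical pronunciation dictionary: phoneme sequence -> word.
LEXICON = {("k", "ae", "t"): "cat", ("s", "ae", "t"): "sat"}

def segment(sizes, jump=5):
    """Split the size sequence wherever adjacent sizes differ by more than jump."""
    if not sizes:
        return []
    segments = [[sizes[0]]]
    for prev, cur in zip(sizes, sizes[1:]):
        if abs(cur - prev) > jump:
            segments.append([cur])      # large jump: start a new segment
        else:
            segments[-1].append(cur)    # small change: same segment
    return segments

def classify(seg):
    """Label a segment with the phoneme whose mean size is closest."""
    m = mean(seg)
    return min(PHONEME_MEANS, key=lambda p: abs(PHONEME_MEANS[p] - m))

def transcribe(sizes):
    """Segment, classify, then look the phoneme sequence up in the lexicon."""
    phonemes = tuple(classify(s) for s in segment(sizes))
    return LEXICON.get(phonemes, "<unknown>")

print(transcribe([22, 23, 40, 41, 39, 18, 17]))
```

A real attack would need statistical models trained on actual traffic for each stage; the point of the sketch is only the shape of the pipeline.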