Phonotactic Reconstruction of Encrypted VoIP Conversations:

Eugen Leitl eugen at leitl.org
Fri May 27 06:45:24 PDT 2011


http://www.cs.unc.edu/~amw/resources/hooktonfoniks.pdf

Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on fon-iks

Andrew M. White*      Austin R. Matthews*†      Kevin Z. Snow*      Fabian Monrose*

*Department of Computer Science       †Department of Linguistics

University of North Carolina at Chapel Hill

Chapel Hill, North Carolina

{ amw, kzsnow, fabian } @cs.unc.edu, armatthe at email.unc.edu

Abstract

In this work, we unveil new privacy threats against Voice-over-IP (VoIP)
communications. Although prior work has shown that the interaction of
variable bit-rate codecs and length-preserving stream ciphers leaks
information, we show that the threat is more serious than previously
thought. In particular, we derive approximate transcripts of encrypted VoIP
conversations by segmenting an observed packet stream into subsequences
representing individual phonemes and classifying those subsequences by the
phonemes they encode. Drawing on insights from the computational linguistics
and speech recognition communities, we apply novel techniques for unmasking
parts of the conversation. We believe our ability to do so underscores the
importance of designing secure (yet efficient) ways to protect the
confidentiality of VoIP conversations.
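
The leak the abstract refers to can be illustrated with a minimal sketch.
Everything in it is an assumption made for illustration: the keystream
construction is a toy (not the cipher used by any real VoIP stack), and the
frame sizes are invented. It only shows that a length-preserving cipher hides
the bytes of a variable bit-rate frame but not its size.

```python
import hashlib


def keystream(key: bytes, nonce: int, n: int) -> bytes:
    """Toy keystream built from SHA-256 in counter mode (illustrative only)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        block = key + nonce.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return bytes(out[:n])


def encrypt(key: bytes, nonce: int, frame: bytes) -> bytes:
    """XOR the frame with the keystream; the output has the same length."""
    stream = keystream(key, nonce, len(frame))
    return bytes(a ^ b for a, b in zip(frame, stream))


# Hypothetical variable bit-rate frames: the sizes are invented, but in a VBR
# codec the size depends on the sound being encoded.
frame_sizes = [10, 28, 46, 28, 15]
frames = [bytes(n) for n in frame_sizes]

key = b"hypothetical session key"
ciphertexts = [encrypt(key, i, f) for i, f in enumerate(frames)]

# An eavesdropper cannot read the payloads, but sees exactly these sizes.
observed_sizes = [len(c) for c in ciphertexts]
print(observed_sizes)
```

The observed size sequence is identical to the plaintext frame size sequence,
which is the side channel the segmentation and classification steps start from.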

...

VII.  CONCLUSION

In this paper, we explore the ability of an adversary to reconstruct parts
of encrypted VoIP conversations. Specifically, we propose an approach for
outputting a hypothesized transcript of a conversation, based on segmenting
the sequence of observed packet sizes into subsequences corresponding to the
likely phonemes they encode. These phoneme sequences are then mapped to
candidate words, after which we incorporate word and part-of-speech based
language models to choose the best candidates using contextual information
from the hypothesized sentence as a whole. Our results show that the quality
of the recovered transcripts is far better in many cases than one would
expect. While the generalized performance is not as strong as we would have
liked, we believe the results still raise cause for concern: in particular,
one would hope that such recovery would not be at all possible since VoIP
audio is encrypted precisely to prevent such breaches of privacy. It is our
belief that with advances in computational linguistics, reconstructions of
the type presented here will only improve. Our hope is that this work
stimulates discussion within the broader community on ways to design more
secure, yet efficient, techniques for preserving the confidentiality of
VoIP conversations.
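
The last stage described above, choosing among candidate words with a
language model, can be sketched roughly as follows. This is not the authors'
implementation: it assumes the earlier stages have already produced a list of
candidate words per position, it uses a plain bigram model with Viterbi search
in place of the word and part-of-speech models mentioned in the text, and the
candidate lists and probabilities are invented for illustration.

```python
import math

# Hypothetical candidates per word position, as might come out of a
# phoneme-to-word matching stage (invented for illustration).
candidates = [["eye", "I"], ["sea", "see"], ["the"], ["see", "sea"]]

# Hypothetical bigram probabilities; "<s>" marks the sentence start.
bigram_prob = {
    ("<s>", "I"): 0.4,
    ("I", "see"): 0.3,
    ("see", "the"): 0.3,
    ("the", "sea"): 0.2,
}
UNSEEN_LOGP = math.log(1e-4)  # flat penalty for bigrams not in the table


def logp(prev: str, word: str) -> float:
    """Log-probability of `word` following `prev` under the toy model."""
    p = bigram_prob.get((prev, word))
    return math.log(p) if p is not None else UNSEEN_LOGP


def best_sentence(options_per_position):
    """Viterbi search: keep the best-scoring path ending in each word."""
    best = {"<s>": (0.0, [])}  # last word -> (score, path so far)
    for options in options_per_position:
        nxt = {}
        for word in options:
            score, path = max(
                ((s + logp(prev, word), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[word] = (score, path + [word])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


sentence = best_sentence(candidates)
print(" ".join(sentence))
```

With these invented numbers, the sentence-level context picks "I see the sea"
even though each position on its own is ambiguous between homophones.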




