What advantage does Signal protocol have over basic public key encryption?

Mon Feb 1 12:56:31 PST 2021

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, January 31, 2021 7:28 PM, David Barrett <dbarrett at expensify.com> wrote:

> Thanks for all the great comments!  Combining the responses:
>
> > I assume that design proposals for secure comms always target Android and iOS devices. Are people aware, when using such devices, of zero-click exploits from Pegasus (NSO Group) or FinFisher/FinSpy? I sold my smartphone for exactly that reason and switched to a dumb phone.
>
> Yes, I talk about this a bit here: https://gist.github.com/quinthar/44e1c4f63f84556a9822ebf274dc510a#the-feds, but...
>
> > exactly right. And the open source OS should be running on non-compromised hardware. Oh, wait.
>
>  
> That.  In the real world, we can't all hand build and personally operate our own billion dollar fab to ensure atomic-level security of our entire vertical supply chain.  And even if you could... who's to say the Feds don't sneak in and swap your device with a perfect duplicate when you aren't looking?  Ultimately if you are trying to protect yourself from the combined might of, oh, 8 billion other people, you're going to have a tough time of it.  I'm not building for that use case (nor is anyone else).  I'm building for the billions of people who aren't trying to protect themselves from the Feds, but from other more common (even if more mundane) privacy threats.
>
> > https://www.nitrokey.com/news/2020/nitropad-secure-laptop-unique-tamper-detection
>
> How do you know they aren't an NSA front?  Ultimately, you can't.  At some point you've got no choice but to trust someone.
>
> > It would make sense to contribute or work with a project like Signal rather than making a new messenger
>
> Well my job is to secure the privacy of Expensify's millions of users, not just shut down shop and tell them to use Signal (especially since Signal doesn't offer the wide range of preaccounting functionality we do).
>
> > The only reasonable way to sell something on an app store is to distribute a binary.  Meanwhile with the source available, people can build their own clients, and share them via other channels.
>
> I totally agree, no real world system can grow if it presumes everyone builds their own binaries (and presumably inspects the source code personally to make sure nothing was slipped in via Github when pulling down the repo).  My only point is real-world systems do not exist in a vacuum: the only way to realistically build a secure communication system used by billions is to rely upon trusting _the very people you don't want to monitor you_ to allow you to do so without interference.  It's a harsh, brutal reality, but there it is.
>
> > I visited expensify.cash but didn't notice an obvious link to the source code.  It can be hard for me to see things, though.
>
> Ah, sorry!  It's a new project so we're mostly focused on curating a small set of dedicated contributors before going too wide.  But you can see it here: https://github.com/Expensify/Expensify.cash  -- Sign up at https://Expensify.cash and enter your Github handle, and we'll invite you to the various testflights and such.  We're also hiring freelancers to help out -- here are a bunch of open jobs, and we're adding more all the time:
>
> https://www.upwork.com/search/jobs/?q=Expensify%20React%20Native
>
> > Thank you so much for your open source work.  Please work with existing open source projects when you can both benefit from the work, so the community can grow.
>
> I 100% agree.  We're major sponsors of SQLite, and have also open-sourced our BedrockDB.com, which was running a blockchain in production before it was cool: https://bedrockdb.com/blockchain.html
>
> > Here is information on signal's reproducible builds:
> > https://github.com/signalapp/Signal-Android/tree/master/reproducible-builds
> > You actually can verify that the app from the play store is the one you have the source to.
>
> Whoa, that's neat!  But doesn't change my point: unless everyone is doing this -- and doing it for every single upgrade -- it doesn't really matter.  This is a neat parlor trick to create a sense of trust, but I think it's a kind of disingenuous performance of security theatre: that's like shining a spotlight on Trump's 20 mile border wall while ignoring that it's a very incomplete protection.  Verifying the APK that you are getting from Google Play never actually makes sense:
>
> 1. If the feds *have* identified your device AND you are a sufficiently interesting person of interest that they would force Google to ship you a compromised APK -- they could also just force Google to ship you a compromised, invisibly-installed system update.  Verifying the APK doesn't prove anything.
>
> 2. If the feds *have not* identified your device, or you are NOT sufficiently interesting to warrant being compromised, then verifying the APK won't ever find anything.
>
> Whether you are or are not a target, if you are installing Signal via Google Play, you sorta have no choice but to assume that Signal and Google haven't been compromised: if you believe that, then there's no need to verify.  But if you do NOT believe that, then verifying it isn't nearly enough.  (Especially when no amount of securing your own device will secure the device of the person you are talking to -- who if you are a target, is probably also a target, or at least you shouldn't assume they aren't.)
>
> > Current public-key systems have severe limitations in message size, so symmetric key systems make the public-key algorithms far more useful.
>
> Hm, but all encryption has severe limitations in message size -- even symmetric encryption works on relatively small block sizes.  This is why the input is split up and something like CBC is used to prevent "known-plaintext" attacks, with each CBC block separately encrypted with the symmetric key.

A direct adaptation of CBC to public-key crypto would be an ElGamal
construction. This requires two group elements per block and several
operations per block: (1) one fixed-point DH, (2) one arbitrary-point
DH, (3) one group-element operation, and (4) some mapping function
to a group element. Ultimately, you double the message size and have an
explosion of CPU time for this construction. It's certainly doable if the
message size is small. Look up the DH computation time of x25519 for (1)
and (2) for a rough estimate of maximum throughput (ECC will have the
highest throughput).
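
To make the per-block cost concrete, here is a rough sketch of textbook
ElGamal applied block by block. Everything in it is an assumption for
illustration: the 127-bit modulus P, the base G, the 15-byte block size,
and the "add one" mapping are toy choices and nowhere near secure, and it
uses a fresh ephemeral scalar per block rather than any real chaining.
The numbered comments match the four operations listed above.

```python
import secrets

# Toy parameters, for illustration only: a 127-bit Mersenne prime is far too
# small for real use, and G is an arbitrary base, not a vetted generator.
P = 2**127 - 1
G = 3
BLOCK_BYTES = 15  # 120 bits, so every mapped block is an integer below P


def keygen():
    """Return a (private, public) toy key pair."""
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)


def encrypt_block(pub, block):
    """Encrypt one block; the result is TWO group elements."""
    k = secrets.randbelow(P - 2) + 1       # fresh ephemeral scalar per block
    c1 = pow(G, k, P)                      # (1) fixed-base exponentiation
    shared = pow(pub, k, P)                # (2) arbitrary-base exponentiation
    m = int.from_bytes(block, "big") + 1   # (4) map the block to a nonzero element
    c2 = (m * shared) % P                  # (3) one group operation
    return c1, c2


def decrypt_block(priv, c1, c2):
    """Invert encrypt_block using the private key."""
    shared = pow(c1, priv, P)
    m = (c2 * pow(shared, -1, P)) % P      # modular inverse (Python 3.8+)
    return (m - 1).to_bytes(BLOCK_BYTES, "big")


def encrypt(pub, message):
    """Zero-pad to whole blocks and encrypt each block separately."""
    padded = message + b"\x00" * (-len(message) % BLOCK_BYTES)
    blocks = [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]
    return len(message), [encrypt_block(pub, b) for b in blocks]


def decrypt(priv, length, pairs):
    """Decrypt every (c1, c2) pair and strip the zero padding."""
    return b"".join(decrypt_block(priv, c1, c2) for c1, c2 in pairs)[:length]
```

Each 15-byte block turns into a pair of elements of up to 16 bytes each,
and costs two exponentiations to encrypt, which is where the doubling and
the CPU explosion come from.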

If ElGamal is unacceptable in performance, then the construction is
loosely based on CBC but differs significantly in the details. There
are several ways this can be accomplished, and a whole lot of ways it
can go wrong. I've gone through several possible construction techniques
in my head, and they all require doubling the message size, or at least
a hash function. Maybe you can come up with something else, but it's
definitely not easy.

> What I'm trying to figure out is why people don't use something like CBC directly on top of public key encryption.  Again, I can see the performance advantages of using symmetric keys, but I'm not sure if there is any actual security advantage between:
>
> a. Use DH to generate a symmetric key X from my private key (A.priv) and your public key (B.pub), split input into N blocks, CBC encode and encrypt each block separately with that symmetric key 
>
> b. split input into N blocks, CBC encode and encrypt each block separately with your public key (B.pub)
>
> > In other words, block-cipher modes are designed to protect against small biases in the ciphertext itself, which is a round-about way of saying you probably don't want to attempt what you are suggesting.
>
> Can you elaborate on what you mean by this?  I'm sorry, I'm not quite following it.

Your current description is too vague. You'll need to come up with an
actual plan and some rough numbers. And it's possible that the
construction is trivially busted in the end.

> Thank you all for your help, I really appreciate it!
>
> -david
>
> On Sun, Jan 31, 2021 at 2:26 PM Lee Clagett <forum at leeclagett.com> wrote:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Friday, January 29, 2021 2:42 PM, David Barrett <dbarrett at expensify.com> wrote:
> >
> > > Wow, these are (mostly) great responses, and exactly what I was looking for.  Thank you!  To call out a couple responses:
> > >
> > > > 6, the ratchet protocol produces a hash of previous messages that provides for detection of dropped data, among many other things.  pgp does not do this.
> > >
> > > It feels like there are easier ways to detect dropped/tampered messages, such as with a simple accumulated hash of all past messages (or even a CBC mode).  We do this with https://bedrockdb.com/blockchain.html and it works great.  However, I get your point that the double ratchet provides other benefits beyond just forward secrecy.
> > >
> > > > Decryption of destroyed messages is a big thing that signal deters. Journalists can get seriously physically injured when that happens.
> > >
> > > Yes, I agree, it seems that forward secrecy is both 1) very valuable, 2) very hard to do, and 3) Signal's primary design goal.
> > >
> > > > Re Signal and Javascript, Signal offers its code in a signed binary, and offers the source to that binary for anybody to build and check.
> > >
> > > Signal offers source, but given that it's distributing binaries via app stores, there's really no way to guarantee that the binary matches that source code.  Open source is great (Expensify.cash is as well), but still requires that you trust the party giving you the binaries.  
> > >
> > > > They [Signal] have an automated system that gives their donated money to people who contribute improvements.
> > >
> > > Wait really?  I'm not really finding that mentioned anywhere; can you link me to this?  The FAQ doesn't really mention it, but it seems like this would be front and center: https://support.signal.org/hc/en-us/articles/360007319831-How-can-I-contribute-to-Signal-
> > >
> > > > Arguably the simplest method is to do what you describe [encrypting every message with the recipient's public key]. However, public-key crypto produces a shared-number of ~256-4096 bits. If the message is longer than this, these shared-secret bits must be "stretched" without revealing the secret. This is why (nearly all) public-key crypto systems are paired with some symmetric cipher.
> > >
> > > I'd really love to learn more about this, as I think I almost understand it, but not quite.  Can you elaborate on what you mean by "If the message is longer than this, these shared-secret bits must be "stretched" without revealing the secret."
> >
> > All public-key crypto systems (afaik) use some fixed-sized mathematical
> > group. The size of the group, and the actual math operations involved
> > differ depending on the system (RSA, DH, ECDH, ElGamal, etc). You cannot
> > send messages of arbitrary size using these systems, because all
> > operations involve a fixed number of elements (or bits). As an example,
> > the integers [0-11) under arithmetic modulo 11 are a mathematical field
> > (and therefore group) consisting of 11 elements. Sending an arbitrary
> > message requires a mapping function to/from these 11 elements. The
> > mapping function must be invertible - the "inverse" is the tricky part
> > because you cannot have a ciphertext that
> > maps to 10,000 equally possible plaintexts. Even with fairly large
> > groups, the number of possible plaintexts is typically far too large
> > (usually infinite). Or very precisely, you can definitely develop an
> > ElGamal system where plaintext messages map to a group element, but the
> > number of possible messages must be limited to the size of the group
> > (again, afaik).
> >
> > If messages are constrained to "secret keys", then public-key crypto
> > systems can be used to "pass" or share secret bits. The next problem is
> > ensuring that these secret bits are not accidentally leaked when
> > encrypting messages. Symmetric ciphers are designed to prevent this type
> > of leakage ("confusion"), which is probably why every system is a hybrid
> > of public-key and symmetric crypto.
> >
> > > I get that any encryption (symmetric or otherwise) works on blocks, so to encrypt anything larger than one block requires splitting the input up into many blocks.  And I get that there are concerns with accidentally revealing information by encrypting the same exact block back to back (ie, it reveals "the same block appeared twice", without revealing the actual block content itself).  (More on all that here: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Confidentiality_only_modes)
> > >
> > > But I'm not quite understanding why you suggest that you couldn't just use a CBC strategy (where each block is derived from the block that preceded it) in conjunction with public key encryption to just encrypt every block with the recipient's public key, eliminating the need for the shared symmetric encryption key.
> >
> > One obvious problem is with known-plaintext attacks. If any bits are
> > known in advance to an attacker - even if it's "frame" data (JSON, etc.)
> > - then this information is propagated to later blocks. There must be
> > confusion+diffusion at every stage. In all major cipher block-modes,
> > previous ciphertexts are only used if the result is put back through the
> > cipher. In other words, block-cipher modes are designed to protect
> > against small biases in the ciphertext itself, which is a round-about
> > way of saying you probably don't want to attempt what you are
> > suggesting.
> >
> > A public-key operation for every block is needed, or a hash function or
> > cipher is needed. And using only a hash function is iffy because of the
> > compression stage; (probably) only Keccak/SHA3 is recommended if the PRF
> > is used directly (the authors have proposed this function for use as a
> > stream-cipher).
> >
> > > Now, I understand the performance advantages of symmetric over asymmetric encryption, and certainly the convenience (and bandwidth) advantages of having multiple parties all use the same key (ie, to avoid re-encrypting the same message separately for each recipient).  But I don't see any actual security advantage to introducing the symmetric key (and arguably a disadvantage given the increased complexity it adds).
> >
> > Current public-key systems have severe limitations in message size, so
> > symmetric key systems make the public-key algorithms far more useful.
> >
> > > Thanks for all these answers, I really appreciate them!
> > > -david 
> > >
> > >  
> > >
> > > On Tue, Jan 26, 2021 at 12:17 AM <jamesd at echeque.com> wrote:
> > >
> > > > On 2021-01-26 04:31, David Barrett wrote:
> > > > > Yes, this does assume a central keyserver -- and I agree, it's possible
> > > > > that it lies to you, establishing a connection with someone other than who
> > > > > you requested (or even a man-in-the-middle).  I don't know how to really
> > > > > solve that for real without some out-of-band confirmation that the
> > > > > public key returned by the keyserver (whether centralized or distributed)
> > > > > matches the public key of the person you want to talk to.
> > > >
> > > > Jitsi's solution works.
> > > >
> > > > It is the much studied reliable broadcast problem, which is a hard but
> > > > much studied problem, with a bunch of hard to understand solutions,
> > > > which nonetheless work.
> > > >
> > > > > I think you are saying that performance isn't a real world concern, but
> > > > > forward secrecy is?  If so, that makes sense.
> > > >
> > > > Yes.
> > > >
> > > > Ristretto25519 shared secret construction (using asymmetric cryptography
> > > > to construct a shared secret that is then used in symmetric
> > > > cryptography) takes 2.5 microseconds on my computer running unoptimized
> > > > debug code.  For forward secrecy, you need to construct two secrets, one
> > > > from the transient key and one from some mingling of the permanent key
> > > > with the transient key, which takes 5 microseconds.
> > > >
> > > > And you then use the authenticated shared secret for the rest of the
> > > > session.
> >
> > Lee

Lee