Thanks for all the great comments!  Combining the responses:

I assume that, when people talk about design proposals for secure comms, Android and iOS devices are always what's being used. Are people who use such devices aware of zero-click exploits such as Pegasus (NSO Group) or FinFisher/FinSpy? I sold my smartphone for exactly that reason and switched to a dumb phone.

Yes, I talk about this a bit here: https://gist.github.com/quinthar/44e1c4f63f84556a9822ebf274dc510a#the-feds, but...


Exactly right. And the open source OS should be running on non-compromised hardware. Oh, wait.
 
That.  In the real world, we can't all hand-build and personally operate our own billion-dollar fab to ensure atomic-level security of our entire vertical supply chain.  And even if you could... who's to say the Feds don't sneak in and swap your device with a perfect duplicate when you aren't looking?  Ultimately, if you are trying to protect yourself from the combined might of, oh, 8 billion other people, you're going to have a tough time of it.  I'm not building for that use case (nor is anyone else).  I'm building for the billions of people who aren't trying to protect themselves from the Feds, but from other, more common (even if more mundane) privacy threats.


https://www.nitrokey.com/news/2020/nitropad-secure-laptop-unique-tamper-detection

How do you know they aren't an NSA front?  Ultimately, you can't.  At some point you've got no choice but to trust someone.


It would make sense to contribute to or work with a project like Signal rather than making a new messenger.

Well my job is to secure the privacy of Expensify's millions of users, not just shut down shop and tell them to use Signal (especially since Signal doesn't offer the wide range of preaccounting functionality we do).


The only reasonable way to sell something on an app store is to distribute a binary.  Meanwhile with the source available, people can build their own clients, and share them via other channels.

I totally agree, no real world system can grow if it presumes everyone builds their own binaries (and presumably inspects the source code personally to make sure nothing was slipped in via GitHub when pulling down the repo).  My only point is real-world systems do not exist in a vacuum: the only way to realistically build a secure communication system used by billions is to rely upon trusting _the very people you don't want to monitor you_ to allow you to do so without interference.  It's a harsh, brutal reality, but there it is.

 
I visited expensify.cash but didn't notice an obvious link to the source code.  It can be hard for me to see things, though.

Ah, sorry!  It's a new project so we're mostly focused on curating a small set of dedicated contributors before going too wide.  But you can see it here: https://github.com/Expensify/Expensify.cash  -- Sign up at https://Expensify.cash and enter your GitHub handle, and we'll invite you to the various TestFlights and such.  We're also hiring freelancers to help out -- here are a bunch of open jobs, and we're adding more all the time:

https://www.upwork.com/search/jobs/?q=Expensify%20React%20Native


Thank you so much for your open source work.  Please work with existing open source projects when you can both benefit from the work, so the community can grow.

I 100% agree.  We're major sponsors of SQLite, and have also open-sourced BedrockDB (BedrockDB.com), which was running a blockchain in production before it was cool: https://bedrockdb.com/blockchain.html


Here is information on Signal's reproducible builds:
https://github.com/signalapp/Signal-Android/tree/master/reproducible-builds
You actually can verify that the app from the play store is the one you have the source to.
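For readers curious what that verification boils down to: build the APK from source yourself, then check that it matches the Play Store copy everywhere except the signature. Below is a minimal sketch of just the comparison step, assuming (this is not taken from Signal's instructions) that signing data lives under META-INF/ and that every other entry should match byte for byte. The function names are hypothetical; Signal's own procedure at the link above is the authoritative one.

```python
import hashlib
import io
import zipfile


def entry_digests(apk_bytes: bytes) -> dict[str, str]:
    """Map each entry in an APK (a zip archive) to the SHA-256 of its contents.

    Assumption for this sketch: entries under META-INF/ hold signing data,
    which a locally built APK will never share with the store copy, so skip them.
    """
    digests = {}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        for name in zf.namelist():
            if name.startswith("META-INF/"):
                continue
            # zf.read returns the decompressed bytes, so compression
            # settings do not affect the comparison.
            digests[name] = hashlib.sha256(zf.read(name)).hexdigest()
    return digests


def same_contents(store_apk: bytes, local_apk: bytes) -> bool:
    """True if both APKs contain identical entries, ignoring signature files."""
    return entry_digests(store_apk) == entry_digests(local_apk)
```

Byte-identical entries are only possible because the build itself is made reproducible (a pinned toolchain and a deterministic build environment); the comparison is the easy part.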

Whoa, that's neat!  But it doesn't change my point: unless everyone is doing this -- and doing it for every single upgrade -- it doesn't really matter.  This is a neat parlor trick to create a sense of trust, but I think it's a kind of disingenuous security theatre: it's like shining a spotlight on Trump's 20-mile border wall while ignoring that it's a very incomplete protection.  Verifying the APK that you are getting from Google Play never actually makes sense:

1. If the feds *have* identified your device AND you are an interesting enough target that they would force Google to ship you a compromised APK -- then they could also just force Google to ship you a compromised, invisibly installed system update.  Verifying the APK doesn't prove anything.

2. If the feds *have not* identified your device, or you are NOT sufficiently interesting to warrant being compromised, then verifying the APK won't ever find anything.

Whether you are or are not a target, if you are installing Signal via Google Play, you sorta have no choice but to assume that Signal and Google haven't been compromised: if you believe that, then there's no need to verify.  But if you do NOT believe that, then verifying it isn't nearly enough.  (Especially when no amount of securing your own device will secure the device of the person you are talking to -- who, if you are a target, is probably also a target, or at least you shouldn't assume they aren't.)


Current public-key systems have severe limitations in message size, so symmetric key systems make the public-key algorithms far more useful.

Hm, but all encryption has severe limitations in message size -- even symmetric encryption works on relatively small block sizes.  This is why the input is split up and a mode like CBC is used (so that identical plaintext blocks don't produce identical ciphertext blocks), with each CBC block separately encrypted with the symmetric key.
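To make the block-repetition point concrete, here is a small sketch contrasting ECB (each block encrypted independently) with CBC (each block XORed with the previous ciphertext before encryption). Python's standard library has no AES, so the block cipher below is a hypothetical toy: a 4-round Feistel network using HMAC-SHA256 as the round function. It is for illustration only and should not protect real data.

```python
import hashlib
import hmac

BLOCK = 16          # bytes per block
HALF = BLOCK // 2   # Feistel works on two half-blocks


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function: HMAC-SHA256 with a round index, truncated to half a block.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel block cipher (illustration only, NOT a vetted cipher).
    A Feistel network is a permutation, so distinct inputs give distinct outputs."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        f = _round(key, i, right)
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ecb(key: bytes, blocks: list[bytes]) -> list[bytes]:
    # Each block encrypted independently: equal inputs give equal outputs.
    return [toy_encrypt_block(key, b) for b in blocks]


def cbc(key: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    # Each plaintext block is XORed with the previous ciphertext block
    # (or the IV) *before* going through the cipher.
    out, prev = [], iv
    for b in blocks:
        prev = toy_encrypt_block(key, xor(b, prev))
        out.append(prev)
    return out
```

For example, `ecb(key, [blk, blk])` returns two equal ciphertext blocks, revealing that the plaintext repeated, while `cbc(key, iv, [blk, blk])` does not (except with negligible probability).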

What I'm trying to figure out is why people don't use something like CBC directly on top of public key encryption.  Again, I can see the performance advantages of using symmetric keys, but I'm not sure if there is any actual security advantage between:

a. Use DH to generate a symmetric key X from my private key (A.priv) and your public key (B.pub), split the input into N blocks, then CBC-chain and encrypt each block separately with that symmetric key.

b. Split the input into N blocks, then CBC-chain and encrypt each block separately with your public key (B.pub).
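Option (b) is easiest to reason about with "textbook" RSA, i.e. raw modular exponentiation with no randomized padding (real RSA encryption uses a padding scheme such as OAEP). The sketch below uses the classic tiny key p = 61, q = 53 (insecure, illustration only) and encrypts one byte per block, chaining CBC-style. Because the public key is public, anyone can encrypt candidate plaintexts and compare, so each block falls to a guess over all 256 possible bytes; the chaining does not help, since the attacker sees the previous ciphertext too. The function names are hypothetical.

```python
# Classic textbook-RSA toy key (far too small to be secure; illustration only).
P, Q = 61, 53
N = P * Q                          # 3233, the public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, 2753 (needs Python 3.8+)


def encrypt_chained(message: bytes, iv: int = 0) -> list[int]:
    """Roughly option (b): encrypt each 1-byte block with the *public* key,
    after XORing it with the low byte of the previous ciphertext."""
    out, prev = [], iv
    for byte in message:
        x = byte ^ (prev % 256)
        prev = pow(x, E, N)        # textbook RSA: deterministic, no padding
        out.append(prev)
    return out


def decrypt_chained(cipher: list[int], iv: int = 0) -> bytes:
    """Legitimate decryption with the private exponent D."""
    out, prev = bytearray(), iv
    for c in cipher:
        out.append(pow(c, D, N) ^ (prev % 256))
        prev = c
    return bytes(out)


def attack(cipher: list[int], iv: int = 0) -> bytes:
    """Recover the plaintext with only the public key (N, E): for each block,
    encrypt all 256 candidate bytes and keep the one that matches."""
    out, prev = bytearray(), iv
    for c in cipher:
        for guess in range(256):
            if pow(guess ^ (prev % 256), E, N) == c:
                out.append(guess)
                break
        prev = c
    return bytes(out)
```

`attack(encrypt_chained(b"attack at dawn"))` returns the original message without ever touching `D`. Randomized padding stops this guess-and-check, at the cost of larger, slower blocks.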


In other words, block-cipher modes are designed to protect against small biases in the ciphertext itself, which is a round-about way of saying you probably don't want to attempt what you are suggesting.

Can you elaborate on what you mean by this?  I'm sorry, I'm not quite following it.


Thank you all for your help, I really appreciate it!

-david


On Sun, Jan 31, 2021 at 2:26 PM Lee Clagett <forum@leeclagett.com> wrote:
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, January 29, 2021 2:42 PM, David Barrett <dbarrett@expensify.com> wrote:

> Wow, these are (mostly) great responses, and exactly what I was looking for.  Thank you!  To call out a couple responses:
>
> > 6, the ratchet protocol produces a hash of previous messages that provides for detection of dropped data, among many other things.  PGP does not do this.
>
> It feels like there are easier ways to detect dropped/tampered messages, such as a simple accumulated hash of all past messages (or even a CBC mode).  We do this with https://bedrockdb.com/blockchain.html and it works great.  However, I get your point that the double ratchet provides other benefits beyond just forward secrecy.
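A minimal sketch of the accumulated-hash idea, assuming SHA-256 and a hypothetical all-zero starting value: each link hashes the previous link together with the next message, so the receiver's final value only matches the sender's if nothing was dropped, reordered, or altered. An unkeyed chain like this only detects tampering if the sender's final value reaches the receiver over an authenticated channel (signed or MACed); otherwise an attacker can simply recompute it.

```python
import hashlib


def chain(messages: list[bytes]) -> list[bytes]:
    """Running hash after each message: h_i = SHA-256(h_{i-1} || m_i).
    h_{i-1} is always 32 bytes, so the concatenation is unambiguous."""
    h = b"\x00" * 32  # hypothetical fixed starting value
    out = []
    for m in messages:
        h = hashlib.sha256(h + m).digest()
        out.append(h)
    return out


def verify(received: list[bytes], sender_head: bytes) -> bool:
    """True only if the receiver saw every message, in order, unmodified."""
    return bool(received) and chain(received)[-1] == sender_head
```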
>
> > Decryption of destroyed messages is a big thing that Signal deters. Journalists can get seriously physically injured when that happens.
>
> Yes, I agree, it seems that forward secrecy is both 1) very valuable, 2) very hard to do, and 3) Signal's primary design goal.
>
> > Re Signal and Javascript, Signal offers its code in a signed binary, and offers the source to that binary for anybody to build and check.
>
> Signal offers source, but given that it's distributing binaries via app stores, there's really no way to guarantee that the binary matches that source code.  Open source is great (Expensify.cash is as well), but still requires that you trust the party giving you the binaries.  
>
> > They [Signal] have an automated system that gives their donated money to people who contribute improvements.
>
> Wait, really?  I'm not finding that mentioned anywhere; can you link me to it?  The FAQ doesn't mention it, though it seems like this would be front and center: https://support.signal.org/hc/en-us/articles/360007319831-How-can-I-contribute-to-Signal-
>
> > Arguably the simplest method is to do what you describe [encrypting every message with the recipient's public key]. However, public-key crypto produces a shared number of ~256-4096 bits. If the message is longer than this, these shared-secret bits must be "stretched" without revealing the secret. This is why (nearly all) public-key crypto systems are paired with some symmetric cipher.
>
> I'd really love to learn more about this, as I think I almost understand it, but not quite.  Can you elaborate on what you mean by "If the message is longer than this, these shared-secret bits must be "stretched" without revealing the secret."

All public-key crypto systems (afaik) use some fixed-size mathematical
group. The size of the group, and the actual math operations involved,
differ depending on the system (RSA, DH, ECDH, ElGamal, etc). You cannot
send messages of arbitrary size using these systems, because all
operations involve a fixed number of elements (or bits). As an example,
[0, 11), i.e. the integers modulo 11, is a mathematical field (and
therefore a group) consisting of 11 elements. Sending an arbitrary
message requires a mapping function to/from these 11 elements. The
mapping function must be invertible (one-to-one); the "inverse" is the
tricky part, because you cannot have a ciphertext that maps to 10,000
equally possible plaintexts. Even with fairly large groups, the number
of possible plaintexts is typically far too large (usually infinite).
More precisely, you can definitely develop an ElGamal system where
plaintext messages map to a group element, but the number of possible
messages must be limited to the size of the group (again, afaik).
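A toy illustration of that limit: textbook ElGamal in the multiplicative group modulo a prime P. A message has to be encoded as a number in [1, P), so anything that does not fit is silently reduced mod P and decrypts to a different value. The prime 2**127 - 1 and generator 3 are arbitrary toy choices for this sketch, not a recommended group.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime; toy parameter, not a recommended group
G = 3            # toy base (assumption; not checked to generate the whole group)


def keygen() -> tuple[int, int]:
    x = secrets.randbelow(P - 2) + 1   # private key in [1, P-2]
    return x, pow(G, x, P)             # (private, public)


def encrypt(pub: int, m: int) -> tuple[int, int]:
    """Textbook ElGamal: m must already be a group element, i.e. 1 <= m < P."""
    k = secrets.randbelow(P - 2) + 1   # fresh per-message randomness
    return pow(G, k, P), (m * pow(pub, k, P)) % P


def decrypt(priv: int, c: tuple[int, int]) -> int:
    c1, c2 = c
    s = pow(c1, priv, P)               # shared value G^(k*priv)
    return (c2 * pow(s, -1, P)) % P    # divide it back out (Python 3.8+)
```

A short message such as `int.from_bytes(b"short msg", "big")` round-trips, but encrypting `P + 5` decrypts to `5`: the two "messages" are indistinguishable once mapped into the group.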

If messages are constrained to "secret keys", then public-key crypto
systems can be used to "pass" or share secret bits. The next problem is
ensuring that these secret bits are not accidentally leaked when
encrypting messages. Symmetric ciphers are designed to prevent this type
of leakage ("confusion"), which is probably why every system is a hybrid
of public-key and symmetric crypto.
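A minimal sketch of the hybrid pattern described above: a public-key step (here a toy Diffie-Hellman exchange modulo the prime 2**127 - 1, chosen only so the example runs with the standard library) shares some secret bits, and a key-derivation step (HKDF, RFC 5869, built from HMAC-SHA256) stretches them into separate symmetric keys, so the raw shared number is never used directly. The group parameters and the `info` labels are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman parameters (illustration only; real systems use a vetted
# group such as X25519 or an RFC 3526 MODP group).
P = 2**127 - 1
G = 3


def hkdf(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF-SHA256 (RFC 5869): extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]


# Each side keeps a private exponent and publishes G^x mod P.
a_priv = secrets.randbelow(P - 2) + 1
b_priv = secrets.randbelow(P - 2) + 1
a_pub, b_pub = pow(G, a_priv, P), pow(G, b_priv, P)

# Both sides arrive at the same shared number without ever sending it.
shared_a = pow(b_pub, a_priv, P)
shared_b = pow(a_pub, b_priv, P)

# Don't use the raw shared number directly: stretch it into independent
# symmetric keys for encryption and for authentication.
ikm = shared_a.to_bytes(16, "big")   # P < 2**127, so 16 bytes always suffice
enc_key = hkdf(ikm, b"encryption", 32)
mac_key = hkdf(ikm, b"authentication", 32)
```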

> I get that any encryption (symmetric or otherwise) works on blocks, so to encrypt anything larger than one block requires splitting the input up into many blocks.  And I get that there are concerns with accidentally revealing information by encrypting the same exact block back to back (ie, it reveals "the same block appeared twice", without revealing the actual block content itself).  (More on all that here: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Confidentiality_only_modes)
>
> But I'm not quite understanding why you suggest that you couldn't just use a CBC strategy (where each block is derived from the block that preceded it) in conjunction with public key encryption to just encrypt every block with the recipient's public key, eliminating the need for the shared symmetric encryption key.

One obvious problem is with known-plaintext attacks. If any bits are
known in advance to an attacker (even if it's just "frame" data such as
JSON), then this information is propagated to later blocks. There must
be confusion+diffusion at every stage. In all major block-cipher modes,
previous ciphertexts are only used if the result is put back through the
cipher. In other words, block-cipher modes are designed to protect
against small biases in the ciphertext itself, which is a round-about
way of saying you probably don't want to attempt what you are
suggesting.
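A small sketch of why chaining without re-entering the cipher fails, using a hypothetical "mode" (not any real one) in which a single secret block, which could have been shared with one public-key operation, is XORed into every block along with the previous ciphertext. Knowing one block of predictable framing is enough to recover the secret and then every other block. The IV is assumed to travel in the clear, as it does in CBC.

```python
BLOCK = 16


def xor(*parts: bytes) -> bytes:
    """XOR several BLOCK-sized byte strings together."""
    out = bytearray(BLOCK)
    for p in parts:
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)


def weak_encrypt(secret: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    """Hypothetical chaining 'mode' that never re-enters a cipher:
    c_i = m_i XOR secret XOR c_{i-1}. The flaw is the chaining, not the
    way the secret was shared."""
    out, prev = [], iv
    for m in blocks:
        prev = xor(m, secret, prev)
        out.append(prev)
    return out


def known_plaintext_attack(iv: bytes, cts: list[bytes], index: int,
                           known: bytes) -> list[bytes]:
    """Knowing one plaintext block (e.g. predictable JSON framing) reveals
    the secret, and with it every other block."""
    prev = iv if index == 0 else cts[index - 1]
    secret = xor(cts[index], prev, known)
    recovered, prev = [], iv
    for c in cts:
        recovered.append(xor(c, secret, prev))
        prev = c
    return recovered
```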

Either a public-key operation is needed for every block, or a hash
function or cipher is needed. And using only a hash function is iffy
because of the compression stage; (probably) only Keccak/SHA3 is
recommended if the PRF is used directly (the authors have proposed this
function for use as a stream cipher).
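As a rough illustration of using a SHA-3-family function directly as a stream cipher, the sketch below reads a keystream from SHAKE256 (Python's `hashlib.shake_256`) keyed with a secret key and a per-message nonce. The key and nonce sizes are assumptions, and this is not a standardized construction such as KMAC (NIST SP 800-185); it also provides no authentication, so a real design would add a MAC.

```python
import hashlib


def shake_stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a keystream read from SHAKE256(key || nonce).

    Encryption and decryption are the same operation. The nonce must never
    be reused with the same key, and nothing here authenticates the data.
    """
    if len(key) != 32 or len(nonce) != 16:
        # Fixed lengths keep key || nonce unambiguous (assumption of this sketch).
        raise ValueError("sketch assumes a 32-byte key and a 16-byte nonce")
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))
```

Unlike a raw group element, the keystream can be read out to any length, which is the "stretching" of secret bits discussed earlier in the thread.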

> Now, I understand the performance advantages of symmetric over asymmetric encryption, and certainly the convenience (and bandwidth) advantages of having multiple parties all use the same key (ie, to avoid re-encrypting the same message separately for each recipient).  But I don't see any actual security advantage to introducing the symmetric key (and arguably a disadvantage given the increased complexity it adds).

Current public-key systems have severe limitations in message size, so
symmetric key systems make the public-key algorithms far more useful.

> Thanks for all these answers, I really appreciate them!
> -david 
>
>  
>
> On Tue, Jan 26, 2021 at 12:17 AM <jamesd@echeque.com> wrote:
>
> > On 2021-01-26 04:31, David Barrett wrote:
> > > Yes, this does assume a central keyserver -- and I agree, it's possible
> > > that it lies to you, establishing a connection with someone other than who
> > > you requested (or even a man-in-the-middle).  I don't know how to really
> > > solve that for real without some out-of-band confirmation that the
> > > public key returned by the keyserver (whether centralized or distributed)
> > > matches the public key of the person you want to talk to.
> >
> > Jitsi's solution works.
> >
> > It is the reliable broadcast problem, which is a hard but
> > much-studied problem, with a bunch of hard-to-understand solutions
> > which nonetheless work.
> >
> > > I think you are saying that performance isn't a real world concern, but
> > > forward secrecy is?  If so, that makes sense.
> >
> > Yes.
> >
> > Ristretto255 shared secret construction (using asymmetric cryptography
> > to construct a shared secret that is then used in symmetric
> > cryptography) takes 2.5 microseconds on my computer running unoptimized
> > debug code.  For forward secrecy, you need to construct two secrets, one
> > from the transient key and one from some mingling of the permanent key
> > with the transient key, which takes 5 microseconds.
> >
> > And you then use the authenticated shared secret for the rest of the
> > session.
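A toy sketch of the two-secret construction described above, assuming one particular arrangement (the message does not spell out the exact "mingling"): one secret from both parties' transient keys, one from Alice's transient key combined with Bob's permanent key, hashed together with SHA-256 into the session key. The modular group is a stand-in so the example runs with the standard library; it is not Ristretto, and its speed says nothing about the timings quoted above.

```python
import hashlib
import secrets

# Toy group (illustration only; not Ristretto and not a recommended group).
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)


def dh(priv: int, other_pub: int) -> bytes:
    return pow(other_pub, priv, P).to_bytes(16, "big")


# Bob's permanent (long-term) key, already known to Alice.
bob_perm_priv, bob_perm_pub = keypair()

# Fresh transient keys for this session only.
alice_tmp_priv, alice_tmp_pub = keypair()
bob_tmp_priv, bob_tmp_pub = keypair()

# Alice: one secret from the two transient keys, one mixing Bob's permanent
# key with Alice's transient key, hashed together into a session key.
alice_session = hashlib.sha256(
    dh(alice_tmp_priv, bob_tmp_pub) + dh(alice_tmp_priv, bob_perm_pub)
).digest()

# Bob computes the same two secrets from his private halves.
bob_session = hashlib.sha256(
    dh(bob_tmp_priv, alice_tmp_pub) + dh(bob_perm_priv, alice_tmp_pub)
).digest()
```

Once both sides delete their transient private keys, a later leak of Bob's permanent key alone cannot recompute the first secret, which is where the forward secrecy comes from; the second secret ties the session to whoever holds Bob's permanent key.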

Lee