
On Mon, Oct 12, 2009 at 8:24 PM, John Case <case@sdf.lonestar.org> wrote:
... You discuss the performance benefits of using the AES CTR in hardware below, but before I get to that, I wonder if you experienced the same as what is quoted just above: that the real CPU load occurs during "the asymmetric-key processing (i.e., onion-skin encrypting/decrypting)" and that that is the only area where we can attempt to speed things up with hardware ?
speeding up OpenSSL derived CTR mode with hardware acceleration is useful. (the HardwareAccel and EngineName options above) speeding up Tor with native hardware acceleration and chained / repetitive CTR offload is a much bigger win. as an example the VIA Padlock engine can do a few score MBytes/sec aes 128 with small/single block sizes like those used when HardwareAccel is on and EngineName padlock set. if you could utilize the maximum 16KByte rep instruction mode directly goes well over a gigabyte / second. this is also why the native Solaris API CTR mode gives much better performance than the OpenSSL engine acceleration. speeding up OpenSSL pubkey ops also helps. as of OpenSSL 0.9.9 and 1.0.x you can build OpenSSL with padlock MONTMULT acceleration. still no dynamic engine support for pubkey ops iirc...
So, if I understand correctly, the HardwareAccel option can be turned on by anyone, regardless of OS or hardware platform. It will then (probably) just do AES CTR with OpenSSL (since most people use OpenSSL) and just get no benefit because OpenSSL won't do AES CTR through hardware.
there will be some benefit as Tor wraps a CTR mode around (accelerated) OpenSSL ECB mode. the problem is that this method is limited to single blocks per call, rather than long buffers optimized for crypto offload. you may also get SHA acceleration. the notices log should say, for example: [info] crypto_global_init(): Initializing OpenSSL engine support. [info] crypto_global_init(): Initializing dynamic OpenSSL engine "padlock" acceleration support. [info] crypto_global_init(): Loaded dynamic OpenSSL engine "padlock". [info] crypto_global_init(): Loaded OpenSSL hardware acceleration engine, setting default ciphers. [info] Using default implementation for RSA [info] Using default implementation for DH [info] Using default implementation for RAND [notice] Using OpenSSL engine VIA PadLock: RNG (not used) ACE2 PHE(8192) PMM [padlock] for SHA1 [info] Using default implementation for 3DES [notice] Using OpenSSL engine VIA PadLock: RNG (not used) ACE2 PHE(8192) PMM [padlock] for AES
So, even if I got a BCM5825 working with with the ubsec driver on FreeBSD, and used HardwareAccel, it would just be wasted with OpenSSL.
not necessarily, per above :)
On the other hand, there are Solaris-specific routines (crypto framework APIs (PKCS#11)) built into Solaris that Tor can call instead of OpenSSL, which _will_ do AES CTR in hardware, yielding a huge gain in performance (you mention 25x).
Do I have all of that correct ?
it would also be nice to have these routines native for other engines, like padlock. however this is not a small amount of effort and no one with time and skill has shown an interest yet. (Tor buffer allocation alignment to 16K, inline padlock instr. calls in REP mode, and other changes required.)
- Are there analogues to the "crypto framework APIs (PKCS#11)" in other OS's ? What is this layer called for Linux, for instance ?
there are kernel cryptographic facilities including asm optimized and hardware accelerated crypto primitives. these are usually not utilized from userspace directly. there are PKCS specs for communicating with offload devices in a concise manner, and libs of various types to speak this to myraid hardware. perhaps a longer discussion should be taken off list...
- How does the T2 (Niagara 2) compare to dedicated hardware such as the Sun Crypto 6000 which is currently available ?
the T2 is core acceleration and thus higher throughput for most purposes. the 6000 is useful as a hardened private key store service for your less trusted OS and software to cooperate with.
- Am I correct that if a new version of OpenSSL appeared with AES CTR hardware support, an end user could just proceed blindly with card insertion, driver install, add HardwareAccel=1, and poof! Yes ?
Tor would need to be updated to take advantage of the new CTR mode in the OpenSSL API calls, but then the same HardwareAccel (and AccelName) options would provide a significant improvement over the previous acceleration mode. best regards, *********************************************************************** To unsubscribe, send an e-mail to majordomo@torproject.org with unsubscribe or-talk in the body. http://archives.seul.org/or/talk/ ----- End forwarded message ----- -- Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE