I'll give the right answers to the right questions
I just skimmed through Computer Organization and Design RISC-V Edition: The Hardware Software Interface, so I'll just give you some stream of consciousness notes on a good CPU.

Time Stamp Counter should be virtualized. The white hat counterpart to Rutkowska's Blue Pill.
no out of order execution, or any other just in time compilation
no speculative execution
implicit parallelization
implement major components of the operating system in hardware, Xen is only twenty megabytes which is comparable to maximum bitstream size for some FPGAs
arguably the only thing that needs to be implemented in silicon is the ALU, and the bus
each core has multiple threads, even shadow threads invisible to root meant to handle interrupts
to deal with context switch delays, pipeline threads, this may be called a ready queue?
if a processor can rename registers on the fly, it should be able to load processes quickly
in fact this may be to some degree done with hyper-threading, although unsure why it is apparently limited to two threads per CPU, although it may require multiple instruction decoding buffers, "It seems like the P4 flushes its micro-op cache as part of handling an interrupt" https://wiki.osdev.org/Context_Switching
although this still wouldn't prevent having two threads per CPU, and at least one standby thread, particularly if a branch prediction buffer is removed
eDRAM cache to buffer scheduled threads, and to partially virtualize L1 caches for them. maybe no shared caches, everything must be copied into L1 first.
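To make the "ready queue" note above a little more concrete, here is a minimal sketch in Python of one core that keeps several thread contexts resident and switches to the next one whenever the running thread stalls. Everything in it is an assumption made for illustration: the function name run_switch_on_stall, the "op"/"stall" step labels, and the idea that a stall has resolved by the time the thread reaches the front of the queue again. It models the policy only, not any real hardware.

```python
from collections import deque


def run_switch_on_stall(threads):
    """Simulate one core holding several resident thread contexts.

    threads maps a (hypothetical) thread name to a list of steps, where
    each step is "op" (completes normally) or "stall" (e.g. a cache miss).
    Returns the order in which thread names were issued.
    """
    ready = deque(threads)  # ready queue of resident contexts, in dict order
    pending = {name: deque(steps) for name, steps in threads.items()}
    trace = []
    while ready:
        name = ready.popleft()
        steps = pending[name]
        # Run this thread until it stalls or has nothing left to do.
        while steps:
            step = steps.popleft()
            trace.append(name)
            if step == "stall":
                break
        if steps:
            # Work remains: rejoin at the back of the ready queue.
            # Assumption: the stall is resolved by the time it runs again.
            ready.append(name)
    return trace


if __name__ == "__main__":
    print(run_switch_on_stall({"A": ["op", "stall", "op"], "B": ["op", "op"]}))
```

With the two threads in the example, A issues twice and stalls, B runs to completion, and A then finishes, so the trace is A, A, B, B, A. The point is only that the switch costs a queue rotation if the contexts are already on the core.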
the best possible compromise for single thread performance is to allow the OS to flag a thread with a high priority, two threads with unequal priority on a core will mean the higher priority thread will run until its allotted cycles run out or there is a stall, creating de facto coarse multithreading, although this seems unlikely to be needed
apparently dynamic power use has been reducing, so speculative execution is not as big of a power hog as increasing clock speed
copy on write should be done in a way to avoid any possible race condition, if multiple processes are aware of data before it is marked copy on write, it is being done wrong
incoherent data should be avoided
anything in complexity that exceeds the short term memory of the smartest programmers should be avoided
branch prediction apparently involves flushing instructions halfway executed?
ideally flushing is constant time to avoid side channels, which would make it easier as well, although more hardware intensive
overall illogical and detrimental design decisions have led to the erosion of computer security.
emphasis on single thread performance neglects modern computing design
obviously branch prediction or any other optimization should be toggleable for cryptography

https://wiki.osdev.org/Context_Switching
"There are many ways of performing a context switch. The x86 CPU provides a way of doing it completely in hardware, but for performance and portability reasons most modern OS's do context switches in software."

http://blog.erratasec.com/2015/03/x86-is-high-level-language.html
"The upshot is this: Intel's x86 is a high-level language. Coding everything up according to Agner Fog's instruction timings still won't produce the predictable, constant-time code you are looking for. There may be some solutions, like using CMOV, but it will take research."

---

research... hmmm... execute and then check or check then execute?
an interesting meme, research, research, research
research and then implement an idea or implement an idea and then research it?
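On the CMOV remark quoted from the erratasec post: the idea behind that kind of fix is to pick between two values with arithmetic instead of a branch, so that the secret never decides which instructions run. Below is a minimal sketch of that data flow in Python. The names ct_select, ct_equal and MASK32 are made up for illustration, and inputs are assumed to fit in 32 bits. Python gives no timing guarantees at all, so this only shows the shape of the technique and is not a constant-time implementation. As the quote says, whether real hardware then behaves in constant time still takes research.

```python
MASK32 = 0xFFFFFFFF  # assumption: all values fit in 32 bits


def ct_select(cond, a, b):
    """Return a if cond == 1, b if cond == 0, without branching on cond."""
    mask = (-cond) & MASK32  # 1 -> 0xFFFFFFFF, 0 -> 0x00000000
    return (a & mask) | (b & ~mask & MASK32)


def ct_equal(x, y):
    """Compare two equal-length byte strings without an early exit."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    diff = 0
    for xb, yb in zip(x, y):
        diff |= xb ^ yb  # accumulate differences; never stop early
    return diff == 0


if __name__ == "__main__":
    print(hex(ct_select(1, 0xAAAA, 0x5555)))
    print(hex(ct_select(0, 0xAAAA, 0x5555)))
    print(ct_equal(b"secret", b"secret"))
```

For real use in Python, the standard library's hmac.compare_digest is the safer choice for comparisons. The sketch is just the "check then select" pattern written out by hand.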
Ps. Don't Cc lists please

This is the mail system at host mx1.riseup.net.

I'm sorry to have to inform you that your message could not be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can delete your own text from the attached returned message.

The mail system

<cryptography@metzdowd.com>: host mail1.metzdowd.com[166.84.7.15] said: 554 5.7.1 <cryptography@metzdowd.com>: Recipient address rejected: g2s@riseup.net must be a subscriber to cryptography@metzdowd.com to post to the list. (in reply to RCPT TO command)

-------- Original message --------
From: Ryan Carboni <ryacko@gmail.com>
Date: 1/26/18 5:40 AM (GMT-08:00)
To: Crypto <cryptography@metzdowd.com>, cypherpunks@lists.cpunks.org
Subject: I'll give the right answers to the right questions

I just skimmed through Computer Organization and Design RISC-V Edition: The Hardware Software Interface, so I'll just give you some stream of consciousness notes on a good CPU
Time Stamp Counter should be virtualized.