I'll give the right answers to the right questions

26 Jan 2018

I just skimmed through Computer Organization and Design RISC-V Edition: The
Hardware Software Interface, so I'll just give you some stream of
consciousness notes on a good CPU.

Time Stamp Counter should be virtualized, the white hat counterpart to
Rutkowska's Blue Pill
no out-of-order execution, or any other form of just-in-time compilation
no speculative execution
implicit parallelization

implement major components of the operating system in hardware; Xen is only
twenty megabytes, which is comparable to the maximum bitstream size for some
FPGAs
arguably the only things that need to be implemented in silicon are the ALU
and the bus
each core has multiple threads, including shadow threads invisible to root
that are meant to handle interrupts
to deal with context switch delays, pipeline threads (this may be called a
ready queue?)
if a processor can rename registers on the fly, it should be able to load
processes quickly
in fact this may already be done to some degree with hyper-threading,
although I'm unsure why it is apparently limited to two threads per CPU; it
may require multiple instruction decoders

buffers: "It seems like the P4 flushes its micro-op cache as part of
handling an interrupt" https://wiki.osdev.org/Context_Switching
although this still wouldn't prevent having two threads per CPU and at
least one standby thread, particularly if the branch prediction buffer is
removed

an eDRAM cache could buffer scheduled threads and partially virtualize L1
caches for them. maybe no shared caches; everything must be copied into L1
first.

the best possible compromise for single thread performance is to allow the
OS to flag a thread as high priority. with two threads of unequal priority
on a core, the higher priority thread runs until its allotted cycles run out
or it stalls, creating de facto coarse-grained multithreading, although this
seems unlikely to be needed
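A minimal sketch of that issue policy, assuming a two-thread core where each hardware thread carries an OS-set priority, a cycle budget, and a stall flag. The struct `hw_thread` and the function `issue_cycle` are hypothetical names for illustration, not any real CPU's interface:

```c
/* Hypothetical per-hardware-thread state for a two-thread core. */
typedef struct {
    int priority; /* set by the OS; larger means more important */
    int budget;   /* cycles left in the current allotment */
    int stalled;  /* nonzero while waiting on memory, etc. */
} hw_thread;

/* Choose which thread issues this cycle and charge it one cycle.
   The higher-priority thread wins whenever it has budget and is not
   stalled; otherwise the other thread fills the gap. Ties go to
   thread 0. Returns the issuing thread's index, or -1 if neither
   thread can issue. */
int issue_cycle(hw_thread t[2]) {
    int hi = (t[1].priority > t[0].priority) ? 1 : 0;
    int order[2] = { hi, 1 - hi };
    for (int k = 0; k < 2; k++) {
        hw_thread *c = &t[order[k]];
        if (c->budget > 0 && !c->stalled) {
            c->budget--;
            return order[k];
        }
    }
    return -1;
}
```

With `{ {1, 2, 0}, {5, 1, 0} }` the high priority thread 1 issues first, then thread 0 takes over once thread 1's budget is spent, which is the coarse switching described above.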

apparently dynamic power use has been decreasing, so speculative execution is
not as big a power hog as increasing clock speed

copy on write should be done in a way that avoids any possible race
condition; if multiple processes are aware of data before it is marked copy
on write, it is being done wrong
incoherent data should be avoided
anything whose complexity exceeds the short-term memory of the smartest
programmers should be avoided
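As a software analogy (not a hardware or kernel design), here is a minimal reference-counted copy-on-write buffer in C11. The point it illustrates is the ordering rule above: `cow_share` raises the count *before* the second holder ever gets the pointer, so no holder can see the buffer as exclusively owned once it is shared. All names (`cow_buf`, `cow_new`, `cow_share`, `cow_write`, `cow_release`) are hypothetical:

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    atomic_int refs;       /* number of holders */
    size_t len;
    unsigned char data[];  /* payload */
} cow_buf;

cow_buf *cow_new(const void *src, size_t len) {
    cow_buf *b = malloc(sizeof *b + len);
    if (!b) return NULL;
    atomic_init(&b->refs, 1);
    b->len = len;
    memcpy(b->data, src, len);
    return b;
}

/* Mark shared first, then hand out the pointer. */
cow_buf *cow_share(cow_buf *b) {
    atomic_fetch_add(&b->refs, 1);
    return b;
}

/* Drop one reference; the last holder frees the buffer. */
void cow_release(cow_buf *b) {
    if (atomic_fetch_sub(&b->refs, 1) == 1) free(b);
}

/* Write one byte. If anyone else holds a reference, copy first and
   drop our reference to the shared original. Returns the buffer the
   caller now owns, or NULL if the copy failed (the caller still holds
   the original in that case). */
cow_buf *cow_write(cow_buf *b, size_t i, unsigned char v) {
    if (atomic_load(&b->refs) > 1) {
        cow_buf *c = cow_new(b->data, b->len);
        if (!c) return NULL;
        cow_release(b);
        b = c;
    }
    b->data[i] = v;
    return b;
}
```

If two holders write at the same moment, both may copy; that costs an extra allocation but never lets one writer's change leak into the other's view.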

branch prediction apparently involves flushing half-executed instructions?
ideally flushing takes constant time to avoid side channels, which would
make it easier as well, although more hardware intensive

overall, illogical and detrimental design decisions have led to the erosion
of computer security.
the emphasis on single thread performance neglects modern computing design

obviously branch prediction, or any other optimization, should be possible
to toggle off for cryptography
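Until hardware offers such a toggle, cryptographic code usually tries to avoid secret-dependent branches in software. A minimal sketch of the common constant-time comparison idiom; `ct_equal` is a hypothetical name, and nothing stops a compiler or CPU from reintroducing timing differences, so this is a best effort, not a guarantee:

```c
#include <stddef.h>
#include <stdint.h>

/* Compare two n-byte buffers without an early exit: every byte is
   visited regardless of where the first mismatch is. Returns 1 if
   equal, 0 otherwise. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n) {
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);  /* accumulate any differing bits */
    /* diff is 0 only if all bytes matched: 0 - 1 wraps to 0xFFFFFFFF,
       whose top bit is 1; any diff in 1..255 gives a top bit of 0. */
    return (int)(((uint32_t)diff - 1u) >> 31);
}
```

Unlike `memcmp`, which may return as soon as it finds a difference, the loop's running time does not depend on where the buffers differ.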

https://wiki.osdev.org/Context_Switching
There are many ways of performing a context switch. The x86 CPU provides a
way of doing it completely in hardware, but for performance and portability
reasons most modern OS's do context switches in software.
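For a feel of what a software context switch involves, here is a user-space analogue using the `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`). They are not what a kernel uses internally, and POSIX has since dropped them, though glibc on Linux still provides them. The names `task` and `run_demo` are hypothetical:

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static int steps;
static char task_stack[64 * 1024];   /* the task runs on its own stack */

static void task(void) {
    steps++;
    swapcontext(&task_ctx, &main_ctx);  /* save task registers, resume main */
    steps++;
    /* returning switches to uc_link, i.e. back to main_ctx */
}

/* Switch into the task, let it yield once, resume it, and report how
   many steps it ran. */
int run_demo(void) {
    steps = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task, 0);
    swapcontext(&main_ctx, &task_ctx);  /* first switch into task */
    swapcontext(&main_ctx, &task_ctx);  /* resume task after its yield */
    return steps;
}
```

Each `swapcontext` saves the current registers and stack pointer and loads another set, which is the core of what a software context switch does.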

http://blog.erratasec.com/2015/03/x86-is-high-level-language.html
The upshot is this: Intel's x86 is a high-level language. Coding everything
up according to Agner Fog's instruction timings still won't produce the
predictable, constant-time code you are looking for. There may be some
solutions, like using CMOV, but it will take research.
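The CMOV idea is usually approximated in portable C with a mask-based select, which compilers may or may not lower to CMOV. A minimal sketch; `ct_select` is a hypothetical name, and as the quote says, the resulting timing still isn't guaranteed:

```c
#include <stdint.h>

/* Return a if cond is 1, b if cond is 0, with no branch.
   cond must be exactly 0 or 1. */
uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - cond;  /* all ones if cond == 1, zero if 0 */
    return (a & mask) | (b & ~mask);
}
```

Both inputs are always read and combined, so which value is chosen is not visible in the control flow, only in the data.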
---
research... hmmm... execute and then check, or check and then execute?
an interesting meme: research, research, research
research and then implement an idea, or implement an idea and then research
it?

Ryan Carboni
