Hacker News on Gopher (unofficial)
COMMENT PAGE FOR:
Performance Hints – Jeff Dean and Sanjay Ghemawat
barfoure wrote 9 min ago:
Some of this can be boiled down to something you can practice for real,
at a manageable scale, by getting your hands on a microcontroller. Not an
RTOS or Linux or anything like that, just a bare microcontroller without
an OS. Learn it, learn its internal fetch architecture, get comfortable
with its timings, and watch how the latency numbers go up when you
introduce external memory such as SD cards. Knowing how to read the
assembly listing and see how the instruction cycles add up in the
pipeline also helps, because then at least you know what is happening.
That makes it much easier to bring the same careful mentality to this,
which is ultimately what the whole optimization game is about:
optimizing where time is spent and on which data. Otherwise, someone
telling you that so-and-so takes nanoseconds or microseconds will sound
alien, because you normally aren't exposed to an environment where you
count clock cycles. So consider this a learning opportunity.
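To make "counting in clock cycles" concrete, here is a minimal Python sketch that converts a latency in nanoseconds into clock cycles at a given clock rate. The function name and both clock rates (a 16 MHz microcontroller and a 3 GHz desktop core) are illustrative assumptions; 50 ns is the main-memory latency implied by the 20,000,000 ops/sec figure quoted further down the thread.

```python
def ns_to_cycles(latency_ns: float, clock_hz: float) -> float:
    """Return how many clock cycles elapse during latency_ns at clock_hz."""
    return latency_ns * clock_hz / 1e9


# Illustrative clock rates (assumptions): a 16 MHz microcontroller
# and a 3 GHz desktop core.
CLOCKS_HZ = {"16 MHz MCU": 16e6, "3 GHz core": 3e9}

for name, hz in CLOCKS_HZ.items():
    cycle_ns = 1e9 / hz  # duration of one clock cycle in nanoseconds
    print(f"{name}: one cycle = {cycle_ns:g} ns; "
          f"a 50 ns memory access = {ns_to_cycles(50, hz):g} cycles")
```

The same 50 ns is less than one cycle at 16 MHz but about 150 cycles at 3 GHz, which is why the cost of a memory access looks so different once the core gets fast.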
simonask wrote 4 min ago:
Just be careful not to blindly apply the same techniques to a mobile
or desktop class CPU or above.
A lot of code can be pessimized by golfing instruction counts,
hurting instruction-level parallelism and microcode optimizations by
introducing false data dependencies.
Compilers outperform humans here almost all the time.
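A toy model of why instruction count and dependency-chain length are different costs: summing n values with k independent accumulators needs a few more additions than a single running sum, but far fewer of them have to wait on each other. The function name and the cost model (every add takes one time unit, unlimited adders) are assumptions for illustration; real cores have a fixed number of execution ports and multi-cycle floating-point add latencies.

```python
import math


def reduction_cost(n: int, k: int) -> tuple[int, int]:
    """Model summing n values with k independent accumulators.

    Returns (total_adds, critical_path), assuming every add takes one
    time unit and enough adders exist to run independent adds in parallel.
    """
    # Each accumulator starts at 0 and adds every k-th element in sequence,
    # so its chain is ceil(n / k) adds long; the k chains are independent.
    chain = math.ceil(n / k)
    # The k partial sums are then combined pairwise in a tree:
    # k - 1 extra adds, ceil(log2(k)) levels deep.
    combine_depth = math.ceil(math.log2(k)) if k > 1 else 0
    return n + (k - 1), chain + combine_depth


for k in (1, 2, 4, 8):
    adds, depth = reduction_cost(1024, k)
    print(f"k={k}: {adds} adds, critical path {depth} adds")
```

Under this model, k=4 executes 3 extra adds but shrinks the critical path from 1024 to 258. This is a true dependency chain rather than a false (register-reuse) dependency, but the lesson carries over: fewer instructions is not the same as less time.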
xnx wrote 15 min ago:
This formatting is more intuitive to me.
L1 cache reference                 2,000,000,000 ops/sec
L2 cache reference                   333,333,333 ops/sec
Branch mispredict                    200,000,000 ops/sec
Mutex lock/unlock (uncontended)       66,666,667 ops/sec
Main memory reference                 20,000,000 ops/sec
Compress 1K bytes with Snappy          1,000,000 ops/sec
Read 4KB from SSD                         50,000 ops/sec
Round trip within same datacenter         20,000 ops/sec
Read 1MB sequentially from memory         15,625 ops/sec
Read 1MB over 100 Gbps network            10,000 ops/sec
Read 1MB from SSD                          1,000 ops/sec
Disk seek                                    200 ops/sec
Read 1MB sequentially from disk              100 ops/sec
Send packet CA->Netherlands->CA                7 ops/sec
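The ops/sec column appears to be one second divided by each operation's latency, rounded to the nearest integer. A minimal Python sketch of that conversion follows; the latencies in the dictionary are assumed values chosen to be consistent with the column above (for example, 0.5 ns gives 2,000,000,000), and only a few rows are reproduced.

```python
# Assumed latencies in nanoseconds, chosen to be consistent with the
# ops/sec column (assumption: column = 1e9 / latency_ns, rounded).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "Mutex lock/unlock (uncontended)": 15,
    "Main memory reference": 50,
    "Read 4KB from SSD": 20_000,
    "Disk seek": 5_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def ops_per_sec(latency_ns: float) -> int:
    """Operations per second if operations run strictly one after another."""
    return round(1e9 / latency_ns)


for name, ns in LATENCY_NS.items():
    print(f"{name:<34} {ns:>13,} ns  {ops_per_sec(ns):>13,} ops/sec")
```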
barfoure wrote 3 min ago:
The reason that formatting isn't used is that it's neither useful nor
true.
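One standard reason a bare ops/sec column can mislead (not necessarily the commenter's reasoning): 1/latency is the throughput only when operations run strictly one after another. By Little's law, throughput equals the number of operations in flight divided by latency, so independent operations that overlap, such as several outstanding cache misses, complete far more often than 1/latency suggests. In this Python sketch the function name and the choice of 10 misses in flight are illustrative assumptions, not measurements of any particular CPU.

```python
def throughput_per_sec(latency_ns: float, in_flight: int = 1) -> float:
    """Little's law: throughput = concurrency / latency.

    in_flight is how many independent operations overlap at any moment.
    """
    return in_flight * 1e9 / latency_ns


# A 50 ns memory reference, strictly serial vs. with 10 misses overlapping
# (10 is an illustrative assumption).
print(throughput_per_sec(50))
print(throughput_per_sec(50, in_flight=10))
```

The same latency supports ten times the throughput when ten requests overlap; conversely, a dependent chain of loads gets no benefit from overlap at all.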