URI: 
        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
  HTML Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
  HTML   Performance Hints – Jeff Dean and Sanjay Ghemawat
       
       
        barfoure wrote 9 min ago:
         Some of this can be reduced to a trivial form, which is to say
         practiced in reality at a reasonable scale, by getting your hands
         on a microcontroller. Not an RTOS or Linux or any of that, just a
         bare microcontroller without an OS: learn it, learn its internal
         fetch architecture, get comfortable with its timings, and watch
         the latency numbers go up when you introduce external memory such
         as SD cards and the like. Knowing how to read the assembly
         listing and see how the instruction cycles add up in the pipeline
         also helps, because then you at least know what is happening.
         That makes it much easier to apply the same careful mentality
         here, which is ultimately what this whole optimization game is
         about: optimizing where time is spent, and with what data.
         Otherwise, someone telling you that so-and-so takes nanoseconds
         or microseconds will feel alien, because you are never exposed to
         an environment where you regularly count clock cycles. So
         consider this a learning opportunity.
       
          simonask wrote 4 min ago:
          Just be careful not to blindly apply the same techniques to a mobile
          or desktop class CPU or above.
          
          A lot of code can be pessimized by golfing instruction counts,
          hurting instruction-level parallelism and microcode optimizations by
          introducing false data dependencies.
          
          Compilers outperform humans here almost all the time.
       
        xnx wrote 15 min ago:
        This formatting is more intuitive to me.
        
           L1 cache reference                  2,000,000,000 ops/sec
           L2 cache reference                    333,333,333 ops/sec
           Branch mispredict                     200,000,000 ops/sec
           Mutex lock/unlock (uncontended)        66,666,667 ops/sec
           Main memory reference                  20,000,000 ops/sec
           Compress 1K bytes with Snappy           1,000,000 ops/sec
           Read 4KB from SSD                          50,000 ops/sec
           Round trip within same datacenter          20,000 ops/sec
           Read 1MB sequentially from memory          15,625 ops/sec
           Read 1MB over 100 Gbps network             10,000 ops/sec
           Read 1MB from SSD                           1,000 ops/sec
           Disk seek                                     200 ops/sec
           Read 1MB sequentially from disk               100 ops/sec
           Send packet CA->Netherlands->CA                 7 ops/sec
       
          barfoure wrote 3 min ago:
           The reason that formatting is not used is that it is neither
           useful nor true.
       
       
   DIR <- back to front page