URI: 
        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
  HTML Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
  HTML   Python numbers every programmer should know
       
       
        superlopuh wrote 2 hours 38 min ago:
        I'm surprised that the `isinstance()` comparison is with `type() ==
        type` and not `type() is type`, which I would expect to be faster,
        since the `==` implementation tends to have an `isinstance` call
        anyway.
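For reference, the three styles are easy to time side by side with the standard library. A rough sketch; absolute numbers depend on machine and CPython version:

```python
import timeit

x = 42

# Micro-benchmark of the three type-check styles discussed above.
checks = {
    "type(x) == int":     lambda: type(x) == int,
    "type(x) is int":     lambda: type(x) is int,
    "isinstance(x, int)": lambda: isinstance(x, int),
}

for label, fn in checks.items():
    total = timeit.timeit(fn, number=1_000_000)
    # total seconds / 1e6 calls * 1e9 ns/s == total * 1000
    print(f"{label:20s} {total * 1000:6.1f} ns/op")
```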
       
          superlopuh wrote 2 hours 37 min ago:
          Also seems like the repo is now private, so I can't open an issue, or
          reproduce the numbers.
       
        cma256 wrote 9 hours 51 min ago:
        Great catalogue. On the topic of msgspec, since pydantic is included it
        may be worth including a bench for de-serializing and serializing from
        a msgspec struct.
       
        iamnotsure wrote 9 hours 56 min ago:
        Exactly wrong.
       
        mopsi wrote 10 hours 45 min ago:
        It is always a good idea to have at least a rough understanding of how
        much operations in your code cost, but sometimes very expensive
        mistakes end up in non-obvious places.
        
        If I have only plain Python installed and a .py file that I want to
        test, then what's the easiest way to get a visualization of the call
        tree (or something similar) and the computational cost of each item?
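With only a stock Python install, cProfile plus pstats covers most of this: `python -m cProfile -s cumulative myscript.py` gives a flat per-function listing, and graphical call-tree viewers such as snakeviz are a `pip install` away. A sketch of the programmatic form (the context-manager API needs Python 3.8+; `work` is a stand-in workload):

```python
import cProfile
import pstats

def work():
    # Stand-in for the script you actually want to profile.
    return sum(i * i for i in range(10_000))

with cProfile.Profile() as prof:
    work()

stats = pstats.Stats(prof).sort_stats("cumulative")
stats.print_stats(10)   # top 10 rows; stats.print_callees() walks the call tree
```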
       
        jiggawatts wrote 11 hours 45 min ago:
        My god, the memory bloat is out of this world compared to platforms
        like the JVM or .NET, let alone C++ or Rust!
       
        rozab wrote 12 hours 6 min ago:
        I wonder why an empty set takes so much more memory than an empty dict
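A plausible explanation, as a CPython implementation detail that may change between versions: an empty dict shares a single static empty key table, while a set embeds a small preallocated hash table directly in the object. `sys.getsizeof` shows the gap:

```python
import sys

# CPython detail (varies by version): empty dicts share one static empty
# key table; sets carry a small inline table from the start.
print("dict:", sys.getsizeof({}))     # e.g. 64 bytes on 64-bit CPython
print("set: ", sys.getsizeof(set()))  # e.g. 216 bytes on 64-bit CPython
```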
       
        andai wrote 12 hours 59 min ago:
        The one I noticed the most was import openai and import numpy.
        
        They're both about a full second on my old laptop.
        
        I ended up writing my own simple LLM library just so I wouldn't have to
        import OpenAI anymore for my interactive scripts.
        
        (It's just some wrapper functions around the equivalent of a curl
        request, which is honestly basically everything I used the OpenAI
        library for anyway.)
       
          kristianp wrote 1 hour 59 min ago:
           I have noticed how long it takes to import numpy. It made rerunning a
           script noticeably sluggish.  Not sure what openai's excuse is, but I
           assume numpy's slowness is loading some native DLLs?
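Import cost can be measured directly: `python -X importtime -c "import numpy"` prints a per-module breakdown. A stdlib-only sketch of the first-import versus cached-import gap, using json as a stand-in so it runs anywhere:

```python
import sys
import time

# First import pays the module-loading cost (run in a fresh interpreter;
# json stands in for heavyweight packages like numpy or openai).
start = time.perf_counter()
import json
first = time.perf_counter() - start

# A repeated import just hits the sys.modules cache and is near-free.
start = time.perf_counter()
import json  # noqa: F811 -- already in sys.modules
cached = time.perf_counter() - start

print(f"first: {first * 1000:.2f} ms, cached: {cached * 1000:.4f} ms")
```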
       
        gcanyon wrote 13 hours 3 min ago:
         As someone who most often works in a language that is literally orders
         of magnitude slower than this -- and has done so since CPU speeds
         were measured in double-digit megahertz -- I am crying at the notion
         that anything here is measured in nanoseconds
       
        Redoubts wrote 13 hours 36 min ago:
        > Attribute read (obj.x)                  14   ns
        
        note that protobuf attributes are 20-50x worse than this
       
        charlieyu1 wrote 13 hours 43 min ago:
        Surprised that list comprehensions are only 26% faster than for loops.
        It used to feel like 4-5x
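The gap is easy to re-measure. A sketch; the comprehension's edge comes largely from skipping the per-iteration `out.append` attribute lookup and call, and newer CPythons have narrowed it:

```python
import timeit

def with_loop(n):
    out = []
    for i in range(n):
        out.append(i * 2)   # method lookup + call on every iteration
    return out

def with_comp(n):
    return [i * 2 for i in range(n)]  # append is a dedicated bytecode op

t_loop = timeit.timeit(lambda: with_loop(1_000), number=5_000)
t_comp = timeit.timeit(lambda: with_comp(1_000), number=5_000)
print(f"loop: {t_loop:.3f}s  comprehension: {t_comp:.3f}s")
```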
       
        pvtmert wrote 14 hours 46 min ago:
         There are lots of discussions about the relevance of these numbers for
         a regular software engineer.
        
        Firstly, I want to start with the fact that the base system is a
        macOS/M4Pro, hence;
        
        - Memory related access is possibly much faster than a x86 server.
        - Disk access is possibly much slower than a x86 server.
        
        *) I took x86 server as the basis as most of the applications run on
        x86 Linux boxes nowadays, although a good amount of fingerprint is also
        on other ARM CPUs.
        
         Although it probably does not change the memory footprint much, the
         libraries loaded and their architecture (i.e. running under Rosetta or
         not) will change the overall footprint of the process.
        
        As it was mentioned on one of the sibling comments -> Always
        inspect/trace your own workflow/performance before making assumptions.
        It all depends on specific use-cases for higher-level performance
        optimizations.
       
        CmdrKrool wrote 14 hours 51 min ago:
        I'm confused by this:
        
          String operations in Python are fast as well. f-strings are the
        fastest formatting style, while even the slowest style is still
        measured in just nano-seconds.
          
          Concatenation (+)   39.1 ns (25.6M ops/sec)
          f-string          64.9 ns (15.4M ops/sec)
        
        It says f-strings are fastest but the numbers show concatenation taking
        less time? I thought it might be a typo but the bars on the graph
        reflect this too?
       
          Liquid_Fire wrote 1 hour 54 min ago:
          Perhaps it's because in all but the simplest cases, you need 2 or
          more concatenations to achieve the same result as one single
          f-string?
          
            "literal1 " + str(expression) + " literal2"
          
          vs
          
            f"literal1 {expression} literal2"
          
          The only case that would be faster is something like: "foo" +
          str(expression)
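A sketch of that comparison; the inputs are illustrative, and results vary by machine:

```python
import timeit

name = "world"
n = 42

# Single join: plain concatenation of two str operands.
t_cat1 = timeit.timeit(lambda: "hello " + name, number=1_000_000)
t_fs1  = timeit.timeit(lambda: f"hello {name}", number=1_000_000)

# Two joins plus a str() conversion: the f-string does it in one pass.
t_cat2 = timeit.timeit(lambda: "a " + str(n) + " b", number=1_000_000)
t_fs2  = timeit.timeit(lambda: f"a {n} b", number=1_000_000)

for label, t in [("concat  x1", t_cat1), ("f-string x1", t_fs1),
                 ("concat  x2", t_cat2), ("f-string x2", t_fs2)]:
    print(f"{label:12s} {t * 1000:6.1f} ns/op")
```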
       
        nodja wrote 15 hours 2 min ago:
        I think a lot of commenters here are missing the point.
        
         Looking at performance numbers is important regardless of whether it's
         Python, assembly, or HDL. If you don't understand why your code is
         slow, you can always look at how many cycles things take and learn how
         code works at a deeper level. As you mature as a programmer these
         things become obvious, but going through the learning process with
         references like these will help you get there sooner; seeing the
         performance numbers and asking why some things take much longer -- or
         sometimes why they take exactly the same time -- is the perfect
         opportunity to learn.
        
        Early in my python career I had a python script that found duplicate
        files across my disks, the first iteration of the script was extremely
        slow, optimizing the script went through several iterations as I
        learned how to optimize at various levels. None of them required me to
        use C. I just used caching, learned to enumerate all files on disk
        fast, and used sets instead of lists. The end result was that doing
        subsequent runs made my script run in 10 seconds instead of 15 minutes.
        Maybe implementing in C would make it run in 1 second, but if I had
        just assumed my script was slow because of python then I would've spent
        hours doing it in C only to go from 15 minutes to 14 minutes and 51
        seconds.
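A minimal sketch of the approach described, with a hypothetical helper name `find_duplicates`: group by file size first so only same-size candidates get hashed, and keep every lookup in a dict rather than a list scan:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Return groups of paths under root with identical contents."""
    by_size = defaultdict(list)          # cheap first pass: group by size
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue                 # unreadable/vanished file: skip
    by_hash = defaultdict(list)          # hash only same-size candidates
    for paths in by_size.values():
        if len(paths) < 2:
            continue                     # a unique size cannot be a duplicate
        for path in paths:
            with open(path, "rb") as f:  # for huge files, hash in chunks
                by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
    return [group for group in by_hash.values() if len(group) > 1]
```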
        
        There's an argument to be made that it would be useful to see C numbers
        next to the python ones, but for the same reason people don't just tell
        you to just use an FPGA instead of using C, it's also rude to say
        python is the wrong tool when often it isn't.
       
        robertclaus wrote 15 hours 18 min ago:
        I liked reading through it from a "is modern Python doing anything
        obviously wrong?" perspective, but strongly disagree anyone should
        "know" these numbers. There's like 5-10 primitives in there that
        everyone should know rough timings for; the rest should be derived with
        big-O algorithm and data structure knowledge.
       
        sireat wrote 15 hours 21 min ago:
        Interesting information but these are not hard numbers.
        
        Surely the 100-char string information of 141 bytes is not correct as
        it would only apply to ASCII 100-char strings.
        
         It would be more useful to know the overhead for unicode strings,
         presumably utf-8 encoded. And again I would presume a 100-emoji string
         would take 441 bytes (just a hypothesis) and a 100-umlaut-char string
         would take 241 bytes.
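One note on the presumption: CPython does not store strings as UTF-8 internally. Under PEP 393, each string picks 1, 2, or 4 bytes per code point based on its widest character, so a 100-umlaut string still stores one byte per character. A sketch to measure it (exact overheads vary by version and platform):

```python
import sys

# PEP 393 compact strings: storage width chosen per string.
samples = {
    "ascii  (1B/char)": "a" * 100,
    "umlaut (1B/char)": "\u00fc" * 100,      # U+00FC fits latin-1 storage
    "euro   (2B/char)": "\u20ac" * 100,      # U+20AC needs UCS-2
    "emoji  (4B/char)": "\U0001f600" * 100,  # astral plane needs UCS-4
}
for label, s in samples.items():
    print(f"{label:18s} {sys.getsizeof(s)} bytes")
```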
       
        JBits wrote 15 hours 38 min ago:
        One of the reasons I'm really excited about JAX is that I hope it will
        allow me to write fast Python code without worrying about these
        details.
       
        calmbonsai wrote 15 hours 44 min ago:
        You absolutely do not need to know those absolute numbers--only the
        relative costs of various operations.
        
        Additionally, regardless of the code you can profile the system to
        determine where the "hot spots" are and refactor or call-out to more
        performant (Rust, Go, C) run-times for those workflows where necessary.
       
        sjducb wrote 16 hours 12 min ago:
        It’s missing the time taken to instantiate a class.
        
        I remember refactoring some code to improve readability, then observing
        something that was previously a few microseconds take tens of seconds.
        
        The original code created a large list of lists. Each child list had 4
        fields each field was a different thing, some were ints and one was a
        string.
        
        I created a new class with the names of each field and helper methods
        to process the data. The new code created a list of instances of my
        class. Downstream consumers of the list could look at the class to see
        what data they were getting. Modern Python developers would use a data
        class for this.
        
        The new code was very slow. I’d love it if the author measured the
        time taken to instantiate a class.
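A hedged sketch of such a measurement, with illustrative class names; absolute numbers vary by interpreter version, but it makes the list-vs-class instantiation gap visible:

```python
import timeit
from dataclasses import dataclass

class Plain:
    def __init__(self, a, b, c, name):
        self.a, self.b, self.c, self.name = a, b, c, name

class Slotted:
    __slots__ = ("a", "b", "c", "name")
    def __init__(self, a, b, c, name):
        self.a, self.b, self.c, self.name = a, b, c, name

@dataclass
class Record:
    a: int
    b: int
    c: int
    name: str

candidates = {
    "list":      lambda: [1, 2, 3, "x"],
    "plain":     lambda: Plain(1, 2, 3, "x"),
    "slotted":   lambda: Slotted(1, 2, 3, "x"),
    "dataclass": lambda: Record(1, 2, 3, "x"),
}
for label, fn in candidates.items():
    t = timeit.timeit(fn, number=100_000)
    print(f"{label:10s} {t / 100_000 * 1e9:6.0f} ns/instance")
```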
       
          smcin wrote 14 hours 16 min ago:
          Instantiating classes is in general not a performance issue in
          Python. Your issue here strongly sounds like you're abusing OO to
          pass a list of instances into every method and downstream call (not
          just the usual reference to self, the instance at hand). Don't do
          that, it shouldn't be necessary. It sounds like you're trying to get
          a poor-man's imitation of classmethods, without identifying and
          refactoring whatever it is that methods might need to access from
          other instances.
          
          Please post your code snippet on StackOverflow ([python] tag) or
          CodeReview.SE so people can help you fix it.
          
          > created a new class with the names of each field and helper methods
          to process the data. The new code created a list of instances of my
          class. Downstream consumers of the list could look at the class to
          see what data they were getting.
       
          lifeisstillgood wrote 16 hours 6 min ago:
          I went to the doctor and I said “It hurts when I do this”
          
          The doctor said, “don’t do that”.
          
           Edit: so yeah, a rather snarky reply. Sorry. But it's worth asking
           why we want to use classes and objects everywhere. Alan Kay is well
           known for saying object orientation is about message passing (a point
           mostly repeated by Erlang people).
          
          A list of lists (where each list is four different types repeated)
          seems a fine data structure, which can be operated on by external
          functions, and serialised pretty easily. Turning it into classes and
          objects might not be a useful refactoring, I would certainly want to
          learn more before giving the go ahead.
       
            sjducb wrote 6 hours 2 min ago:
            The main reason why is to keep a handle on complexity.
            
            When you’re in a project with a few million lines of code and 10
            years of history it can get confusing.
            
            Your data will have been handled by many different functions before
            it gets to you. If you do this with raw lists then the code gets
             very confusing. In one data structure the customer name might be at
             [4] and another might have it at [9]. Worse, someone adds a new
             field at [5], and then when two lists get concatenated the name
             moves to [10] in downstream code that consumes the concatenated
             lists.
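The wrapped form the parent describes is only a few lines with dataclasses (names here are illustrative): named fields survive schema changes that would silently shift positional indices.

```python
from dataclasses import dataclass

# Hypothetical record type: adding a field later never renumbers the others.
@dataclass
class Customer:
    name: str
    balance: float

customers = [Customer("Ada", 10.0), Customer("Bob", -2.5)]
print(customers[1].balance)   # vs. customers[1][9] in raw-list code
```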
       
            krior wrote 13 hours 47 min ago:
            I mean it sounds reasonable to me to wrap the data into objects.
            
            customers[3][4]
            
            is a lot less readable than
            
            customers[3].balance
       
              lifeisstillgood wrote 13 hours 13 min ago:
              Absolutely
              
              But hidden in this is the failing of every sql-bridge ever -
              it’s definitely easier for a programmer to read
              customers(3).balance but the trade off now is I have to provide
              class based semantics for all operations - and that tends to hide
              (oh you know, impedance mismatch).
              
              I would far prefer “store the records as plain as we can” and
              add on functions to operate over it (think pandas stores
              basically just ints floats and strings as it is numpy underneath)
              
              (Yes you can store pyobjects somehow but the performance drops
              off a cliff.)
              
              Anyway - keep the storage and data structure as raw and simple as
              possible and write functions to run over it. And move to pandas
              or SQLite pretty quickly :-)
       
        snakepit wrote 16 hours 33 min ago:
        This is helpful. Someone should create a similar benchmark for the
        BEAM. This is also a good reminder to continue working on snakepit [1]
        and snakebridge [2]. Plenty remains before they're suitable for prime
         time.
        
  HTML  [1]: https://hex.pm/packages/snakepit
  HTML  [2]: https://hex.pm/packages/snakebridge
       
        esafak wrote 16 hours 44 min ago:
        The point of the original list was that the numbers were simple enough
        to memorize: [1] Nobody is going to remember any of the numbers on this
        new list.
        
  HTML  [1]: https://gist.github.com/jboner/2841832
       
          mikeckennedy wrote 14 hours 39 min ago:
          That's a fair point @esafak. I updated the article with something
          akin to the doubling chart of numbers in the original article from
          2012.
       
        perrygeo wrote 17 hours 0 min ago:
        > small int (0-256) cached
        
        It's -5 to 256, and these have very tricky behavior for programmers
        that confuse identity and equality.
        
          >>> a = -5
          >>> b = -5
          >>> a is b
          True
          >>> a = -6
          >>> b = -6
          >>> a is b
          False
       
          Tostino wrote 12 hours 20 min ago:
          Java does similar. Confusing for beginners who run into it for the
          first time for sure.
       
        lunixbochs wrote 17 hours 7 min ago:
        I'm confused why they repeatedly call a slots class larger than a
        regular dict class, but don't count the size of the dict
       
        lcnmrn wrote 17 hours 20 min ago:
         LLMs can improve Python code performance. I've used them myself on a
         few projects.
       
        belabartok39 wrote 17 hours 24 min ago:
        Hmmmm, there should absolutely be standard deviations for this type of
         work. Also, what is N, the number of runs? Does it say somewhere?
       
          mikeckennedy wrote 15 hours 36 min ago:
          It is open source, you could just look. :) But here is a summary for
          you. It's not just one run and take the number:
          
          Benchmark Iteration Process
          
          Core Approach:
          
          - Warmup Phase: 100 iterations to prepare the operation (default)
          
          - Timing Runs: 5 repeated runs (default), each executing the
          operation a specified number of times
          
          - Result: Median time per operation across the 5 runs
          
          Iteration Counts by Operation Speed:
          - Very fast ops (arithmetic): 100,000 iterations per run
          
          - Fast ops (dict/list access): 10,000 iterations per run
          
          - Medium ops (list membership): 1,000 iterations per run
          
          - Slower ops (database, file I/O): 1,000-5,000 iterations per run
          
          Quality Controls:
          
          - Garbage collection is disabled during timing to prevent
          interference
          
          - Warmup runs prevent cold-start bias
          
          - Median of 5 runs reduces noise from outliers
          
          - Results are captured to prevent compiler optimization elimination
          
          Total Executions: For a typical benchmark with 1,000 iterations and 5
          repeats, each operation runs 5,100 times (100 warmup + 5×1,000
          timed) before reporting the median result.
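The process summarized above can be sketched as follows. This is an illustration of the stated methodology, not the repo's actual code:

```python
import gc
import statistics
import time

def bench(op, iterations=1_000, repeats=5, warmup=100):
    for _ in range(warmup):              # warmup: avoid cold-start bias
        op()
    was_enabled = gc.isenabled()
    gc.disable()                         # keep the collector out of the timing
    try:
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            for _ in range(iterations):
                op()
            runs.append((time.perf_counter() - start) / iterations)
    finally:
        if was_enabled:
            gc.enable()
    return statistics.median(runs)       # median damps outlier runs

print(f"{bench(lambda: sum(range(10))) * 1e9:.0f} ns/op")
```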
       
            belabartok39 wrote 15 hours 9 min ago:
             That answers what N is (why not just say so in the article). If you
             are only going to report medians, is there an appendix with further
             statistics such as confidence intervals or standard deviations? For
             a serious benchmark, it would be essential to show the spread or
             variability, no?
       
        m3047 wrote 17 hours 37 min ago:
        +1 but I didn't see pack / unpack...
       
        zbentley wrote 17 hours 48 min ago:
        I have some questions and requests for clarification/suspicious
        behavior I noticed after reviewing the results and the benchmark code,
        specifically:
        
        - If slotted attribute reads and regular attribute reads are the same
        latency, I suspect that either the regular class may not have enough
        "bells on" (inheritance/metaprogramming/dunder overriding/etc) to
        defeat simple optimizations that cache away attribute access, thus
        making it equivalent in speed to slotted classes. I know that over time
        slotting will become less of a performance boost, but--and this is just
        my intuition and I may well be wrong--I don't get the impression that
        we're there yet.
        
        - Similarly "read from @property" seems suspiciously fast to me. Even
        with descriptor-protocol awareness in the class lookup cache, the
        overhead of calling a method seems surprisingly similar to the overhead
        of accessing a field. That might be explained away by the fact that
        property descriptors' "get" methods are guaranteed to be the simplest
        and easiest to optimize of all call forms (bound method, guaranteed to
        never be any parameters), and so the overhead of setting up the
        stack/frame/args may be substantially minimized...but that would only
        be true if the property's method body was "return 1" or something very
        fast. The properties tested for these benchmarks, though, are looking
        up other fields on the class, so I'd expect them to be a lot slower
        than field access, not just a little slower ( [1] ).
        
        - On the topic of "access fields of objects"
        (properties/dataclasses/slots/MRO/etc.), benchmarks are really hard to
        interpret--not just these benchmarks, all of them I've seen. That's
        because there are fundamentally two operations involved: resolving a
        field to something that produces data for it, and then accessing the
        data. For example, a @property is in a class's method cache, so
        resolving "instance.propname" is done at the speed of the methcache.
        That might be faster than accessing "instance.attribute" (a field, not
        a @property or other descriptor), depending on the inheritance geometry
        in play, slots, __getattr[ibute]__ overrides, and so on. On the other
        hand, accessing the data at "instance.propname" is going to be a lot
        more expensive for most @properties (because they need to call a
        function, use an argument stack, and usually perform other attribute
        lookups/call other functions/manipulate locals, etc); accessing data at
        "instance.attribute" is going to be fast and constant-time--one or two
        pointer-chases away at most.
        
        - Nitty: why's pickling under file I/O? Those benchmarks aren't timing
        pickle functions that perform IO, they're benchmarking the ser/de
        functionality and thus should be grouped with json/pydantic/friends
        above.
        
        - Asyncio's no spring chicken, but I think a lot of the benchmarks
        listed tell a worse story than necessary, because they don't
        distinguish between coroutines, Tasks, and Futures. Coroutines are
        cheap to have and call, but Tasks and Futures have a little more
        overhead when they're used (even fast CFutures) and a lot more overhead
        to construct since they need a lot more data resources than just a
        generator function (which is kinda what a raw coroutine desugars to,
        but that's not as true as most people think it is...another story for
         another time). Now, "run_until_complete()" and "gather()" initially
        take their arguments and coerce them into Tasks/Futures--that
        detection, coercion, and construction takes time and consumes a lot of
        overhead. That's good to know (since many people are paying that
        coercion tax unknowingly), but it muddies the boundary between
        "overhead of waiting for an asyncio operation to complete" and
        "overhead of starting an asyncio operation". Either calling the
        lower-level functions that run_until_complete()/gather() use
        internally, or else separating out benchmarks into ones that pass
        Futures/Tasks/regular coroutines might be appropriate.
        
        - Benchmarking "asyncio.sleep(0)" as a means of determining the
        bare-minimum await time of a Python event loop is a bad idea. sleep(0)
        is very special (more details here: [2] ) and not representative. To
        benchmark "time it takes for the event loop to spin once and produce a
        result"/the python equivalent of process.nextTick, it'd be better to
        use low-level loop methods like "call_soon" or defer completion to a
        Task and await that.
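A hedged sketch of the `call_soon` alternative: defer completion through a plain future so the await genuinely crosses one loop pass, instead of hitting `asyncio.sleep(0)`'s special case.

```python
import asyncio
import time

async def tick():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    loop.call_soon(fut.set_result, None)  # resolves on the next loop pass
    await fut                             # suspends until the loop runs it

async def main(n=10_000):
    start = time.perf_counter()
    for _ in range(n):
        await tick()
    return (time.perf_counter() - start) / n

per_tick = asyncio.run(main())
print(f"{per_tick * 1e9:.0f} ns per loop pass")
```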
        
  HTML  [1]: https://github.com/mikeckennedy/python-numbers-everyone-should...
  HTML  [2]: https://news.ycombinator.com/item?id=46056895
       
        thundergolfer wrote 17 hours 48 min ago:
        A lot of people here are commenting that if you have to care about
        specific latency numbers in Python you should just use another
        language.
        
        I disagree. A lot of important and large codebases were grown and
        maintained in Python (Instagram, Dropbox, OpenAI) and it's damn useful
        to know how to reason your way out of a Python performance problem when
        you inevitably hit one without dropping out into another language,
        which is going to be far more complex.
        
        Python is a very useful tool, and knowing these numbers just makes you
        better at using the tool.
        The author is a Python Software Foundation Fellow. They're great at
        using the tool.
        
        In the common case, a performance problem in Python is not the result
        of hitting the limit of the language but the result of sloppy
        un-performant code, for example unnecessarily calling a function
        O(10_000) times in a hot loop.
        
        I wrote up a more focused "Python latency numbers you should know" as a
        quiz here
        
  HTML  [1]: https://thundergolfer.com/computers-are-fast
       
          TacticalCoder wrote 1 hour 25 min ago:
          > ... a function O(10_000) times in a hot loop
          
          O(10_000) is a really weird notation.
       
            tialaramex wrote 53 min ago:
            Generously we could say they probably mean ~10_000 rather than
            O(10_000)
       
          saagarjha wrote 4 hours 8 min ago:
          I do performance optimization for a system written in Python. Most of
          these numbers are useless to me, because they’re completely
          irrelevant until they become a problem, then I measure them myself.
          If you are writing your code trying to save on method calls, you’re
          not getting any benefit from using the language and probably should
          pick something else.
       
            srean wrote 1 hour 10 min ago:
            It's always a balance.
            
            Good designs do not happen in a vacuum but informed with knowledge
            of at least the outlines of the environment.
            
             One attitude is to pursue an idea over breakfast thinking: let me
             spill some sticky milk on the dining table, who cares, I will clean
             up if it becomes a problem later.
             
             Another is: it's not much of an overbearing constraint to avoid
             making a mess with spilt milk in the first place; maybe it won't be
             a big bother later, but it costs me little now to be a little
             hygienic rather than sloppy.
            
            There's a balance between making a mess and cleaning up and not
            making a mess in the first place. The other extreme is to be so
            defensive about the possibility of creating a mess that it
            paralyses progress.
            
            The sweet spot is somewhere between the extremes and having the
            ball-park numbers in the back of one's mind helps with that. It
            informs about the environment.
       
          notepad0x90 wrote 6 hours 49 min ago:
          For some of these, there are alternative modules you can use, so it
          is important to know this. But if it really matters, I would think
          you'd know this already?
          
          For me, it will help with selecting what language is best for a task.
          I think it won't change my view that python is an excellent language
          to prototype in though.
       
          NoteyComplexity wrote 12 hours 12 min ago:
          Agreed, and on top of that:
          
           I think these kinds of numbers exist everywhere and are not specific
           to Python.
          
           In Zig, I sometimes take a brief look at the CPU-cycle counts of
           various operations to avoid cache misses, and I need to be aware of
           the alignment and size of a data type to debloat a data structure. If
           their logic applied, too bad, I should quit programming, since all
           languages have their own latency on certain operations we should be
           aware of.
          
          There are reasons to not use Python, but that particular reason is
          not the one.
       
          Scubabear68 wrote 13 hours 38 min ago:
          No.
          
          Python’s issue is that it is incredibly slow in use cases that
          surprise average developers. It is incredibly slow at very basic
          stuff, like calling a function or accessing a dictionary.
          
          If Python didn’t have such an enormous number of popular C and C++
          based libraries it would not be here. It was saved by Numpy etc etc.
       
            HenriTEL wrote 47 min ago:
            22ns for a function call and dictionary key lookup, that's actually
            surprisingly fast.
       
            aragilar wrote 1 hour 48 min ago:
            I'm not sure how Python can be described as "saved" by numpy et
            al., when the numerical Python ecosystem was there near the
            beginning, and the language and ecosystem have co-evolved? Why
            didn't Perl (with PDL), R or Ruby (or even php) succeed in the same
            way?
       
            dnautics wrote 8 hours 57 min ago:
             i hate python but if your bottleneck is that sqlite query,
             optimizing a handful of addition operations is a wash. that's why
             you need to at least have a feel for these tables
       
          i_am_a_peasant wrote 16 hours 42 min ago:
          our build system is written in python, and i’d like it not to suck
          but still stay in python, so these numbers very much matter.
       
          oofbey wrote 17 hours 38 min ago:
          I think both points are fair. Python is slow - you should avoid it if
          speed is critical, but sometimes you can’t easily avoid it.
          
          I think the list itself is super long winded and not very
          informative. A lot of operations take about the same amount of time. 
          Does it matter that adding two ints is very slightly slower than
          adding two floats? (If you even believe this is true, which I
          don’t.) No. A better summary would say “all of these things take
          about the same amount of time: simple math, function calls, etc.
          these things are much slower: IO.” And in that form the summary is
          pretty obvious.
       
            microtonal wrote 17 hours 14 min ago:
             > I think the list itself is super long winded and not very
             informative.
            
            I agree. I have to complement the author for the effort put in.
            However it misses the point of the original Latency numbers every
            programmer should know, which is to build an intuition for making
            good ballpark estimations of the latency of operations and that
            e.g. A is two orders of magnitude more expensive than B.
       
          nutjob2 wrote 17 hours 38 min ago:
          > A lot of important and large codebases were grown and maintained in
          Python
          
           How does this happen? Is it just inertia that causes people to write
           large systems in an essentially type-free, interpreted scripting
           language?
       
            IshKebab wrote 13 hours 16 min ago:
            Someone says "let's write a prototype in Python" and someone else
             says "are you sure we shouldn't use a better language that is
            just as productive but isn't going to lock us into abysmal
            performance down the line?" but everyone else says "nah we don't
            need to worry about performance yet, and anyway it's just a
            prototype - we'll write a proper version when we need to"...
            
            10 years later "ok it's too slow; our options are a) spend $10m
            more on servers, b) spend $5m writing a faster Python runtime
            before giving up later because nobody uses it, c) spend 2 years
            rewriting it and probably failing, during which time we can make no
            new features. a) it is then."
       
              anhner wrote 3 hours 17 min ago:
              If I made an app in python and in 10 years it grows so successful
              that it needs a $10m vertical scale or $5m rewrite, I wouldn't
              even complain.
       
              rented_mule wrote 9 hours 48 min ago:
              What many startups need to succeed is to be able to
              pivot/develop/repeat very quickly to find a product+market that
              makes money. If they don't find that, and most don't, the
              millions you talk about never come due. They also rarely have
              enough developers, so developer productivity in the short term is
              vital to that iteration speed. If that startup turns into Dropbox
              or Instagram, the millions you mention are round-off error on
              many billions. Easy business decision, and startups are first and
              foremost businesses.
              
              Some startups end up in between the two extremes above. I was at
              one of the Python-based ones that ended up in the middle. At $30M
              in annual revenue, Python was handling 100M unique monthly
              visitors on 15 cheap, circa-2010 servers. By the time we hit $1B
              in annual revenue, we had Spark for both heavy batch computation
              and streaming computation tasks, and Java for heavy online
              computational workloads (e.g., online ML inference). There were
              little bits of Scala, Clojure, Haskell, C++, and Rust here and
              there (with well over 1K developers, things creep in over the
              years). 90% of the company's code was still in Python and it
              worked well. Of course there were pain points, but there always
              are. At $1B in annual revenue, there was budget for investments
              to make things better (cleaning up architectural choices that
              hadn't kept up, adding static types to core things, scaling up
              tooling around package management and CI, etc.).
              
              But a key to all this... the product that got to $30M (and
              eventually $1B+) looked nothing like what was pitched to initial
              investors. It was unlikely that enough things could have been
              tried to land on the thing that worked without excellent
              developer productivity early on. Engineering decisions are not
              only about technical concerns, they are also about the business
              itself.
       
              fud101 wrote 10 hours 7 min ago:
              I don't know a better open source language than Python. Java and
              C# are both better (platforms) but they come with that obvious
              corporate catch.
       
              gcanyon wrote 13 hours 10 min ago:
              What language is “just as productive but isn't going to lock us
              into abysmal performance down the line”?
              
              What makes that language not strictly superior to Python?
       
                nazgul17 wrote 10 hours 48 min ago:
                Loose typing makes you really fast at writing code, as long as
                you can keep all the details in your head. Python is great for
                smaller stuff. But crossed some threshold, the lack of a
                mechanism that has your back starts slowing you down.
       
                  gcanyon wrote 8 hours 59 min ago:
                  Sure, my language of choice is more flexible than that: I can
                  type
                  
                     put "test abc999 this" into x
                     add 1 to char 4 to 6 of word 2 of x
                     put x -- puts "test abc1000 this"
                  
                  But I'm still curious -- what's the better language?
       
            wiseowise wrote 14 hours 22 min ago:
            Python has types, now even gradual static typing if you want to go
            further. It's irrelevant whether language is interpreted scripting
            if it solves your problem.
       
            tjwebbnorfolk wrote 14 hours 30 min ago:
            Most large things begin life as small things.
       
            hibikir wrote 16 hours 54 min ago:
            Small startups end up writing code in whatever gets things working
            faster, because  having too large a codebase with too much load is
            a champagne problem.
            
            If I told you that we were going to be running a very large
            payments system, with customers from startups to Amazon, you'd not
            write it in Ruby and put the data in MongoDB, then use its
            oplog as a queue... but that's what Stripe looked like. They even
            hired a compiler team to add type checking to the language, as that
            made far more sense than porting a giant monorepo to something
            else.
       
            oivey wrote 17 hours 29 min ago:
            It’s a nice and productive language. Why is that
            incomprehensible?
       
            xboxnolifes wrote 17 hours 32 min ago:
            It's very simple. Large systems start as small systems.
       
              dragonwriter wrote 15 hours 35 min ago:
              Large systems are often aggregates of small systems, too.
       
            oofbey wrote 17 hours 36 min ago:
            It’s very natural. Python is fantastic for going from 0 to 1
            because it’s easy and forgiving. So lots of projects start with
            it. Especially anything ML focused. And it’s much harder to
            change tools once a project is underway.
       
              passivegains wrote 17 hours 2 min ago:
              this is absolutely true, but there's an additional nuance: yes,
              python is fantastic, yes, it's easy and forgiving, but there are
              other languages like that too.
              ...except there really aren't. other than ruby and maybe go,
              every other popular language sacrifices ease of use for things
              that simply do not matter for the overwhelming majority of
              programs. much of python's popularity doesn't come from being
              easy and forgiving, it's that everything else isn't. for normal
              programming why would we subject ourselves to anything but python
              unless we had no choice?
              
              while I'm on the soapbox I'll give java a special mention: a
              couple years ago I'd have said java was easy even though it's
              tedious and annoying, but I've become reacquainted with it for a
              high school program (python wouldn't work for what they're doing
              and the school's comp sci class already uses java.)
              
              this year we're switching to c++.
       
                zelphirkalt wrote 14 hours 11 min ago:
                 Omg, switching to C++ for pupils who are programming beginners
                 ... "How to turn off the most students from computer
                 programming 101". Really can't get much worse than C++ for
                 beginners.
       
                  nightfly wrote 10 hours 28 min ago:
                  PSU (Oregon) uses C++ as just "c with classes" and ignores
                  the rest of C++ for intro to programming courses. It
                  frustrates people who already use C++ but otherwise works
                  pretty well.
       
                    jgalt212 wrote 23 min ago:
                    C++, The Good Parts
       
                    Izkata wrote 43 min ago:
                    This was how we learned it in an intro class in highschool
                    ages ago, worked pretty well there too.
       
        f311a wrote 17 hours 58 min ago:
         > Strings
         > The rule of thumb for strings is the core string object takes 41
         > bytes. Each additional character is 1 byte.
        
        That's misleading. There are three types of strings in Python (1, 2 and
        4 bytes per character).
        
  HTML  [1]: https://rushter.com/blog/python-strings-and-memory/
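         
         [Editor's sketch of the three representations; CPython-specific, and
         header sizes vary by version, but the per-character cost is stable:]
         
```python
import sys

# CPython stores strings in 1-, 2-, or 4-byte-per-character form,
# chosen by the widest code point in the string (PEP 393).
ascii_str = "a" * 100            # fits in Latin-1: 1 byte/char
bmp_str = "\u0394" * 100         # needs UCS-2: 2 bytes/char
astral_str = "\U0001F600" * 100  # needs UCS-4: 4 bytes/char

for s in (ascii_str, bmp_str, astral_str):
    # Grow the string by one character and see how much the size grows.
    per_char = sys.getsizeof(s + s[0]) - sys.getsizeof(s)
    print(per_char)  # 1, then 2, then 4
```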
       
        Retr0id wrote 18 hours 39 min ago:
        > Numbers are surprisingly large in Python
        
        Makes me wonder if the cpython devs have ever considered v8-like
        NaN-boxing or pointer stuffing.
       
        ewuhic wrote 18 hours 51 min ago:
        This is AI slop.
       
          Lockal wrote 1 hour 55 min ago:
          Sad that your comment is downvoted. But yes, for those who need
          clarification:
          
           1) Measurements are faulty. A list of 1,000 ints can be 4x smaller.
           Most time measurements depend on circumstances that are not
           mentioned and therefore can't be reproduced.
          
          2) Brainrot AI style. Hashmap is not "200x faster than list!", that's
          not how complexity works.
          
          3) orjson/ujson are faulty, which is one of the reasons they don't
          replace stdlib implementation. Expect crashes, broken jsons, anything
          from them
          
          4) What actually will be used in number-crunching applications -
          numpy or similar libraries - is not even mentioned.
       
        mikeckennedy wrote 19 hours 3 min ago:
        Author here.
        
         Thanks for the feedback, everyone. I appreciate you posting it,
         @woodenchair, and @aurornis for pointing out the intent of the article.
        
        The idea of the article is NOT to suggest you should shave 0.5ns off by
        choosing some dramatically different algorithm or that you really need
        to optimize the heck out of everything.
        
         In fact, I think a lot of what the numbers show is that overthinking
         the optimizations often isn't worth it (e.g. caching len(coll) in a
         variable rather than calling it over and over is less useful than it
         might seem conceptually).
        
        Just write clean Python code. So much of it is way faster than you
        might have thought.
        
        My goal was only to create a reference to what various operations cost
        to have a mental model.
       
          willseth wrote 17 hours 54 min ago:
          Then you should have written that. Instead you have given more fodder
          for the premature optimization crowd.
       
            mikeckennedy wrote 14 hours 38 min ago:
            I didn't tell anyone to optimize anything. I just posted numbers.
            It's not my fault some people are wired that way. Anytime I
            suggested some sort of recommendation it was to NOT optimize.
            
            For example, from the post "Maybe we don’t have to optimize it
            out of the test condition on a while loop looping 100 times after
            all."
       
              calmbonsai wrote 9 hours 6 min ago:
              The literal title is "Python Numbers Every Programmer Should
              Know" which implies the level of detail in the article (down to
              the values of the numbers) is important.  It is not.
              
              It is helpful to know the relative value (costs) of these
              operations.  Everything else can be profiled and optimized for
              the particular needs of a workflow in a specific architecture.
              
              To use an analogy, turbine designers no longer need to know the
              values in the "steam tables", but they do need to know efficient
              geometries and trade-offs among them when designing any Rankine
              cycle to meet power, torque, and Reynolds regimes.
       
        boerseth wrote 19 hours 12 min ago:
        That's a long list of numbers that seem oddly specific. Apart from
        learning that f-strings are way faster than the alternatives, and
        certain other comparisons, I'm not sure what I would use this for
        day-to-day.
        
        After skimming over all of them, it seems like most "simple" operations
        take on the order of 20ns. I will leave with that rule of thumb in
        mind.
       
          aunderscored wrote 15 hours 36 min ago:
          If you're interested, fstrings are faster because they directly
          become bytecode at compile time rather than being a function call at
          runtime
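             
             [Editor's note: you can see this with the dis module. This is
             CPython-specific and the exact opcode names vary by version:]
             
```python
import dis

def with_fstring(x):
    return f"value: {x}"

def with_format(x):
    return "value: {}".format(x)

# The f-string compiles to formatting opcodes plus BUILD_STRING;
# no function call happens at runtime.
print([i.opname for i in dis.get_instructions(with_fstring)])

# .format() loads the bound method and calls it every time.
print([i.opname for i in dis.get_instructions(with_format)])
```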
       
            apelapan wrote 14 hours 12 min ago:
             Thanks for that bit of info! I was surprised by the speed
            difference. I have always assumed that most variations of basic
            string formatting would compile to the same bytecode.
            
            I usually prefer classic %-formatting for readability when the
            arguments are longer and f-strings when the arguments are shorter.
            Knowing there is a material performance difference at scale, might
            shift the balance in favour of f-strings for some situations.
       
          0x000xca0xfe wrote 18 hours 50 min ago:
          That number isn't very useful either, it really depends on the
          hardware. Most virtualized server CPUs where e.g. Django will run on
          in the end are nowhere near the author's M4 Pro.
          
          Last time I benchmarked a VPS it was about the performance of an Ivy
          Bridge generation laptop.
       
            giantrobot wrote 18 hours 2 min ago:
            > Last time I benchmarked a VPS it was about the performance of an
            Ivy Bridge generation laptop.
            
            I have a number of Intel N95 systems around the house for various
             things. I've found them to be a pretty accurate analog for small
             VPS instances. The N95s are Intel E-cores, which are effectively
            Sandy Bridge/Ivy Bridge cores.
            
             Stuff can fly on my MacBook but then drag on a small VPS instance,
             so validating against an N95 (which I already have) is helpful.
             YMMV.
       
        mwkaufma wrote 19 hours 18 min ago:
        Why? If those micro benchmarks mattered in your domain, you wouldn't be
        using python.
       
          PhilipRoman wrote 17 hours 18 min ago:
          ...and other hilarious jokes you can tell yourself!
       
          coldtea wrote 19 hours 7 min ago:
          That's an "all or nothing" fallacy. Just because you use Python and
          are OK with some slowdown, doesn't mean you're OK with each and every
          slowdown when you can do better.
          
          To use a trivial example, using a set instead of a list to check
          membership is a very basic replacement, and can dramatically improve
          your running time in Python. Just because you use Python doesn't mean
          anything goes regarding performance.
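             
             [Editor's sketch of that trivial example; absolute timings depend
             on the machine, the ratio is the point:]
             
```python
import timeit

n = 1_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1  # worst case: the list must scan every element

# Membership test: linear scan vs. hash lookup
t_list = timeit.timeit(lambda: missing in data_list, number=10_000)
t_set = timeit.timeit(lambda: missing in data_set, number=10_000)
print(f"list: {t_list:.4f}s, set: {t_set:.4f}s, ratio: {t_list / t_set:.0f}x")
```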
       
            mwkaufma wrote 18 hours 47 min ago:
            That's an example of an algorithmic improvement (log n vs n), not a
            micro benchmark, Mr. Fallacy.
       
              coldtea wrote 14 hours 41 min ago:
              "Mr. Fallacy."? Got any better juvenile name-calling?
              
              The case is among the example numbers given in TFA:
              
              "Dict lookup by key", "List membership check"
              
              Does it have to spell out the difference is algorithmic in this
              case for the comparison to be useful?
              
              Or, inversely, is the difference between e.g. memory and disk
              access times insignificant, because it's not algorithmic?
       
        Aurornis wrote 19 hours 22 min ago:
        A meta-note on the title since it looks like it’s confusing a lot of
        commenters: The title is a play on Jeff Dean’s famous “Latency
        Numbers Every Programmer Should Know” from 2012. It isn’t meant to
        be interpreted literally. There’s a common theme in CS papers and
        writing to write titles that play upon themes from past papers. Another
        common example is the “_____ considered harmful” titles.
       
          dekhn wrote 16 hours 16 min ago:
          That doc predates 2012 significantly.
          
          From what I've been able to glean, it was basically created in the
          first few years Jeff worked at Google, on indexing and serving for
           the original search engine. For example, the comparison of cache,
           RAM, and disk determined whether data was stored in RAM (the index,
           used for retrieval) or disk (the documents, typically not used in
           retrieval, but used in scoring). Similarly, the comparison of
           California-Netherlands time: I believe Google's first international
           data center was in NL and they needed to make decisions about copying
          over the entire index in bulk versus serving backend queries in the
          US with frontends in the NL.
          
          The numbers were always going  out of date; for example, the arrival
          of flash drives changed disk latency significantly.  I remember Jeff
          came to me one day and said he'd invented a compression algorithm for
          genomic data "so it can be served from flash" (he thought it would be
          wasteful to use precious flash space on uncompressed genomic data).
       
          willseth wrote 17 hours 57 min ago:
           Good callout on the paper reference, but this author gives
           every indication that he's dead serious in the first paragraph. I
          don’t think commenters are confused.
       
          shanemhansen wrote 18 hours 24 min ago:
          Going to write a real banger of a paper called "latency numbers
          considered harmful is all you need" and watch my academic cred go
          through the roof.
       
            AnonymousPlanet wrote 14 hours 57 min ago:
            " ... with an Application to the Entscheidungsproblem"
       
          Kwpolska wrote 18 hours 49 min ago:
          This title only works if the numbers are actually useful. Those are
          not, and there are far too many numbers for this to make sense.
       
            Aurornis wrote 18 hours 35 min ago:
             The title wasn't meant to be taken literally, as if you're
             supposed to memorize all of these numbers. It was meant as an
             in-joke reference
            to the original writing to signal that this document was going to
            contain timing values for different operations.
            
            I completely understand why it's frustrating or confusing by
            itself, though.
       
        ZiiS wrote 19 hours 43 min ago:
         This is a really weird thing to worry about in Python. But it is also
         misleading: Python ints are arbitrary precision; they can take up much
         more storage and arithmetic time depending on their value.
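         
         [Editor's illustration; CPython on a 64-bit build, and the exact
         sizes are implementation details:]
         
```python
import sys

print(sys.getsizeof(1))          # 28 bytes for a small int
print(sys.getsizeof(2 ** 64))    # larger: stored as extra 30-bit digits
print(sys.getsizeof(10 ** 100))  # larger still; size grows with the value
```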
       
        willseth wrote 19 hours 49 min ago:
        Every Python programmer should be thinking about far more important
        things than low level performance minutiae. Great reference but
        practically irrelevant except in rare cases where optimization is
        warranted. If your workload grows to the point where this stuff
        actually matters, great! Until then it’s a distraction.
       
          HendrikHensen wrote 18 hours 28 min ago:
          Having general knowledge about the tools you're working with is not a
          distraction, it's an intellectual enrichment in any case, and can be
          a valuable asset in specific cases.
       
            willseth wrote 18 hours 8 min ago:
            Knowing that an empty string is 41 bytes or how many ns it takes to
            do arithmetic operations is not general knowledge.
       
              oivey wrote 17 hours 19 min ago:
              How is it not general knowledge? How do you otherwise gauge if
              your program is taking a reasonable amount of time, and, if not,
              how do you figure out how to fix it?
       
                dirtbag__dad wrote 11 hours 11 min ago:
                In my experience, which is series A or earlier data intensive
                SaaS, you can gauge whether a program is taking a reasonable
                amount of time just by running it and using your common sense.
                
                P50 latency for a fastapi service’s endpoint is 30+ seconds.
                Your ingestion pipeline, which has a data ops person on your
                team waiting for it to complete, takes more than one business
                day to run.
                
                Your program is obviously unacceptable. And, your problems are
                most likely completely unrelated to these heuristics. You
                either have an inefficient algorithm or more likely you are
                using the wrong tool (ex OLTP for OLAP) or the right tool the
                wrong way (bad relational modeling or an outdated LLM model).
                
                If you are interested in shaving off milliseconds in this
                context then you are wasting your time on the wrong thing.
                
                All that being said, I’m sure that there’s a very good
                reason to know this stuff in the context of some other domains,
                organizations, company size/moment. I suspect these metrics are
                irrelevant to disproportionately more people reading this.
                
                At any rate, for those of us who like to learn, I still found
                this valuable but by no means common knowledge
       
                  oivey wrote 10 hours 33 min ago:
                  I'm not sure it's common knowledge, but it is general
                  knowledge. Not all HNers are writing web apps. Many may be
                  writing truly compute bound applications.
                  
                  In my experience writing computer vision software, people
                  really struggle with the common sense of how fast computers
                  really are. Some knowledge like how many nanoseconds an add
                  takes can be very illuminating to understand whether their
                  algorithm's runtime makes any sense. That may push loose the
                  bit of common sense that their algorithm is somehow wrong.
                  Often I see people fail to put bounds on their expectations.
                  Numbers like these help set those bounds.
       
                    dirtbag__dad wrote 44 min ago:
                    Thanks this is helpful framing!
       
                cycomanic wrote 16 hours 13 min ago:
                But these performance numbers are meaningless without some sort
                of standard comparison case. So if you measure that e.g. some
                string operation takes 100ns, how do you compare against the
                numbers given here? Any difference could be due to PC, python
                version or your implementation. So you have to do proper
                benchmarking anyway.
       
                  ehaliewicz2 wrote 9 hours 53 min ago:
                  If your program does 1 million adds, but it takes
                  significantly longer than 19 milliseconds, you can guess that
                  something else is going on.
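                   
                   [Editor's note: that sanity check is cheap to run; the
                   ~19 ms figure assumes the article's M4 numbers, other
                   machines will differ:]
                   
```python
import timeit

n = 1_000_000
# Time a million integer adds; "x += 1" runs in the namespace set up by setup=.
t = timeit.timeit("x += 1", setup="x = 0", number=n)
# If this is far above a few tens of milliseconds, something else dominates.
print(f"{n:,} adds took {t * 1000:.1f} ms")
```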
       
                willseth wrote 16 hours 27 min ago:
                You gauge with metrics and profiles, if necessary, and address
                as needed. You don’t scrutinize every line of code over
                whether it’s “reasonable” in advance instead of doing
                things that actually move the needle.
       
                  oivey wrote 16 hours 18 min ago:
                  These are the metrics underneath it all. Profiles tell you
                  what parts are slow relative to others and time your specific
                  implementation. How long should it take to sum together a
                  million integers?
       
                    willseth wrote 15 hours 50 min ago:
                    It literally doesn’t matter unless it impacts users. I
                    don’t know why you would waste time on non problems.
       
                      oivey wrote 14 hours 20 min ago:
                      No one is suggesting “wasting time on non problems.”
                      You’re tilting at windmills.
       
                        willseth wrote 12 hours 49 min ago:
                        Read more carefully
       
          kc0bfv wrote 19 hours 22 min ago:
          I agree - however, that has mostly been a feeling for me for years. 
          Things feel fast enough and fine.
          
          This page is a nice reminder of the fact, with numbers.  For a while,
          at least, I will Know, instead of just feel, like I can ignore the
          low level performance minutiae.
       
          amelius wrote 19 hours 45 min ago:
          Yeah, if you hit limits just look for a module that implements the
          thing in C (or write it). This is how it was always done in Python.
       
            ryandrake wrote 17 hours 8 min ago:
            I am currently (as we type actually LOL) doing this exact thing in
            a hobby GIS project: Python got me a prototype and proof of
            concept, but now that I am scaling the data processing to
            worldwide, it is obviously too slow so I'm rewriting it (with LLM
            assistance) in C. The huge benefit of Python is that I have a known
            working (but slow) "reference implementation" to test against. So I
            know the C version works when it produces identical output. If I
            had a known-good Python version of past C, C++, Rust, etc. projects
            I worked on, it would have been most beneficial when it came time
            to test and verify.
       
            willseth wrote 19 hours 24 min ago:
            Sometimes it’s as simple as finding the hotspot with a profiler
            and making a simple change to an algorithm or data structure, just
            like you would do in any language. The amount of handwringing
            people do about building systems with Python is silly.
       
        867-5309 wrote 19 hours 50 min ago:
        tfa mentions running benchmark on a multi-core platform, but doesn't
        mention if benchmark results used multithreading.. a brief look at the
        code suggests not
       
        jchmbrln wrote 19 hours 51 min ago:
        What would be the explanation for an int taking 28 bytes but a list of
        1000 ints taking only 7.87KB?
       
          wiml wrote 18 hours 43 min ago:
          That appears to be the size of the list itself, not including the
          objects it contains: 8 bytes per entry for the object pointer, and a
          kilo-to-kibi conversion. All Python values are "boxed", which is
          probably a more important thing for a Python programmer to know than
          most of these numbers.
          
          The list of floats is larger, despite also being simply an array of
          1000 8-byte pointers. I assume that it's because the int array is
          constructed from a range(), which has a __len__(), and therefore the
          list is allocated to exactly the required size; but the float array
          is constructed from a generator expression and is presumably
          dynamically grown as the generator runs and has a bit of free space
          at the end.
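             
             [Editor's sketch of that explanation; CPython, and note that
             sys.getsizeof reports only the container, not the boxed elements
             it points to:]
             
```python
import sys

ints = list(range(1000))  # sized iterable: the list is allocated exactly
floats = list(float(i) for i in range(1000))  # generator: grown incrementally

print(sys.getsizeof(ints))    # header + 1000 eight-byte pointers
print(sys.getsizeof(floats))  # typically a bit larger: spare over-allocated slots

# The "deep" footprint includes the boxed int objects themselves:
deep = sys.getsizeof(ints) + sum(sys.getsizeof(i) for i in ints)
print(deep)
```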
       
            lopuhin wrote 17 hours 22 min ago:
            That's impressive how you figured out the reason for the difference
            in list of floats vs list of ints container size, framed as an
            interview question that would have been quite difficult I think
       
            mikeckennedy wrote 17 hours 40 min ago:
            It was. I updated the results to include the contained elements. I
            also updated the float list creation to match the int list
            creation.
       
        Y_Y wrote 20 hours 7 min ago:
        int is larger than float, but list of floats is larger than list of
        ints
        
        Then again, if you're worried about any of the numbers in this article
        maybe you shouldn't be using Python at all. I joke, but please do at
        least use Numba or Numpy so you aren't paying huge overheads for making
        an object of every little datum.
       
        _ZeD_ wrote 20 hours 8 min ago:
        Yeah... No.
         I've 10+ years of Python under my belt and I might have needed this
         kind of micro-optimization maybe 2 times at most.
       
          willseth wrote 17 hours 40 min ago:
          Sorry, you’re not allowed to discourage premature optimization or
          defend Python here.
       
        riazrizvi wrote 20 hours 14 min ago:
        The titles are oddly worded. For example -
        
          Collection Access and Iteration
          How fast can you get data out of Python’s built-in collections?
        Here is a dramatic example of how much faster the correct data
        structure is. item in set or item in dict is 200x faster than item in
        list for just 1,000 items!
        
        It seems to suggest an iteration for x in mylist is 200x slower than
        for x in myset. It’s the membership test that is much slower. Not the
        iteration. (Also for x in mydict is an iteration over keys not values,
        and so isn’t what we think of as an iteration on a dict’s
        ‘data’).
        
        Also the overall title “Python Numbers Every Programmer Should
        Know” starts with 20 numbers that are merely interesting.
        
        That all said, the formatting is nice and engaging.
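         
         [Editor's note: the membership-vs-iteration distinction in code;
         timings vary by machine, but iterating a set is not notably faster
         than iterating a list, only the membership test differs:]
         
```python
import timeit

data_list = list(range(1000))
data_set = set(data_list)

# Membership: linear scan vs. hash lookup
t_in_list = timeit.timeit(lambda: -1 in data_list, number=10_000)
t_in_set = timeit.timeit(lambda: -1 in data_set, number=10_000)

# Full iteration: O(n) for both, and roughly comparable in speed
t_iter_list = timeit.timeit(lambda: sum(1 for _ in data_list), number=1_000)
t_iter_set = timeit.timeit(lambda: sum(1 for _ in data_set), number=1_000)

print(f"membership: list {t_in_list:.4f}s vs set {t_in_set:.4f}s")
print(f"iteration:  list {t_iter_list:.4f}s vs set {t_iter_set:.4f}s")
```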
       
        dr_kretyn wrote 20 hours 16 min ago:
        Initially I thought how efficient strings are... but then I understood
        how inefficient arithmetic is.
        Interesting comparison but exact speed and IO depend on a lot of
        things, and unlikely one uses Mac mini in production so these numbers
        definitely aren't representative.
       
        oogali wrote 20 hours 17 min ago:
        It's important to know that these numbers will vary based on what
        you're measuring, your hardware architecture, and how your particular
        Python binary was built.
        
        For example, my M4 Max running Python 3.14.2 from Homebrew (built, not
        poured) takes 19.73MB of RAM to launch the REPL (running `python3` at a
        prompt).
        
        The same Python version launched on the same system with a single
        invocation for `time.sleep()`[1] takes 11.70MB.
        
        My Intel Mac running Python 3.14.2 from Homebrew (poured) takes 37.22MB
        of RAM to launch the REPL and 9.48MB for `time.sleep`.
        
        My number for "how much memory it's using" comes from running `ps auxw
        | grep python`, taking the value of the resident set size (RSS column),
        and dividing by 1,024.
        
        1: python3 -c 'from time import sleep; sleep(100)'
       
        xnx wrote 20 hours 20 min ago:
        Python programmers don't need to know 85 different obscure performance
        numbers. Better to really understand ~7 general system performance
        numbers.
       
        zelphirkalt wrote 20 hours 21 min ago:
        I doubt there is much to gain from knowing how much memory an empty
        string takes. The article or the listed numbers have a weird fixation
        on memory usage numbers and concrete time measurements. What is way
        more important to "every programmer" is time and space complexity, in
        order to avoid designing unnecessarily slow or memory hungry programs.
        Under the assumption of using Python, what is the use of knowing that
        your int takes 28 bytes? In the end you will have to determine, whether
        the program you wrote meats the performance criteria you have and if it
        does not, then you need a smarter algorithm or way of dealing with
        data. It helps very little to know that your 2d-array of 1000x1000
        bools is so and so big. What helps is knowing, whether it is too much
        and maybe you should switch to using a large integer and a bitboard
        approach. Or switch language.
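        For the 1000x1000 bool example, CPython's `sys.getsizeof` makes the
        trade-off concrete (exact byte counts are CPython- and
        version-specific; this is a sketch, not a guarantee):

```python
import sys

n = 1000

# 1000x1000 grid of booleans as a list of lists: each row pays list
# overhead plus an 8-byte pointer per cell on 64-bit builds (True/False
# are shared singletons, so they add nothing per cell).
grid = [[False] * n for _ in range(n)]
grid_bytes = sys.getsizeof(grid) + sum(sys.getsizeof(row) for row in grid)

# The same grid packed into one large int used as a bitboard: one bit
# per cell plus a single object header.
bitboard = 1 << (n * n - 1)  # an int spanning all n*n bit positions
bitboard_bytes = sys.getsizeof(bitboard)

print(grid_bytes, bitboard_bytes)  # the bitboard is dozens of times smaller
```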
       
          kingstnap wrote 16 hours 46 min ago:
          I disagree. Performance is a leaky abstraction that *ALWAYS* matters.
          
          Your cognition of it is either implicit or explicit.
          
           Even if you didn't know, for example, that list appends are
           amortized linear rather than quadratic, and fairly fast.
          
           Even if you didn't give a shit if simple programs were for some
           reason 10000x slower than they needed to be, because it meets some
           baseline level of good enough, and/or you aren't the one impacted
           by the problems inefficiency creates.
          
           Library authors beneath you would still know, and the APIs you
           interact with, the Pythonic code you see, and the code LLMs
           generate will all be affected by that leaky abstraction.
          
           If you think that n^2 naive list appends is a bad example, it's not,
           btw: Python string appends are n^2, and that has affected and still
           affects how people do things; f-strings, for example, are lazy.
          
          Similarly a direct consequence of dictionaries being fast in Python
          is that they are used literally everywhere. The old Pycon 2017 talks
          from Raymond talk about this.
          
           Ultimately, what the author of the blog has provided is numerical
           justification for the tacit knowledge that an understanding of
           performance gives.
       
          Qem wrote 20 hours 8 min ago:
          > Under the assumption of using Python, what is the use of knowing
          that your int takes 28 bytes?
          
           Relevant if your problem demands instantiation of a large number of
           objects. This reminds me of a post where Eric Raymond discusses the
          problems he faced while trying to use Reposurgeon to migrate GCC. See
          
  HTML    [1]: http://esr.ibiblio.org/?p=8161
       
        fooker wrote 20 hours 21 min ago:
        Counterintuitively: program in python only if you can get away without
        knowing these numbers.
        
        When this starts to matter, python stops being the right tool for the
        job.
       
          bathtub365 wrote 16 hours 44 min ago:
          These basically seem like numbers of last resort. After you’ve
          profiled and ruled out all of the usual culprits (big disk reads,
          network latency, polynomial or exponential time algorithms, wasteful
          overbuilt data structures, etc) and need to optimize at the level of
          individual operations.
       
          Quothling wrote 18 hours 20 min ago:
           Why? I've built some massive analytic data flows in Python with
          turbodbc + pandas which are basically C++ fast. It uses more memory
          which supports your point, but on the flip-side we're talking $5-10
          extra cost a year. It could frankly be $20k a year and still be
          cheaper than staffing more people like me to maintain these things,
          rather than having a couple of us and then letting the BI people use
           the tools we provide for them. Similarly, when we do embedded work,
           MicroPython is just so much easier for our engineering staff to
           deal with.
          
          The interoperability between C and Python makes it great, and you
          need to know these numbers on Python to know when to actually build
          something in C. With Zig getting really great interoperability,
          things are looking better than ever.
          
          Not that you're wrong as such. I wouldn't use Python to run an
          airplane, but I really don't see why you wouldn't care about the
          resources just because you're working with an interpreted or GC
          language.
       
            its-summertime wrote 17 hours 2 min ago:
            From the complete opposite side, I've built some tiny bits of near
            irrelevant code where python has been unacceptable, e.g. in shell
            startup / in bash's PROMPT_COMMAND, etc. It ends up having a very
            painfully obvious startup time, even if the code is nearing the
            equivalent of Hello World
            
                time python -I -c 'print("Hello World")'
                real    0m0.014s
                time bash --noprofile -c 'echo "Hello World"'
                real    0m0.001s
       
              dekhn wrote 16 hours 21 min ago:
              What exactly do you need 1ms instead of 14ms startup time in a
              shell startup?
              The difference is barely perceptible.
              
               Most of the time starting up is time spent searching the
               filesystem for thousands of packages.
       
                NekkoDroid wrote 15 hours 50 min ago:
                > What exactly do you need 1ms instead of 14ms startup time in
                a shell startup?
                
                I think as they said: when dynamically building a shell input
                 prompt it starts to become very noticeable if you have like 3 or
                more of these and you use the terminal a lot.
       
                  dekhn wrote 13 hours 21 min ago:
                  Ah, I only noticed the "shell startup" bit.
                  
                  Yes, after 2-3 I agree you'd start to notice if you were
                  really fast.  I suppose at that point I'd just have Gemini
                  rewrite the prompt-building commands in Rust (it's quite good
                  at that) or merge all the prompt-building commands into a
                  single one (to amortize the startup cost).
       
                    its-summertime wrote 2 hours 1 min ago:
                     [1] perhaps? I should probably start using it again
                    honestly.
                    
  HTML              [1]: https://starship.rs/
       
            fooker wrote 18 hours 4 min ago:
            > you need to know these numbers on Python to know when to actually
            build something in C
            
            People usually approach this the other way, use something like
            pandas or numpy from the beginning if it solves your problem. Do
            not write matrix multiplications or joins in python at all.
            
            If there is no library that solves your problem, it's a great
            indication that you should avoid python. Unless you are willing to
            spend 5 man-years writing a C or C++ library with good python
            interop.
       
              oivey wrote 17 hours 22 min ago:
              People generally aren’t rolling their own matmuls or joins or
              whatever in production code. There are tons of tools like Numba,
              Jax, Triton, etc that you can use to write very fast code for
              new, novel, and unsolved problems. The idea that “if you need
              fast code, don’t write Python” has been totally obsolete for
              over a decade.
       
                fooker wrote 17 hours 3 min ago:
                Yes, that's what I said.
                
                If you are writing performance sensitive code that is not
                covered by a popular Python library, don't do it unless you are
                a megacorp that can put a team to write and maintain a library.
       
                  oivey wrote 16 hours 57 min ago:
                  It isn’t what you said. If you want, you can write your own
                  matmul in Numba and it will be roughly as fast as similar C
                  code. You shouldn’t, of course, for the same reason
                  handrolling your own matmuls in C is stupid.
                  
                   Many problems can be solved performantly in pure Python,
                  especially via the growing set of tools like the JIT
                  libraries I cited. Even more will be solvable when things
                  like free threaded Python land. It will be a minority of
                  problems that can’t be, if it isn’t already.
       
          Demiurge wrote 18 hours 26 min ago:
          I agree. I've been living off Python for 20 years and have never
          needed to know any of these numbers, nor do I need them now, for my
          work, contrary to the title. I also regularly use profiling for
          performance optimization and opt for Cython, SWIG, JIT libraries, or
          other tools as needed. None of these numbers would ever factor into
          my decision-making.
       
            AtlasBarfed wrote 16 hours 45 min ago:
            .....
            
               You don't see any value in knowing those numbers?
       
              Demiurge wrote 11 hours 32 min ago:
              That's what I just said. There is zero value to me knowing these
              numbers. I assume that all python built in methods are pretty
              much the same speed. I concentrate on IO being slow, minimizing
              these operations. I think about CPU intensive loops that process
              large data, and I try to use libraries like numpy, DuckDB, or
              other tools to do the processing. If I have a more complicated
              system, I profile its methods, and optimize tight loops based on
              PROFILING. I don't care what the numbers in the article are,
              because I PROFILE, and I optimize the procedures that are the
              slowest, for example, using cython. Which part of what I am
              saying does not make sense?
       
                KeplerBoy wrote 1 hour 21 min ago:
                That makes perfect sense. Especially since those numbers can
                change with new python versions.
       
              TuringTest wrote 15 hours 57 min ago:
              As others have pointed out, Python is better used in places where
              those numbers aren't relevant.
              
              If they start becoming relevant, it's usually a sign that you're
              using the language in a domain where a duck-typed bytecode
              scripting-glue language is not well-suited.
       
          libraryofbabel wrote 20 hours 6 min ago:
          Or keep your Python scaffolding, but push the performance-critical
          bits down into a C or Rust extension, like numpy, pandas, PyTorch and
          the rest all do.
          
          But I agree with the spirit of what you wrote - these numbers are
          interesting but aren’t worth memorizing. Instead, instrument your
          code in production to see where it’s slow in the real world with
          real user data (premature optimization is the root of all evil etc),
           profile your code (with py-spy, it’s the best tool for this if
          you’re looking for cpu-hogging code), and if you find yourself
          worrying about how long it takes to add something to a list in Python
          you really shouldn’t be doing that operation in Python at all.
       
            eichin wrote 19 hours 20 min ago:
            "if you're not measuring, you're not optimizing"
       
          MontyCarloHall wrote 20 hours 17 min ago:
          Exactly. If you're working on an application where these numbers
          matter, Python is far too high-level a language to actually be able
          to optimize them.
       
        tgv wrote 20 hours 24 min ago:
        I doubt list and string concatenation operate in constant time, or else
        they affect another benchmark. E.g., you can concatenate two lists in
        the same time, regardless of their size, but at the cost of slower
        access to the second one (or both).
        
        More contentiously: don't fret too much over performance in Python.
        It's a slow language (except for some external libraries, but that's
        not the point of the OP).
       
          jerf wrote 20 hours 18 min ago:
          String concatenation is mentioned twice on that page, with the same
          time given. The first time it has a parenthetical "(small)", the
          second time doesn't have it. I expect you were looking at the second
          one when you typed that as I would agree that you can't just label it
          as a constant time, but they do seem to have meant concatenating
          "small" strings, where the overhead of Python's object construction
          would dominate the cost of the construction of the combined string.
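           A quick way to see why the "(small)" qualifier matters: time the
           same `+` on short and on megabyte-sized strings (illustrative
           sketch; absolute numbers depend on hardware and Python build):

```python
import timeit

small_a, small_b = "ab", "cd"
big_a, big_b = "x" * 1_000_000, "y" * 1_000_000

# Small concatenation is dominated by fixed object-construction overhead;
# large concatenation must copy every byte, so it scales with total length.
t_small = timeit.timeit(lambda: small_a + small_b, number=2_000)
t_big = timeit.timeit(lambda: big_a + big_b, number=2_000)
print(t_small, t_big)
```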
       
        woodruffw wrote 20 hours 28 min ago:
        Great reference overall, but some of these will diverge in practice:
        141 bytes for a 100 char string won’t hold for non-ASCII strings for
        example, and will change if/when the object header overhead changes.
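        The divergence is easy to demonstrate: CPython (per PEP 393) stores
        each string in the narrowest of 1-, 2-, or 4-byte code units that
        fits its widest code point, so `sys.getsizeof` jumps as soon as the
        content leaves ASCII (exact header sizes vary by version; a sketch):

```python
import sys

samples = {
    "ascii": "a" * 100,    # 1 byte per char, compact ASCII layout
    "latin1": "é" * 100,   # still 1 byte per char, but a larger header
    "ucs2": "€" * 100,     # 2 bytes per char
    "ucs4": "😀" * 100,    # 4 bytes per char
}
for name, s in samples.items():
    print(name, len(s), sys.getsizeof(s))
```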
       
        ktpsns wrote 20 hours 31 min ago:
        Nice numbers, and it's always worth knowing an order of magnitude. But
        these charts are far from what "every programmer should know".
       
          jerf wrote 20 hours 13 min ago:
          I think we can safely steelman the claim to "every Python programmer
          should know", and even from there, every "serious" Python programmer,
          writing Python professionally for some "important" reason, not just
          everyone who picks up Python for some scripting task. Obviously
          there's not much reason for a C# programmer to go try to memorize all
          these numbers.
          
          Though IMHO it suffices just to know that "Python is 40-50x slower
          than C and is bad at using multiple CPUs" is not just some sort of
          anti-Python propaganda from haters, but a fairly reasonable
          engineering estimate. If you know that you don't really need that
          chart. If your task can tolerate that sort of performance, you're
          fine; if not, figure out early how you are going to solve that
          problem, be it through the several ways of binding faster code to
          Python, using PyPy, or by not using Python in the first place,
          whatever is appropriate for your use case.
       
       
   DIR <- back to front page