_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
COMMENT PAGE FOR:
HTML Python numbers every programmer should know
superlopuh wrote 2 hours 38 min ago:
I'm surprised that the `isinstance()` comparison is with `type() ==
type` and not `type() is type`, which I would expect to be faster,
since the `==` implementation tends to have an `isinstance` call
anyway.
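A quick timeit sketch to check this locally (not the repo's harness;
numbers will vary by machine and Python version):
    import timeit

    setup = "x = 42"
    for stmt in ("type(x) == int", "type(x) is int", "isinstance(x, int)"):
        t = timeit.timeit(stmt, setup=setup, number=1_000_000)
        # total seconds for 1M runs * 1000 == nanoseconds per operation
        print(f"{stmt:25} {t * 1000:.1f} ns/op")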
superlopuh wrote 2 hours 37 min ago:
Also seems like the repo is now private, so I can't open an issue, or
reproduce the numbers.
cma256 wrote 9 hours 51 min ago:
Great catalogue. On the topic of msgspec, since pydantic is included it
may be worth including a bench for de-serializing and serializing from
a msgspec struct.
iamnotsure wrote 9 hours 56 min ago:
Exactly wrong.
mopsi wrote 10 hours 45 min ago:
It is always a good idea to have at least a rough understanding of how
much operations in your code cost, but sometimes very expensive
mistakes end up in non-obvious places.
If I have only plain Python installed and a .py file that I want to
test, then what's the easiest way to get a visualization of the call
tree (or something similar) and the computational cost of each item?
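With only the standard library, cProfile plus pstats gets you
per-function costs, sorted by cumulative time (a minimal sketch;
"my_script.py" and "main" are placeholders for your own file and
entry point):
    python -m cProfile -s cumulative my_script.py

    # or from code, dumping stats to a file for later inspection:
    import cProfile, pstats
    cProfile.run("main()", "out.prof")
    pstats.Stats("out.prof").sort_stats("cumulative").print_stats(20)
For an actual call-tree visualization you would need a third-party
viewer such as snakeviz or gprof2dot on top of that profile file.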
jiggawatts wrote 11 hours 45 min ago:
My god, the memory bloat is out of this world compared to platforms
like the JVM or .NET, let alone C++ or Rust!
rozab wrote 12 hours 6 min ago:
I wonder why an empty set takes so much more memory than an empty dict
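The usual explanation: modern CPython dicts allocate their hash table
lazily on first insert, while a set object carries a small
preallocated table inline. A quick check (exact byte counts vary by
version and platform):
    import sys

    print(sys.getsizeof({}))      # empty dict: table allocated lazily, so small
    print(sys.getsizeof(set()))   # empty set: ships with an inline 8-slot table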
andai wrote 12 hours 59 min ago:
The one I noticed the most was import openai and import numpy.
They're both about a full second on my old laptop.
I ended up writing my own simple LLM library just so I wouldn't have to
import OpenAI anymore for my interactive scripts.
(It's just some wrapper functions around the equivalent of a curl
request, which is honestly basically everything I used the OpenAI
library for anyway.)
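For the curious, a stdlib-only sketch of that kind of wrapper (the
endpoint and response shape are the standard chat-completions API; the
model name is a placeholder and error handling is omitted):
    import json, os, urllib.request

    def ask(prompt, model="gpt-4o-mini"):
        # One POST to the chat completions endpoint; no client library import.
        req = urllib.request.Request(
            "https://api.openai.com/v1/chat/completions",
            data=json.dumps({"model": model,
                             "messages": [{"role": "user", "content": prompt}]}).encode(),
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]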
kristianp wrote 1 hour 59 min ago:
I have noticed how long it takes to import numpy. It made rerunning a
script noticeably sluggish. Not sure what openai's excuse is, but I
assume numpy's slowness is loading some native dlls?
gcanyon wrote 13 hours 3 min ago:
As someone who most often works in a language that is literally orders
of magnitude slower than this -- and has done so since CPU speeds
were measured in double-digit megahertz -- I am crying at the notion
that anything here is measured in nanoseconds
Redoubts wrote 13 hours 36 min ago:
> Attribute read (obj.x) 14 ns
note that protobuf attributes are 20-50x worse than this
charlieyu1 wrote 13 hours 43 min ago:
Surprised that list comprehensions are only 26% faster than for loops.
It used to feel like 4-5x
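Easy to measure yourself; much of the old gap was append() lookup and
call overhead, which CPython 3.11+ reduced (a rough sketch, numbers
vary by version):
    import timeit

    data = list(range(1000))

    def with_loop():
        out = []
        for x in data:
            out.append(x * 2)   # attribute lookup + call on every iteration
        return out

    def with_comp():
        return [x * 2 for x in data]  # the append is a dedicated opcode

    print(timeit.timeit(with_loop, number=10_000))
    print(timeit.timeit(with_comp, number=10_000))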
pvtmert wrote 14 hours 46 min ago:
There are lots of discussions about the relevance of these numbers for
a regular software engineer.
Firstly, I want to start with the fact that the base system is a
macOS/M4Pro, hence;
- Memory related access is possibly much faster than a x86 server.
- Disk access is possibly much slower than a x86 server.
*) I took an x86 server as the basis, as most applications run on
x86 Linux boxes nowadays, although a good amount of the footprint is
also on other ARM CPUs.
Although it probably does not change the memory footprint much, the
libraries loaded and their architecture (ie. being Rosetta or not) will
change the overall footprint of the process.
As it was mentioned on one of the sibling comments -> Always
inspect/trace your own workflow/performance before making assumptions.
It all depends on specific use-cases for higher-level performance
optimizations.
CmdrKrool wrote 14 hours 51 min ago:
I'm confused by this:
String operations in Python are fast as well. f-strings are the
fastest formatting style, while even the slowest style is still
measured in just nano-seconds.
Concatenation (+) 39.1 ns (25.6M ops/sec)
f-string 64.9 ns (15.4M ops/sec)
It says f-strings are fastest but the numbers show concatenation taking
less time? I thought it might be a typo but the bars on the graph
reflect this too?
Liquid_Fire wrote 1 hour 54 min ago:
Perhaps it's because in all but the simplest cases, you need 2 or
more concatenations to achieve the same result as one single
f-string?
"literal1 " + str(expression) + " literal2"
vs
f"literal1 {expression} literal2"
The only case that would be faster is something like: "foo" +
str(expression)
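A quick check of that hypothesis (note the first case is
constant-folded at compile time, so it mostly measures loop overhead):
    import timeit

    g = {"n": 42}
    print(timeit.timeit("'a' + 'b'", number=1_000_000))             # folded to 'ab'
    print(timeit.timeit("'a' + str(n)", globals=g, number=1_000_000))
    print(timeit.timeit("f'a{n}'", globals=g, number=1_000_000))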
nodja wrote 15 hours 2 min ago:
I think a lot of commenters here are missing the point.
Looking at performance numbers is important regardless if it's python,
assembly or HDL. If you don't understand why your code is slow you can
always look at how many cycles things take and learn to understand how
code works at a deeper level, as you mature as a programmer things will
become obvious, but going through the learning process and having
references like these will help you to get there sooner, seeing the
performance numbers and asking why some things take much longer--or
sometimes why they take the exact same time--is the perfect
opportunity to learn.
Early in my python career I had a python script that found duplicate
files across my disks, the first iteration of the script was extremely
slow, optimizing the script went through several iterations as I
learned how to optimize at various levels. None of them required me to
use C. I just used caching, learned to enumerate all files on disk
fast, and used sets instead of lists. The end result was that doing
subsequent runs made my script run in 10 seconds instead of 15 minutes.
Maybe implementing in C would make it run in 1 second, but if I had
just assumed my script was slow because of python then I would've spent
hours doing it in C only to go from 15 minutes to 14 minutes and 51
seconds.
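A minimal sketch of that shape of solution (not the original script):
group by file size first, then hash only same-size candidates, with
O(1) dict lookups throughout:
    import hashlib, os

    def find_duplicates(root):
        by_size = {}                 # size -> paths; stat() is far cheaper than hashing
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                try:
                    by_size.setdefault(os.path.getsize(path), []).append(path)
                except OSError:
                    pass
        by_hash = {}                 # only files sharing a size get read and hashed
        for paths in by_size.values():
            if len(paths) > 1:
                for path in paths:
                    with open(path, "rb") as f:
                        digest = hashlib.sha256(f.read()).hexdigest()
                    by_hash.setdefault(digest, []).append(path)
        return [group for group in by_hash.values() if len(group) > 1]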
There's an argument to be made that it would be useful to see C numbers
next to the python ones, but for the same reason people don't just tell
you to just use an FPGA instead of using C, it's also rude to say
python is the wrong tool when often it isn't.
robertclaus wrote 15 hours 18 min ago:
I liked reading through it from a "is modern Python doing anything
obviously wrong?" perspective, but strongly disagree anyone should
"know" these numbers. There's like 5-10 primitives in there that
everyone should know rough timings for; the rest should be derived with
big-O algorithm and data structure knowledge.
sireat wrote 15 hours 21 min ago:
Interesting information but these are not hard numbers.
Surely the 100-char string information of 141 bytes is not correct as
it would only apply to ASCII 100-char strings.
It would be more useful to know the overhead for unicode strings
presumably utf-8 encoded. And again I would presume 100-Emoji string
would take 441 bytes (just a hypothesis) and 100-umlaut chars string
would take 241bytes.
JBits wrote 15 hours 38 min ago:
One of the reasons I'm really excited about JAX is that I hope it will
allow me to write fast Python code without worrying about these
details.
calmbonsai wrote 15 hours 44 min ago:
You absolutely do not need to know those absolute numbers--only the
relative costs of various operations.
Additionally, regardless of the code you can profile the system to
determine where the "hot spots" are and refactor or call-out to more
performant (Rust, Go, C) run-times for those workflows where necessary.
sjducb wrote 16 hours 12 min ago:
It's missing the time taken to instantiate a class.
I remember refactoring some code to improve readability, then observing
something that was previously a few microseconds take tens of seconds.
The original code created a large list of lists. Each child list had 4
fields; each field was a different thing, some were ints and one was a
string.
I created a new class with the names of each field and helper methods
to process the data. The new code created a list of instances of my
class. Downstream consumers of the list could look at the class to see
what data they were getting. Modern Python developers would use a data
class for this.
The new code was very slow. I'd love it if the author measured the
time taken to instantiate a class.
smcin wrote 14 hours 16 min ago:
Instantiating classes is in general not a performance issue in
Python. Your issue here strongly sounds like you're abusing OO to
pass a list of instances into every method and downstream call (not
just the usual reference to self, the instance at hand). Don't do
that, it shouldn't be necessary. It sounds like you're trying to get
a poor-man's imitation of classmethods, without identifying and
refactoring whatever it is that methods might need to access from
other instances.
Please post your code snippet on StackOverflow ([python] tag) or
CodeReview.SE so people can help you fix it.
> created a new class with the names of each field and helper methods
to process the data. The new code created a list of instances of my
class. Downstream consumers of the list could look at the class to
see what data they were getting.
lifeisstillgood wrote 16 hours 6 min ago:
I went to the doctor and I said "It hurts when I do this".
The doctor said, "don't do that".
Edit: so yeah a rather snarky reply. Sorry. But it's worth asking
why we want to use classes and objects everywhere. Alan Kay is well
known for saying object orientation is about message passing (mostly
by Erlang people).
A list of lists (where each list is four different types repeated)
seems a fine data structure, which can be operated on by external
functions, and serialised pretty easily. Turning it into classes and
objects might not be a useful refactoring, I would certainly want to
learn more before giving the go ahead.
sjducb wrote 6 hours 2 min ago:
The main reason why is to keep a handle on complexity.
When you're in a project with a few million lines of code and 10
years of history it can get confusing.
Your data will have been handled by many different functions before
it gets to you. If you do this with raw lists then the code gets
very confusing. In one data structure customer name might be [4]
and another structure might have it in [9]. Worse, someone adds a
new field in [5]; then when two lists get concatenated, name moves to
[10] in downstream code which consumes the concatenated lists.
krior wrote 13 hours 47 min ago:
I mean it sounds reasonable to me to wrap the data into objects.
customers[3][4]
is a lot less readable than
customers[3].balance
lifeisstillgood wrote 13 hours 13 min ago:
Absolutely
But hidden in this is the failing of every sql-bridge ever -
it's definitely easier for a programmer to read
customers[3].balance, but the trade off now is I have to provide
class based semantics for all operations - and that tends to hide
things (oh you know, impedance mismatch).
I would far prefer "store the records as plain as we can" and
add on functions to operate over it (think pandas stores
basically just ints, floats and strings as it is numpy underneath)
(Yes you can store pyobjects somehow but the performance drops
off a cliff.)
Anyway - keep the storage and data structure as raw and simple as
possible and write functions to run over it. And move to pandas
or SQLite pretty quickly :-)
snakepit wrote 16 hours 33 min ago:
This is helpful. Someone should create a similar benchmark for the
BEAM. This is also a good reminder to continue working on snakepit [1]
and snakebridge [2]. Plenty remains before they're suitable for prime
time. [1]
HTML [1]: https://hex.pm/packages/snakepit
HTML [2]: https://hex.pm/packages/snakebridge
esafak wrote 16 hours 44 min ago:
The point of the original list was that the numbers were simple enough
to memorize: [1] Nobody is going to remember any of the numbers on this
new list.
HTML [1]: https://gist.github.com/jboner/2841832
mikeckennedy wrote 14 hours 39 min ago:
That's a fair point @esafak. I updated the article with something
akin to the doubling chart of numbers in the original article from
2012.
perrygeo wrote 17 hours 0 min ago:
> small int (0-256) cached
It's -5 to 256, and these have very tricky behavior for programmers
that confuse identity and equality.
>>> a = -5
>>> b = -5
>>> a is b
True
>>> a = -6
>>> b = -6
>>> a is b
False
Tostino wrote 12 hours 20 min ago:
Java does similar. Confusing for beginners who run into it for the
first time for sure.
lunixbochs wrote 17 hours 7 min ago:
I'm confused why they repeatedly call a slots class larger than a
regular dict class, but don't count the size of the dict
lcnmrn wrote 17 hours 20 min ago:
LLMs can improve Python code performance. I've used them myself on a
few projects.
belabartok39 wrote 17 hours 24 min ago:
Hmmmm, there should absolutely be standard deviations for this type of
work. Also, what is N number of runs? Does it say somewhere?
mikeckennedy wrote 15 hours 36 min ago:
It is open source, you could just look. :) But here is a summary for
you. It's not just one run and take the number:
Benchmark Iteration Process
Core Approach:
- Warmup Phase: 100 iterations to prepare the operation (default)
- Timing Runs: 5 repeated runs (default), each executing the
operation a specified number of times
- Result: Median time per operation across the 5 runs
Iteration Counts by Operation Speed:
- Very fast ops (arithmetic): 100,000 iterations per run
- Fast ops (dict/list access): 10,000 iterations per run
- Medium ops (list membership): 1,000 iterations per run
- Slower ops (database, file I/O): 1,000-5,000 iterations per run
Quality Controls:
- Garbage collection is disabled during timing to prevent
interference
- Warmup runs prevent cold-start bias
- Median of 5 runs reduces noise from outliers
- Results are captured to prevent compiler optimization elimination
Total Executions: For a typical benchmark with 1,000 iterations and 5
repeats, each operation runs 5,100 times (100 warmup + 5×1,000
timed) before reporting the median result.
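In code, the process is roughly this (a simplified sketch of the
described method, not the repo's actual harness; GC toggling and
per-operation iteration counts are omitted):
    import statistics, time

    def bench(fn, iterations=1_000, repeats=5, warmup=100):
        for _ in range(warmup):               # warm caches / bytecode specialization
            fn()
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            for _ in range(iterations):
                fn()
            runs.append((time.perf_counter() - start) / iterations)
        return statistics.median(runs)        # median damps outlier runs

    print(f"{bench(lambda: sum(range(100))) * 1e9:.1f} ns/op")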
belabartok39 wrote 15 hours 9 min ago:
That answers what N is (why not just say so in the article). If you
are only going to report medians, is there an appendix with further
statistics such as confidence intervals or standard deviations? For a
serious benchmark, it would be essential to show the spread or
variability, no?
m3047 wrote 17 hours 37 min ago:
+1 but I didn't see pack / unpack...
zbentley wrote 17 hours 48 min ago:
I have some questions and requests for clarification/suspicious
behavior I noticed after reviewing the results and the benchmark code,
specifically:
- If slotted attribute reads and regular attribute reads are the same
latency, I suspect that the regular class may not have enough
"bells on" (inheritance/metaprogramming/dunder overriding/etc) to
defeat simple optimizations that cache away attribute access, thus
making it equivalent in speed to slotted classes. I know that over time
slotting will become less of a performance boost, but--and this is just
my intuition and I may well be wrong--I don't get the impression that
we're there yet.
- Similarly "read from @property" seems suspiciously fast to me. Even
with descriptor-protocol awareness in the class lookup cache, the
overhead of calling a method seems surprisingly similar to the overhead
of accessing a field. That might be explained away by the fact that
property descriptors' "get" methods are guaranteed to be the simplest
and easiest to optimize of all call forms (bound method, guaranteed to
never be any parameters), and so the overhead of setting up the
stack/frame/args may be substantially minimized...but that would only
be true if the property's method body was "return 1" or something very
fast. The properties tested for these benchmarks, though, are looking
up other fields on the class, so I'd expect them to be a lot slower
than field access, not just a little slower ( [1] ).
- On the topic of "access fields of objects"
(properties/dataclasses/slots/MRO/etc.), benchmarks are really hard to
interpret--not just these benchmarks, all of them I've seen. That's
because there are fundamentally two operations involved: resolving a
field to something that produces data for it, and then accessing the
data. For example, a @property is in a class's method cache, so
resolving "instance.propname" is done at the speed of the methcache.
That might be faster than accessing "instance.attribute" (a field, not
a @property or other descriptor), depending on the inheritance geometry
in play, slots, __getattr[ibute]__ overrides, and so on. On the other
hand, accessing the data at "instance.propname" is going to be a lot
more expensive for most @properties (because they need to call a
function, use an argument stack, and usually perform other attribute
lookups/call other functions/manipulate locals, etc); accessing data at
"instance.attribute" is going to be fast and constant-time--one or two
pointer-chases away at most.
- Nitty: why's pickling under file I/O? Those benchmarks aren't timing
pickle functions that perform IO, they're benchmarking the ser/de
functionality and thus should be grouped with json/pydantic/friends
above.
- Asyncio's no spring chicken, but I think a lot of the benchmarks
listed tell a worse story than necessary, because they don't
distinguish between coroutines, Tasks, and Futures. Coroutines are
cheap to have and call, but Tasks and Futures have a little more
overhead when they're used (even fast CFutures) and a lot more overhead
to construct since they need a lot more data resources than just a
generator function (which is kinda what a raw coroutine desugars to,
but that's not as true as most people think it is...another story for
another time). Now, "run_until_complete()" and "gather()" initially
take their arguments and coerce them into Tasks/Futures--that
detection, coercion, and construction takes time and consumes a lot of
overhead. That's good to know (since many people are paying that
coercion tax unknowingly), but it muddies the boundary between
"overhead of waiting for an asyncio operation to complete" and
"overhead of starting an asyncio operation". Either calling the
lower-level functions that run_until_complete()/gather() use
internally, or else separating out benchmarks into ones that pass
Futures/Tasks/regular coroutines might be appropriate.
- Benchmarking "asyncio.sleep(0)" as a means of determining the
bare-minimum await time of a Python event loop is a bad idea. sleep(0)
is very special (more details here: [2] ) and not representative. To
benchmark "time it takes for the event loop to spin once and produce a
result"/the python equivalent of process.nextTick, it'd be better to
use low-level loop methods like "call_soon" or defer completion to a
Task and await that.
HTML [1]: https://github.com/mikeckennedy/python-numbers-everyone-should...
HTML [2]: https://news.ycombinator.com/item?id=46056895
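Along those lines, one way to time a genuine loop turn is to schedule
a callback with call_soon and await a Future it completes (a sketch,
not authoritative):
    import asyncio, time

    async def one_turn(loop):
        fut = loop.create_future()
        loop.call_soon(fut.set_result, None)  # runs on the next loop iteration
        await fut

    async def main():
        loop = asyncio.get_running_loop()
        n = 100_000
        start = time.perf_counter()
        for _ in range(n):
            await one_turn(loop)
        print(f"{(time.perf_counter() - start) / n * 1e9:.0f} ns per loop turn")

    asyncio.run(main())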
thundergolfer wrote 17 hours 48 min ago:
A lot of people here are commenting that if you have to care about
specific latency numbers in Python you should just use another
language.
I disagree. A lot of important and large codebases were grown and
maintained in Python (Instagram, Dropbox, OpenAI) and it's damn useful
to know how to reason your way out of a Python performance problem when
you inevitably hit one without dropping out into another language,
which is going to be far more complex.
Python is a very useful tool, and knowing these numbers just makes you
better at using the tool.
The author is a Python Software Foundation Fellow. They're great at
using the tool.
In the common case, a performance problem in Python is not the result
of hitting the limit of the language but the result of sloppy
un-performant code, for example unnecessarily calling a function
O(10_000) times in a hot loop.
I wrote up a more focused "Python latency numbers you should know" as a
quiz here
HTML [1]: https://thundergolfer.com/computers-are-fast
TacticalCoder wrote 1 hour 25 min ago:
> ... a function O(10_000) times in a hot loop
O(10_000) is a really weird notation.
tialaramex wrote 53 min ago:
Generously we could say they probably mean ~10_000 rather than
O(10_000)
saagarjha wrote 4 hours 8 min ago:
I do performance optimization for a system written in Python. Most of
these numbers are useless to me, because they're completely
irrelevant until they become a problem, and then I measure them myself.
If you are writing your code trying to save on method calls, you're
not getting any benefit from using the language and probably should
pick something else.
srean wrote 1 hour 10 min ago:
It's always a balance.
Good designs do not happen in a vacuum but informed with knowledge
of at least the outlines of the environment.
One way to have breakfast while pursuing an idea: let me spill some
sticky milk on the dining table, who cares, I will clean up if it
becomes a problem later.
Another is: it's not much of an overbearing constraint to not make
a mess with spilt milk in the first place. Maybe it will not be a
big bother later, but it's not hurting me much now to not be
sloppy, so let me be a little hygienic.
There's a balance between making a mess and cleaning up and not
making a mess in the first place. The other extreme is to be so
defensive about the possibility of creating a mess that it
paralyses progress.
The sweet spot is somewhere between the extremes and having the
ball-park numbers in the back of one's mind helps with that. It
informs about the environment.
notepad0x90 wrote 6 hours 49 min ago:
For some of these, there are alternative modules you can use, so it
is important to know this. But if it really matters, I would think
you'd know this already?
For me, it will help with selecting what language is best for a task.
I think it won't change my view that python is an excellent language
to prototype in though.
NoteyComplexity wrote 12 hours 12 min ago:
Agreed, and on top of that:
I think these kind of numbers are everywhere and not just specific to
Python.
In Zig, I sometimes take a brief look at the CPU cycle counts of
various operations to avoid cache misses, and I need to be aware of
the alignment and the size of a data type to debloat a data
structure. If their logic applied, too bad, I should quit programming,
since all languages have their own latency on certain operations we
should be aware of.
There are reasons to not use Python, but that particular reason is
not the one.
Scubabear68 wrote 13 hours 38 min ago:
No.
Python's issue is that it is incredibly slow in use cases that
surprise average developers. It is incredibly slow at very basic
stuff, like calling a function or accessing a dictionary.
If Python didn't have such an enormous number of popular C and C++
based libraries it would not be here. It was saved by Numpy etc etc.
HenriTEL wrote 47 min ago:
22ns for a function call and dictionary key lookup, that's actually
surprisingly fast.
aragilar wrote 1 hour 48 min ago:
I'm not sure how Python can be described as "saved" by numpy et
al., when the numerical Python ecosystem was there near the
beginning, and the language and ecosystem have co-evolved? Why
didn't Perl (with PDL), R or Ruby (or even php) succeed in the same
way?
dnautics wrote 8 hours 57 min ago:
i hate python but if your bottleneck is that sqlite query,
optimizing a handful of addition operations is a wash. that's why
you need to at least have a feel for these tables
i_am_a_peasant wrote 16 hours 42 min ago:
our build system is written in python, and i'd like it not to suck
but still stay in python, so these numbers very much matter.
oofbey wrote 17 hours 38 min ago:
I think both points are fair. Python is slow - you should avoid it if
speed is critical, but sometimes you can't easily avoid it.
I think the list itself is super long winded and not very
informative. A lot of operations take about the same amount of time.
Does it matter that adding two ints is very slightly slower than
adding two floats? (If you even believe this is true, which I
don't.) No. A better summary would say "all of these things take
about the same amount of time: simple math, function calls, etc.;
these things are much slower: IO." And in that form the summary is
pretty obvious.
microtonal wrote 17 hours 14 min ago:
I think the list itself is super long winded and not very
informative.
I agree. I have to complement the author for the effort put in.
However it misses the point of the original Latency numbers every
programmer should know, which is to build an intuition for making
good ballpark estimations of the latency of operations and that
e.g. A is two orders of magnitude more expensive than B.
nutjob2 wrote 17 hours 38 min ago:
> A lot of important and large codebases were grown and maintained in
Python
How does this happen? Is it just inertia that causes people to write
large systems in an essentially type free, interpreted scripting
language?
IshKebab wrote 13 hours 16 min ago:
Someone says "let's write a prototype in Python" and someone else
says "are you sure we shouldn't use a better language that is
just as productive but isn't going to lock us into abysmal
performance down the line?" but everyone else says "nah we don't
need to worry about performance yet, and anyway it's just a
prototype - we'll write a proper version when we need to"...
10 years later "ok it's too slow; our options are a) spend $10m
more on servers, b) spend $5m writing a faster Python runtime
before giving up later because nobody uses it, c) spend 2 years
rewriting it and probably failing, during which time we can make no
new features. a) it is then."
anhner wrote 3 hours 17 min ago:
If I made an app in python and in 10 years it grows so successful
that it needs a $10m vertical scale or $5m rewrite, I wouldn't
even complain.
rented_mule wrote 9 hours 48 min ago:
What many startups need to succeed is to be able to
pivot/develop/repeat very quickly to find a product+market that
makes money. If they don't find that, and most don't, the
millions you talk about never come due. They also rarely have
enough developers, so developer productivity in the short term is
vital to that iteration speed. If that startup turns into Dropbox
or Instagram, the millions you mention are round-off error on
many billions. Easy business decision, and startups are first and
foremost businesses.
Some startups end up in between the two extremes above. I was at
one of the Python-based ones that ended up in the middle. At $30M
in annual revenue, Python was handling 100M unique monthly
visitors on 15 cheap, circa-2010 servers. By the time we hit $1B
in annual revenue, we had Spark for both heavy batch computation
and streaming computation tasks, and Java for heavy online
computational workloads (e.g., online ML inference). There were
little bits of Scala, Clojure, Haskell, C++, and Rust here and
there (with well over 1K developers, things creep in over the
years). 90% of the company's code was still in Python and it
worked well. Of course there were pain points, but there always
are. At $1B in annual revenue, there was budget for investments
to make things better (cleaning up architectural choices that
hadn't kept up, adding static types to core things, scaling up
tooling around package management and CI, etc.).
But a key to all this... the product that got to $30M (and
eventually $1B+) looked nothing like what was pitched to initial
investors. It was unlikely that enough things could have been
tried to land on the thing that worked without excellent
developer productivity early on. Engineering decisions are not
only about technical concerns, they are also about the business
itself.
fud101 wrote 10 hours 7 min ago:
I don't know a better open source language than Python. Java and
C# are both better (platforms) but they come with that obvious
corporate catch.
gcanyon wrote 13 hours 10 min ago:
What language is "just as productive but isn't going to lock us
into abysmal performance down the line"?
What makes that language not strictly superior to Python?
nazgul17 wrote 10 hours 48 min ago:
Loose typing makes you really fast at writing code, as long as
you can keep all the details in your head. Python is great for
smaller stuff. But past some threshold, the lack of a
mechanism that has your back starts slowing you down.
gcanyon wrote 8 hours 59 min ago:
Sure, my language of choice is more flexible than that: I can
type
put "test abc999 this" into x
add 1 to char 4 to 6 of word 2 of x
put x -- puts "test abc1000 this"
But I'm still curious -- what's the better language?
wiseowise wrote 14 hours 22 min ago:
Python has types, now even gradual static typing if you want to go
further. It's irrelevant whether language is interpreted scripting
if it solves your problem.
tjwebbnorfolk wrote 14 hours 30 min ago:
Most large things begin life as small things.
hibikir wrote 16 hours 54 min ago:
Small startups end up writing code in whatever gets things working
faster, because having too large a codebase with too much load is
a champagne problem.
If I told you that we were going to be running a very large
payments system, with customers from startups to Amazon, you'd not
write it in ruby and put the data in MongoDB, and then using its
oplog as a queue... but that's what Stripe looked like. They even
hired a compiler team to add type checking to the language, as that
made far more sense than porting a giant monorepo to something
else.
oivey wrote 17 hours 29 min ago:
It's a nice and productive language. Why is that
incomprehensible?
xboxnolifes wrote 17 hours 32 min ago:
It's very simple. Large systems start as small systems.
dragonwriter wrote 15 hours 35 min ago:
Large systems are often aggregates of small systems, too.
oofbey wrote 17 hours 36 min ago:
It's very natural. Python is fantastic for going from 0 to 1
because it's easy and forgiving. So lots of projects start with
it. Especially anything ML focused. And it's much harder to
change tools once a project is underway.
passivegains wrote 17 hours 2 min ago:
this is absolutely true, but there's an additional nuance: yes,
python is fantastic, yes, it's easy and forgiving, but there are
other languages like that too.
...except there really aren't. other than ruby and maybe go,
every other popular language sacrifices ease of use for things
that simply do not matter for the overwhelming majority of
programs. much of python's popularity doesn't come from being
easy and forgiving, it's that everything else isn't. for normal
programming why would we subject ourselves to anything but python
unless we had no choice?
while I'm on the soapbox I'll give java a special mention: a
couple years ago I'd have said java was easy even though it's
tedious and annoying, but I've become reacquainted with it for a
high school program (python wouldn't work for what they're doing
and the school's comp sci class already uses java.)
this year we're switching to c++.
zelphirkalt wrote 14 hours 11 min ago:
Omg, switching to C++ for pupils who are programming beginners ...
"How to turn off the most students from computer programming?" 101.
Really can't get much worse than C++ for beginners.
nightfly wrote 10 hours 28 min ago:
PSU (Oregon) uses C++ as just "c with classes" and ignores
the rest of C++ for intro to programming courses. It
frustrates people who already use C++ but otherwise works
pretty well.
jgalt212 wrote 23 min ago:
C++, The Good Parts
Izkata wrote 43 min ago:
This was how we learned it in an intro class in highschool
ages ago, worked pretty well there too.
f311a wrote 17 hours 58 min ago:
> Strings
> The rule of thumb for strings is the core string object takes 41
bytes. Each additional character is 1 byte.
That's misleading. There are three types of strings in Python (1, 2 and
4 bytes per character).
HTML [1]: https://rushter.com/blog/python-strings-and-memory/
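A quick demonstration (CPython picks a fixed-width representation per
string rather than storing UTF-8; exact header sizes vary by version):
    import sys

    print(sys.getsizeof("a" * 100))    # ASCII/Latin-1 -> 1 byte per char
    print(sys.getsizeof("α" * 100))    # BMP           -> 2 bytes per char
    print(sys.getsizeof("😀" * 100))   # non-BMP       -> 4 bytes per char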
Retr0id wrote 18 hours 39 min ago:
> Numbers are surprisingly large in Python
Makes me wonder if the cpython devs have ever considered v8-like
NaN-boxing or pointer stuffing.
ewuhic wrote 18 hours 51 min ago:
This is AI slop.
Lockal wrote 1 hour 55 min ago:
Sad that your comment is downvoted. But yes, for those who need
clarification:
1) Measurements are faulty. List of 1,000 ints can be 4x smaller.
Most time measurements depend on circumstances that are not
mentioned, therefore can't be reproduced.
2) Brainrot AI style. Hashmap is not "200x faster than list!", that's
not how complexity works.
3) orjson/ujson are faulty, which is one of the reasons they don't
replace stdlib implementation. Expect crashes, broken jsons, anything
from them
4) What actually will be used in number-crunching applications -
numpy or similar libraries - is not even mentioned.
mikeckennedy wrote 19 hours 3 min ago:
Author here.
Thanks for the feedback everyone. I appreciate you posting it
@woodenchair, and @aurornis for pointing out the intent of the article.
The idea of the article is NOT to suggest you should shave 0.5ns off by
choosing some dramatically different algorithm or that you really need
to optimize the heck out of everything.
In fact, I think a lot of what the numbers show is that over thinking
the optimizations often isn't worth it (e.g. caching len(coll) into a
variable rather than calling it over and over is less useful than it
might seem conceptually).
Just write clean Python code. So much of it is way faster than you
might have thought.
My goal was only to create a reference to what various operations cost
to have a mental model.
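That len(coll) point is easy to check yourself (a rough sketch; the
gap is typically small enough not to matter):
    import timeit

    setup = "coll = list(range(100)); n = len(coll)"
    print(timeit.timeit("len(coll)", setup, number=1_000_000))  # call len() each time
    print(timeit.timeit("n", setup, number=1_000_000))          # read the cached value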
willseth wrote 17 hours 54 min ago:
Then you should have written that. Instead you have given more fodder
for the premature optimization crowd.
mikeckennedy wrote 14 hours 38 min ago:
I didn't tell anyone to optimize anything. I just posted numbers.
It's not my fault some people are wired that way. Anytime I
suggested some sort of recommendation it was to NOT optimize.
For example, from the post: "Maybe we don't have to optimize it
out of the test condition on a while loop looping 100 times after
all."
calmbonsai wrote 9 hours 6 min ago:
The literal title is "Python Numbers Every Programmer Should
Know" which implies the level of detail in the article (down to
the values of the numbers) is important. It is not.
It is helpful to know the relative value (costs) of these
operations. Everything else can be profiled and optimized for
the particular needs of a workflow in a specific architecture.
To use an analogy, turbine designers no longer need to know the
values in the "steam tables", but they do need to know efficient
geometries and trade-offs among them when designing any Rankine
cycle to meet power, torque, and Reynolds regimes.
boerseth wrote 19 hours 12 min ago:
That's a long list of numbers that seem oddly specific. Apart from
learning that f-strings are way faster than the alternatives, and
certain other comparisons, I'm not sure what I would use this for
day-to-day.
After skimming over all of them, it seems like most "simple" operations
take on the order of 20ns. I will leave with that rule of thumb in
mind.
aunderscored wrote 15 hours 36 min ago:
If you're interested, fstrings are faster because they directly
become bytecode at compile time rather than being a function call at
runtime
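You can watch that happen with dis (opcode names vary across
versions; the f-string lowers to formatting opcodes plus
BUILD_STRING, while .format is an attribute lookup and a runtime
call):
    import dis

    dis.dis(compile('f"{x}-{y}"', "<demo>", "eval"))           # FORMAT_* / BUILD_STRING
    dis.dis(compile('"{}-{}".format(x, y)', "<demo>", "eval")) # method lookup + CALL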
apelapan wrote 14 hours 12 min ago:
Thanks for that bit of info! I was surprised by the speed
difference. I have always assumed that most variations of basic
string formatting would compile to the same bytecode.
I usually prefer classic %-formatting for readability when the
arguments are longer and f-strings when the arguments are shorter.
Knowing there is a material performance difference at scale, might
shift the balance in favour of f-strings for some situations.
0x000xca0xfe wrote 18 hours 50 min ago:
That number isn't very useful either, it really depends on the
hardware. Most virtualized server CPUs where e.g. Django will run on
in the end are nowhere near the author's M4 Pro.
Last time I benchmarked a VPS it was about the performance of an Ivy
Bridge generation laptop.
giantrobot wrote 18 hours 2 min ago:
> Last time I benchmarked a VPS it was about the performance of an
Ivy Bridge generation laptop.
I have a number of Intel N95 systems around the house for various
things. I've found them to be a pretty accurate analog for small
instances VPSes. The N95 are Intel E-cores which are effectively
Sandy Bridge/Ivy Bridge cores.
Stuff can fly on my MacBook but then drag on a small VPS instance;
validating against an N95 (which I already have) is helpful. YMMV.
mwkaufma wrote 19 hours 18 min ago:
Why? If those micro benchmarks mattered in your domain, you wouldn't be
using python.
PhilipRoman wrote 17 hours 18 min ago:
...and other hilarious jokes you can tell yourself!
coldtea wrote 19 hours 7 min ago:
That's an "all or nothing" fallacy. Just because you use Python and
are OK with some slowdown, doesn't mean you're OK with each and every
slowdown when you can do better.
To use a trivial example, using a set instead of a list to check
membership is a very basic replacement, and can dramatically improve
your running time in Python. Just because you use Python doesn't mean
anything goes regarding performance.
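The classic demonstration (list membership scans linearly, set
membership is a hash probe):
    import timeit

    setup = "items = list(range(1000)); s = set(items)"
    print(timeit.timeit("999 in items", setup, number=100_000))  # O(n) scan
    print(timeit.timeit("999 in s", setup, number=100_000))      # O(1) average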
mwkaufma wrote 18 hours 47 min ago:
That's an example of an algorithmic improvement (constant vs linear
time), not a micro benchmark, Mr. Fallacy.
coldtea wrote 14 hours 41 min ago:
"Mr. Fallacy."? Got any better juvenile name-calling?
The case is among the example numbers given in TFA:
"Dict lookup by key", "List membership check"
Does it have to spell out the difference is algorithmic in this
case for the comparison to be useful?
Or, inversely, is the difference between e.g. memory and disk
access times insignificant, because it's not algorithmic?
Aurornis wrote 19 hours 22 min ago:
A meta-note on the title since it looks like it's confusing a lot of
commenters: The title is a play on Jeff Dean's famous "Latency
Numbers Every Programmer Should Know" from 2012. It isn't meant to
be interpreted literally. There's a common theme in CS papers and
writing to write titles that play upon themes from past papers. Another
common example is the "_____ considered harmful" titles.
dekhn wrote 16 hours 16 min ago:
That doc predates 2012 significantly.
From what I've been able to glean, it was basically created in the
first few years Jeff worked at Google, on indexing and serving for
the original search engine. For example, the comparison of cache,
RAM, and disk determined whether data was stored in RAM (the index,
used for retrieval) or disk (the documents, typically not used in
retrieval, but used in scoring). Similarly, the comparison of
California-Netherlands time: I believe Google's first international
data center was in NL and they needed to make decisions about copying
over the entire index in bulk versus serving backend queries in the
US with frontends in the NL.
The numbers were always going out of date; for example, the arrival
of flash drives changed disk latency significantly. I remember Jeff
came to me one day and said he'd invented a compression algorithm for
genomic data "so it can be served from flash" (he thought it would be
wasteful to use precious flash space on uncompressed genomic data).
willseth wrote 17 hours 57 min ago:
Good callout on the paper reference, but this author gives every
indication that he's dead serious in the first paragraph. I
don't think commenters are confused.
shanemhansen wrote 18 hours 24 min ago:
Going to write a real banger of a paper called "latency numbers
considered harmful is all you need" and watch my academic cred go
through the roof.
AnonymousPlanet wrote 14 hours 57 min ago:
" ... with an Application to the Entscheidungsproblem"
Kwpolska wrote 18 hours 49 min ago:
This title only works if the numbers are actually useful. Those are
not, and there are far too many numbers for this to make sense.
Aurornis wrote 18 hours 35 min ago:
The title was never meant to be taken literally, as in you're supposed
to memorize all of these numbers. It was meant as an in-joke reference
to the original writing to signal that this document was going to
contain timing values for different operations.
I completely understand why it's frustrating or confusing by
itself, though.
ZiiS wrote 19 hours 43 min ago:
This is a really weird thing to worry about in python. But it is also
misleading; Python ints are arbitrary precision, so they can take up
much more storage and arithmetic time depending on their value.
willseth wrote 19 hours 49 min ago:
Every Python programmer should be thinking about far more important
things than low level performance minutiae. Great reference but
practically irrelevant except in rare cases where optimization is
warranted. If your workload grows to the point where this stuff
actually matters, great! Until then it's a distraction.
HendrikHensen wrote 18 hours 28 min ago:
Having general knowledge about the tools you're working with is not a
distraction, it's an intellectual enrichment in any case, and can be
a valuable asset in specific cases.
willseth wrote 18 hours 8 min ago:
Knowing that an empty string is 41 bytes or how many ns it takes to
do arithmetic operations is not general knowledge.
oivey wrote 17 hours 19 min ago:
How is it not general knowledge? How do you otherwise gauge if
your program is taking a reasonable amount of time, and, if not,
how do you figure out how to fix it?
dirtbag__dad wrote 11 hours 11 min ago:
In my experience, which is series A or earlier data intensive
SaaS, you can gauge whether a program is taking a reasonable
amount of time just by running it and using your common sense.
P50 latency for a fastapi service's endpoint is 30+ seconds.
Your ingestion pipeline, which has a data ops person on your
team waiting for it to complete, takes more than one business
day to run.
Your program is obviously unacceptable. And, your problems are
most likely completely unrelated to these heuristics. You
either have an inefficient algorithm or more likely you are
using the wrong tool (ex OLTP for OLAP) or the right tool the
wrong way (bad relational modeling or an outdated LLM model).
If you are interested in shaving off milliseconds in this
context then you are wasting your time on the wrong thing.
All that being said, I'm sure that there's a very good
reason to know this stuff in the context of some other domains,
organizations, company size/moment. I suspect these metrics are
irrelevant to disproportionately more people reading this.
At any rate, for those of us who like to learn, I still found
this valuable but by no means common knowledge
oivey wrote 10 hours 33 min ago:
I'm not sure it's common knowledge, but it is general
knowledge. Not all HNers are writing web apps. Many may be
writing truly compute bound applications.
In my experience writing computer vision software, people
really struggle with the common sense of how fast computers
really are. Some knowledge like how many nanoseconds an add
takes can be very illuminating to understand whether their
algorithm's runtime makes any sense. That may shake loose the
bit of common sense that their algorithm is somehow wrong.
Often I see people fail to put bounds on their expectations.
Numbers like these help set those bounds.
dirtbag__dad wrote 44 min ago:
Thanks this is helpful framing!
cycomanic wrote 16 hours 13 min ago:
But these performance numbers are meaningless without some sort
of standard comparison case. So if you measure that e.g. some
string operation takes 100ns, how do you compare against the
numbers given here? Any difference could be due to PC, python
version or your implementation. So you have to do proper
benchmarking anyway.
ehaliewicz2 wrote 9 hours 53 min ago:
If your program does 1 million adds, but it takes
significantly longer than 19 milliseconds, you can guess that
something else is going on.
willseth wrote 16 hours 27 min ago:
You gauge with metrics and profiles, if necessary, and address
as needed. You don't scrutinize every line of code over
whether it's "reasonable" in advance instead of doing
things that actually move the needle.
oivey wrote 16 hours 18 min ago:
These are the metrics underneath it all. Profiles tell you
what parts are slow relative to others and time your specific
implementation. How long should it take to sum together a
million integers?
willseth wrote 15 hours 50 min ago:
It literally doesn't matter unless it impacts users. I
don't know why you would waste time on non problems.
oivey wrote 14 hours 20 min ago:
No one is suggesting "wasting time on non problems."
You're tilting at windmills.
willseth wrote 12 hours 49 min ago:
Read more carefully
kc0bfv wrote 19 hours 22 min ago:
I agree - however, that has mostly been a feeling for me for years.
Things feel fast enough and fine.
This page is a nice reminder of the fact, with numbers. For a while,
at least, I will Know, instead of just feel, like I can ignore the
low level performance minutiae.
amelius wrote 19 hours 45 min ago:
Yeah, if you hit limits just look for a module that implements the
thing in C (or write it). This is how it was always done in Python.
ryandrake wrote 17 hours 8 min ago:
I am currently (as we type actually LOL) doing this exact thing in
a hobby GIS project: Python got me a prototype and proof of
concept, but now that I am scaling the data processing to
worldwide, it is obviously too slow so I'm rewriting it (with LLM
assistance) in C. The huge benefit of Python is that I have a known
working (but slow) "reference implementation" to test against. So I
know the C version works when it produces identical output. If I
had a known-good Python version of past C, C++, Rust, etc. projects
I worked on, it would have been most beneficial when it came time
to test and verify.
willseth wrote 19 hours 24 min ago:
Sometimes it's as simple as finding the hotspot with a profiler
and making a simple change to an algorithm or data structure, just
like you would do in any language. The amount of handwringing
people do about building systems with Python is silly.
867-5309 wrote 19 hours 50 min ago:
tfa mentions running the benchmark on a multi-core platform, but
doesn't mention if the benchmark results used multithreading... a
brief look at the code suggests not
jchmbrln wrote 19 hours 51 min ago:
What would be the explanation for an int taking 28 bytes but a list of
1000 ints taking only 7.87KB?
wiml wrote 18 hours 43 min ago:
That appears to be the size of the list itself, not including the
objects it contains: 8 bytes per entry for the object pointer, and a
kilo-to-kibi conversion. All Python values are "boxed", which is
probably a more important thing for a Python programmer to know than
most of these numbers.
The list of floats is larger, despite also being simply an array of
1000 8-byte pointers. I assume that it's because the int array is
constructed from a range(), which has a __len__(), and therefore the
list is allocated to exactly the required size; but the float array
is constructed from a generator expression and is presumably
dynamically grown as the generator runs and has a bit of free space
at the end.
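The same point in code: getsizeof reports only the container, so a
"deep" size has to walk the elements (with the caveat that small ints
in -5..256 are shared singletons, which this naive sum double-counts):
    import sys

    xs = list(range(1000))
    shallow = sys.getsizeof(xs)                         # pointer array only
    deep = shallow + sum(sys.getsizeof(x) for x in xs)  # plus the boxed ints
    print(shallow, deep)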
lopuhin wrote 17 hours 22 min ago:
It's impressive how you figured out the reason for the difference
in list of floats vs list of ints container size; framed as an
interview question, that would have been quite difficult I think
mikeckennedy wrote 17 hours 40 min ago:
It was. I updated the results to include the contained elements. I
also updated the float list creation to match the int list
creation.
Y_Y wrote 20 hours 7 min ago:
int is larger than float, but list of floats is larger than list of
ints
Then again, if you're worried about any of the numbers in this article
maybe you shouldn't be using Python at all. I joke, but please do at
least use Numba or Numpy so you aren't paying huge overheads for making
an object of every little datum.
_ZeD_ wrote 20 hours 8 min ago:
Yeah... No.
I've 10+ years of python under my belt and I might have had need for
this kind of micro optimization maybe 2 times at most
willseth wrote 17 hours 40 min ago:
Sorry, you're not allowed to discourage premature optimization or
defend Python here.
riazrizvi wrote 20 hours 14 min ago:
The titles are oddly worded. For example -
Collection Access and Iteration
How fast can you get data out of Pythonâs built-in collections?
Here is a dramatic example of how much faster the correct data
structure is. item in set or item in dict is 200x faster than item in
list for just 1,000 items!
It seems to suggest an iteration for x in mylist is 200x slower than
for x in myset. Itâs the membership test that is much slower. Not the
iteration. (Also for x in mydict is an iteration over keys not values,
and so isnât what we think of as an iteration on a dictâs
âdataâ).
Also the overall title âPython Numbers Every Programmer Should
Knowâ starts with 20 numbers that are merely interesting.
That all said, the formatting is nice and engaging.
dr_kretyn wrote 20 hours 16 min ago:
Initially I thought how efficient strings are... but then I understood
how inefficient arithmetic is.
Interesting comparison, but exact speed and IO depend on a lot of
things, and it's unlikely one uses a Mac mini in production, so these
numbers definitely aren't representative.
oogali wrote 20 hours 17 min ago:
It's important to know that these numbers will vary based on what
you're measuring, your hardware architecture, and how your particular
Python binary was built.
For example, my M4 Max running Python 3.14.2 from Homebrew (built, not
poured) takes 19.73MB of RAM to launch the REPL (running `python3` at a
prompt).
The same Python version launched on the same system with a single
invocation for `time.sleep()`[1] takes 11.70MB.
My Intel Mac running Python 3.14.2 from Homebrew (poured) takes 37.22MB
of RAM to launch the REPL and 9.48MB for `time.sleep`.
My number for "how much memory it's using" comes from running `ps auxw
| grep python`, taking the value of the resident set size (RSS column),
and dividing by 1,024.
1: python3 -c 'from time import sleep; sleep(100)'
xnx wrote 20 hours 20 min ago:
Python programmers don't need to know 85 different obscure performance
numbers. Better to really understand ~7 general system performance
numbers.
zelphirkalt wrote 20 hours 21 min ago:
I doubt there is much to gain from knowing how much memory an empty
string takes. The article or the listed numbers have a weird fixation
on memory usage numbers and concrete time measurements. What is way
more important to "every programmer" is time and space complexity, in
order to avoid designing unnecessarily slow or memory hungry programs.
Under the assumption of using Python, what is the use of knowing that
your int takes 28 bytes? In the end you will have to determine whether
the program you wrote meets the performance criteria you have, and if
it does not, then you need a smarter algorithm or way of dealing with
data. It helps very little to know that your 2d-array of 1000x1000
bools is so and so big. What helps is knowing, whether it is too much
and maybe you should switch to using a large integer and a bitboard
approach. Or switch language.
kingstnap wrote 16 hours 46 min ago:
I disagree. Performance is a leaky abstraction that *ALWAYS* matters.
Your cognition of it is either implicit or explicit.
Even if you didn't know, for example, that list appends were linear
and not quadratic, and fairly fast.
Even if you didn't give a shit whether simple programs were for some
reason 10000x slower than they needed to be, because it meets some
baseline level of good enough, and/or you aren't the one impacted by
the problems inefficiency creates.
Library authors beneath you would still know and the APIs you
interact with and the pythonic code you see and the code LLMS
generate will be affected by that leaky abstraction.
If you think that n^2 naive list appends is a bad example, it's not,
btw: python string appends are n^2, and that has affected and does
affect how people do things; f-strings for example are lazy.
Similarly a direct consequence of dictionaries being fast in Python
is that they are used literally everywhere. The old Pycon 2017 talks
from Raymond talk about this.
Ultimately what the author of the blog has provided is this sort of
numerical justification for the implicit tacit sort of knowledge
performance understanding gives.
Qem wrote 20 hours 8 min ago:
> Under the assumption of using Python, what is the use of knowing
that your int takes 28 bytes?
Relevant if your problem demands instantiation of a large number of
objects. This reminds me of a post where Eric Raymond discusses the
problems he faced while trying to use Reposurgeon to migrate GCC. See
HTML [1]: http://esr.ibiblio.org/?p=8161
fooker wrote 20 hours 21 min ago:
Counterintuitively: program in python only if you can get away without
knowing these numbers.
When this starts to matter, python stops being the right tool for the
job.
bathtub365 wrote 16 hours 44 min ago:
These basically seem like numbers of last resort. After you've
profiled and ruled out all of the usual culprits (big disk reads,
network latency, polynomial or exponential time algorithms, wasteful
overbuilt data structures, etc) and need to optimize at the level of
individual operations.
Quothling wrote 18 hours 20 min ago:
Why? I've built some massive analytic data flows in Python with
turbodbc + pandas which are basically C++ fast. It uses more memory,
which supports your point, but on the flip-side we're talking $5-10
extra cost a year. It could frankly be $20k a year and still be
cheaper than staffing more people like me to maintain these things,
rather than having a couple of us and then letting the BI people use
the tools we provide for them. Similarly, when we do embedded work,
micro-python is just so much easier to deal with for our engineering
staff.
The interoperability between C and Python makes it great, and you
need to know these numbers on Python to know when to actually build
something in C. With Zig getting really great interoperability,
things are looking better than ever.
Not that you're wrong as such. I wouldn't use Python to run an
airplane, but I really don't see why you wouldn't care about the
resources just because you're working with an interpreted or GC
language.
its-summertime wrote 17 hours 2 min ago:
From the complete opposite side, I've built some tiny bits of near
irrelevant code where python has been unacceptable, e.g. in shell
startup / in bash's PROMPT_COMMAND, etc. It ends up having a very
painfully obvious startup time, even if the code is nearing the
equivalent of Hello World
time python -I -c 'print("Hello World")'
real 0m0.014s
time bash --noprofile -c 'echo "Hello World"'
real 0m0.001s
dekhn wrote 16 hours 21 min ago:
Why exactly do you need 1ms instead of 14ms startup time in a
shell startup?
The difference is barely perceptible.
Most of the time starting up is time spent searching the
filesystem for thousands of packages.
NekkoDroid wrote 15 hours 50 min ago:
> What exactly do you need 1ms instead of 14ms startup time in
a shell startup?
I think as they said: when dynamically building a shell input
prompt it starts to become very noticable if you have like 3 or
more of these and you use the terminal a lot.
dekhn wrote 13 hours 21 min ago:
Ah, I only noticed the "shell startup" bit.
Yes, after 2-3 I agree you'd start to notice if you were
really fast. I suppose at that point I'd just have Gemini
rewrite the prompt-building commands in Rust (it's quite good
at that) or merge all the prompt-building commands into a
single one (to amortize the startup cost).
its-summertime wrote 2 hours 1 min ago:
[1] perhaps? I should probably start using it again
honestly.
HTML [1]: https://starship.rs/
fooker wrote 18 hours 4 min ago:
> you need to know these numbers on Python to know when to actually
build something in C
People usually approach this the other way, use something like
pandas or numpy from the beginning if it solves your problem. Do
not write matrix multiplications or joins in python at all.
If there is no library that solves your problem, it's a great
indication that you should avoid python. Unless you are willing to
spend 5 man-years writing a C or C++ library with good python
interop.
oivey wrote 17 hours 22 min ago:
People generally aren't rolling their own matmuls or joins or
whatever in production code. There are tons of tools like Numba,
Jax, Triton, etc. that you can use to write very fast code for
new, novel, and unsolved problems. The idea that "if you need
fast code, don't write Python" has been totally obsolete for
over a decade.
fooker wrote 17 hours 3 min ago:
Yes, that's what I said.
If you are writing performance sensitive code that is not
covered by a popular Python library, don't do it unless you are
a megacorp that can put a team to write and maintain a library.
oivey wrote 16 hours 57 min ago:
It isn't what you said. If you want, you can write your own
matmul in Numba and it will be roughly as fast as similar C
code. You shouldn't, of course, for the same reason
handrolling your own matmuls in C is stupid.
Many problems can be performantly solved in pure Python,
especially via the growing set of tools like the JIT
libraries I cited. Even more will be solvable when things
like free threaded Python land. It will be a minority of
problems that can't be, if it isn't already.
Demiurge wrote 18 hours 26 min ago:
I agree. I've been living off Python for 20 years and have never
needed to know any of these numbers, nor do I need them now, for my
work, contrary to the title. I also regularly use profiling for
performance optimization and opt for Cython, SWIG, JIT libraries, or
other tools as needed. None of these numbers would ever factor into
my decision-making.
AtlasBarfed wrote 16 hours 45 min ago:
.....
You don't see any value in knowing those numbers?
Demiurge wrote 11 hours 32 min ago:
That's what I just said. There is zero value to me knowing these
numbers. I assume that all python built in methods are pretty
much the same speed. I concentrate on IO being slow, minimizing
these operations. I think about CPU intensive loops that process
large data, and I try to use libraries like numpy, DuckDB, or
other tools to do the processing. If I have a more complicated
system, I profile its methods, and optimize tight loops based on
PROFILING. I don't care what the numbers in the article are,
because I PROFILE, and I optimize the procedures that are the
slowest, for example, using cython. Which part of what I am
saying does not make sense?
KeplerBoy wrote 1 hour 21 min ago:
That makes perfect sense. Especially since those numbers can
change with new python versions.
TuringTest wrote 15 hours 57 min ago:
As others have pointed out, Python is better used in places where
those numbers aren't relevant.
If they start becoming relevant, it's usually a sign that you're
using the language in a domain where a duck-typed bytecode
scripting-glue language is not well-suited.
libraryofbabel wrote 20 hours 6 min ago:
Or keep your Python scaffolding, but push the performance-critical
bits down into a C or Rust extension, like numpy, pandas, PyTorch and
the rest all do.
But I agree with the spirit of what you wrote - these numbers are
interesting but aren't worth memorizing. Instead, instrument your
code in production to see where it's slow in the real world with
real user data (premature optimization is the root of all evil etc),
profile your code (with py-spy, it's the best tool for this if
you're looking for cpu-hogging code), and if you find yourself
worrying about how long it takes to add something to a list in Python
you really shouldn't be doing that operation in Python at all.
eichin wrote 19 hours 20 min ago:
"if you're not measuring, you're not optimizing"
MontyCarloHall wrote 20 hours 17 min ago:
Exactly. If you're working on an application where these numbers
matter, Python is far too high-level a language to actually be able
to optimize them.
tgv wrote 20 hours 24 min ago:
I doubt list and string concatenation operate in constant time, or else
they affect another benchmark. E.g., you can concatenate two lists in
the same time, regardless of their size, but at the cost of slower
access to the second one (or both).
More contentiously: don't fret too much over performance in Python.
It's a slow language (except for some external libraries, but that's
not the point of the OP).
jerf wrote 20 hours 18 min ago:
String concatenation is mentioned twice on that page, with the same
time given. The first time it has a parenthetical "(small)", the
second time doesn't have it. I expect you were looking at the second
one when you typed that as I would agree that you can't just label it
as a constant time, but they do seem to have meant concatenating
"small" strings, where the overhead of Python's object construction
would dominate the cost of the construction of the combined string.
woodruffw wrote 20 hours 28 min ago:
Great reference overall, but some of these will diverge in practice:
141 bytes for a 100 char string won't hold for non-ASCII strings, for
example, and will change if/when the object header overhead changes.
ktpsns wrote 20 hours 31 min ago:
Nice numbers, and it's always worth knowing an order of magnitude. But
these charts are far away from what "every programmer should know".
jerf wrote 20 hours 13 min ago:
I think we can safely steelman the claim to "every Python programmer
should know", and even from there, every "serious" Python programmer,
writing Python professionally for some "important" reason, not just
everyone who picks up Python for some scripting task. Obviously
there's not much reason for a C# programmer to go try to memorize all
these numbers.
Though IMHO it suffices just to know that "Python is 40-50x slower
than C and is bad at using multiple CPUs" is not just some sort of
anti-Python propaganda from haters, but a fairly reasonable
engineering estimate. If you know that you don't really need that
chart. If your task can tolerate that sort of performance, you're
fine; if not, figure out early how you are going to solve that
problem, be it through the several ways of binding faster code to
Python, using PyPy, or by not using Python in the first place,
whatever is appropriate for your use case.