Counterintuitively: program in Python only if you can get away without knowing these numbers.
When this starts to matter, Python stops being the right tool for the job.
Or keep your Python scaffolding, but push the performance-critical bits down into a C or Rust extension, as NumPy, pandas, PyTorch and the rest all do.
But I agree with the spirit of what you wrote: these numbers are interesting but aren’t worth memorizing. Instead, instrument your code in production to see where it’s slow in the real world with real user data (premature optimization is the root of all evil, etc.), and profile your code (with py-spy, it’s the best tool for this if you’re looking for CPU-hogging code). And if you find yourself worrying about how long it takes to add something to a list in Python, you really shouldn’t be doing that operation in Python at all.
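For anyone who hasn't profiled before, here is a minimal sketch of the "measure first" idea. py-spy is a separate command-line tool that attaches to a running process, so this uses the standard library's `cProfile` as a stand-in, not py-spy itself. The functions `slow_sum` and `handle_request` are hypothetical placeholders for real application code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Hypothetical hot spot: builds a full list only to sum it.
    return sum([i * i for i in range(n)])


def handle_request():
    # Hypothetical stand-in for real application code.
    return slow_sum(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

With py-spy the equivalent is done from outside the process, roughly `py-spy top --pid <PID>` for a live view or `py-spy record -o profile.svg --pid <PID>` for a flame graph, which is what makes it practical in production.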
Exactly. If you're working on an application where these numbers matter, Python is far too high-level a language to actually be able to optimize them.
I doubt there is much to gain from knowing how much memory an empty string takes. The article, or at least its list of numbers, has a weird fixation on memory usage figures and concrete time measurements. What is far more important to "every programmer" is time and space complexity, in order to avoid designing unnecessarily slow or memory-hungry programs. Under the assumption of using Python, what is the use of knowing that your int takes 28 bytes? In the end you have to determine whether the program you wrote meets your performance criteria, and if it does not, you need a smarter algorithm or a smarter way of dealing with the data. It helps very little to know that your 2D array of 1000x1000 bools is so-and-so big. What helps is knowing whether that is too much, and whether you should switch to a large integer and a bitboard approach. Or switch language.
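To make the bitboard idea concrete, here is a rough sketch, assuming a 1000x1000 grid packed row by row into a single Python int. The helper names `set_bit` and `get_bit` are made up for illustration, and the sizes are whatever `sys.getsizeof` reports on your build.

```python
import sys

SIZE = 1000  # the grid is SIZE x SIZE


def set_bit(board, row, col):
    # Ints are immutable, so this returns a new board with the bit set.
    return board | (1 << (row * SIZE + col))


def get_bit(board, row, col):
    return (board >> (row * SIZE + col)) & 1 == 1


# Nested-list version: one pointer per cell, plus per-row list overhead.
grid = [[False] * SIZE for _ in range(SIZE)]
grid[3][7] = True
grid[999][999] = True

# Bitboard version: one bit per cell inside a single large int.
board = 0
board = set_bit(board, 3, 7)
board = set_bit(board, 999, 999)

grid_bytes = sys.getsizeof(grid) + sum(sys.getsizeof(row) for row in grid)
board_bytes = sys.getsizeof(board)
print(f"nested lists: {grid_bytes} bytes, bitboard: {board_bytes} bytes")
```

The trade-off is that every update builds a new int, so a single write costs time proportional to the size of the board. Whether that is acceptable depends on the read/write mix, which is exactly the kind of judgment the raw byte counts don't give you.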
> Under the assumption of using Python, what is the use of knowing that your int takes 28 bytes?
Relevant if your problem demands instantiation of a large number of objects. This reminds me of a post where Eric Raymond discusses the problems he faced while trying to use Reposurgeon to migrate GCC. See http://esr.ibiblio.org/?p=8161
It's important to know that these numbers will vary based on what you're measuring, your hardware architecture, and how your particular Python binary was built.
For example, my M4 Max running Python 3.14.2 from Homebrew (built, not poured) takes 19.73MB of RAM to launch the REPL (running `python3` at a prompt).
The same Python version launched on the same system with a single invocation for `time.sleep()`[1] takes 11.70MB.
My Intel Mac running Python 3.14.2 from Homebrew (poured) takes 37.22MB of RAM to launch the REPL and 9.48MB for `time.sleep`.
My number for "how much memory it's using" comes from running `ps auxw | grep python`, taking the resident set size (the RSS column, reported in kilobytes), and dividing by 1,024.
[1] `python3 -c 'from time import sleep; sleep(100)'`
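If you'd rather get a comparable figure from inside the process, something like the sketch below should work on macOS and Linux. It uses the standard library's `resource` module (Unix only) and reports peak RSS, not the current RSS that `ps` shows, so the numbers won't match exactly. The function name `peak_rss_mb` is just an illustrative choice.

```python
import resource
import sys


def peak_rss_mb():
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak RSS: {peak_rss_mb():.2f} MB")
```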
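The titles are oddly worded. For example:
> Collection Access and Iteration
> How fast can you get data out of Python’s built-in collections? Here is a dramatic example of how much faster the correct data structure is. item in set or item in dict is 200x faster than item in list for just 1,000 items!
It seems to suggest that iterating with `for x in mylist` is 200x slower than `for x in myset`. It’s the membership test that is much slower, not the iteration.
Also, the overall title “Python Numbers Every Programmer Should Know” starts with 20 numbers that are merely interesting.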
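A quick way to see the distinction for yourself is to time both operations separately. This is only a sketch, assuming 1,000 integers and a lookup for a value that isn't present (the worst case for a list). The exact ratios will depend on your machine and Python build.

```python
import timeit

as_list = list(range(1000))
as_set = set(as_list)
missing = -1  # not present, so the list has to scan every element

runs = 2000

# Membership test: a linear scan for the list, a hash lookup for the set.
list_member = timeit.timeit(lambda: missing in as_list, number=runs)
set_member = timeit.timeit(lambda: missing in as_set, number=runs)


def iterate(collection):
    for _ in collection:
        pass


# Iteration: both visit every element once.
list_iter = timeit.timeit(lambda: iterate(as_list), number=runs)
set_iter = timeit.timeit(lambda: iterate(as_set), number=runs)

print(f"membership: list/set time ratio = {list_member / set_member:.1f}")
print(f"iteration:  list/set time ratio = {list_iter / set_iter:.1f}")
```

The membership ratio should come out large, while the iteration ratio should stay within a small constant factor, which is the point being made about the title.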
An int is larger than a float, but a list of floats is larger than a list of ints.
Then again, if you're worried about any of the numbers in this article, maybe you shouldn't be using Python at all. I joke, but please do at least use Numba or NumPy so you aren't paying huge overheads for making an object out of every little datum.
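To put a rough number on that per-object overhead without installing anything, here is a sketch using the standard library's `array` module as a stand-in for a NumPy array. It is not NumPy, but it illustrates the same idea of storing raw values in one contiguous buffer instead of one Python object per element.

```python
import sys
from array import array

values = [float(i) for i in range(100_000)]

# Boxed: the list holds one pointer per element, and each float is its own object.
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Packed: array('d') stores raw C doubles back to back in a single buffer.
packed = array("d", values)
packed_bytes = sys.getsizeof(packed)

print(f"list of floats: {boxed_bytes} bytes")
print(f"array('d'):     {packed_bytes} bytes")
```

A NumPy `float64` array has the same packed layout and adds vectorized operations on top, which is where most of the speedup comes from.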
Initially I thought how efficient strings are... but then I understood how inefficient arithmetic is.
Interesting comparison, but exact speed and I/O depend on a lot of things, and it's unlikely that anyone uses a Mac mini in production, so these numbers definitely aren't representative.
I doubt list and string concatenation operate in constant time, or else they affect another benchmark. E.g., you can concatenate two lists in the same time, regardless of their size, but at the cost of slower access to the second one (or both).
More contentiously: don't fret too much over performance in Python. It's a slow language (except for some external libraries, but that's not the point of the OP).
String concatenation is mentioned twice on that page, with the same time given. The first time it has a parenthetical "(small)"; the second time it doesn't. I expect you were looking at the second one when you typed that, and I would agree that you can't just label it as constant time. But they do seem to have meant concatenating "small" strings, where the overhead of Python's object construction would dominate the cost of building the combined string.
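A small experiment makes the "(small)" caveat visible. This is a sketch, assuming 10-character and 1,000,000-character strings as the two extremes. The absolute timings are machine-dependent, so none are given here.

```python
import timeit

small = "x" * 10
large = "x" * 1_000_000

runs = 100

# Small strings: the cost is dominated by creating the new string object.
small_time = timeit.timeit(lambda: small + small, number=runs) / runs

# Large strings: the cost is dominated by copying the characters.
large_time = timeit.timeit(lambda: large + large, number=runs) / runs

print(f"small + small: {small_time * 1e9:.0f} ns per concatenation")
print(f"large + large: {large_time * 1e9:.0f} ns per concatenation")
```

The per-operation time grows with the length of the operands, so a single headline figure only makes sense for short strings.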
Nice numbers, and it's always worth knowing the order of magnitude. But these charts are far from what "every programmer should know".
I think we can safely steelman the claim to "every Python programmer should know", and even from there, every "serious" Python programmer, writing Python professionally for some "important" reason, not just everyone who picks up Python for some scripting task. Obviously there's not much reason for a C# programmer to go try to memorize all these numbers.
Though IMHO it suffices just to know that "Python is 40-50x slower than C and is bad at using multiple CPUs" is not just some sort of anti-Python propaganda from haters, but a fairly reasonable engineering estimate. If you know that, you don't really need that chart. If your task can tolerate that sort of performance, you're fine; if not, figure out early how you are going to solve that problem, be it through one of the several ways of binding faster code to Python, using PyPy, or not using Python in the first place, whatever is appropriate for your use case.
Python programmers don't need to know 85 different obscure performance numbers. Better to really understand ~7 general system performance numbers.
Great reference overall, but some of these will diverge in practice: 141 bytes for a 100 char string won’t hold for non-ASCII strings for example, and will change if/when the object header overhead changes.
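You can check the non-ASCII point directly. The sketch below assumes CPython, which picks 1, 2, or 4 bytes per character depending on the widest character in the string, and compares four 100-character strings. The exact byte counts vary by version, so only the ordering is worth relying on.

```python
import sys

samples = {
    "ascii": "a" * 100,
    "latin-1": "\u00e9" * 100,    # é, still 1 byte per character
    "bmp": "\u20ac" * 100,        # €, 2 bytes per character
    "astral": "\U0001f600" * 100,  # emoji, 4 bytes per character
}

sizes = {name: sys.getsizeof(text) for name, text in samples.items()}

for name, size in sizes.items():
    print(f"{name:8} {len(samples[name])} chars -> {size} bytes")
```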
Yeah... No. I have 10+ years of Python under my belt, and I've needed this kind of micro-optimization maybe twice at most.