Max Liani (@maxliani), 23 tweets, 4 min read
“Implement first, optimize later,” they say... as the years go by I grow more convinced that this is actually a rather poor development practice.
The task of optimization is one of the most complex tasks a developer is asked to perform. I’d argue that optimizing software is much harder than debugging it, which in turn tends to be harder than writing the solution to begin with.
I’ll start with an analogy (perhaps a poor one). A chef working at a restaurant, cooking meals à la carte. For expediency they pretend pans and pots can be cleaned later: the first thing to do is cook! The customers are waiting, after all. The bench becomes a mess quickly...
And sooner than they realize, they spend more time pushing dirty pans and spoons around than focusing on cooking the best meal. At the end of the day there is a humongous pile of dirty stuff to take care of, much more than they can manage.
So they’ll clean some, removing the biggest incrustations, but the pots and dishes are still dirty, far from clean... at some point things start to fall apart in non-obvious ways: customers complain that the food doesn’t taste good, some complain of allergic reactions...
...to ingredients that were not in the recipe of the dish they ordered.
Back to software, in particular complex software with public APIs. “Optimize later” forces you into a corner you did not see coming early enough. Now you have API boundaries, and the changes you need to make to fix the performance issues are difficult or impossible.
Your software becomes inadequate sooner than you wanted, and your users will complain if you change things between versions.
A less obvious problem is how performance issues tend to compound into larger ones. One alone is not very noticeable, but many of them together make everything slow, in particular when the issue is not instruction related but memory-bandwidth related.
A subsystem is not making good use of memory reads and writes; it seems fine on its own, but it makes another system suffer. If you simplify the use case you lose the symptoms, so you are left struggling with huge data sets to troubleshoot.
Then there is the “scheduling problem”. The next task will be a super duper high priority feature request, followed by another, and yet another. Your once well-understood performance TODOs become cluttered by further development and by other people refactoring, making it a challenge to recover what was once clear and manageable.
People say, “don’t do premature optimization”... well I say, don’t spend time on instruction-level micro-optimization. But when it comes to architecture and memory access, it is never too soon to take the right steps, and any misstep will take a long effort to recover from.
People are asking me what my considerations are when designing high-performance systems:
1. avoid any computation unless strictly necessary, computation is “bad”.
2. consider data preconditions to avoid computation.
3. data is also “bad”, use/build up as little as possible.
4. consider the incidence of the computation. Does it happen once a day, once per key press? Once for each record/object in your data set?
5. how big is the data set? Any hard limit? Can you impose any artificial hard limit to prevent undesirable scaling effects?

If your working set is always tiny, there is no need for careful design, nor optimizations of most sorts... but hey, does it leave behind address-space fragmentation?
6. if the working set is user defined (e.g. a 3D scene), assume the worst. If you have 2^32 addressable entries in the scene, you can count on users trying to fill it (and likely asking for more). Will the system scale to that? Or will it fall apart at a tiny fraction of capacity?
7. everything runs fast on a small data set. If this is a new feature, profiling may not help much if you don’t have a data set to stress-test it. Programmatically generated data sets are only moderately useful, and real data is difficult to come by until after you ship.
8. estimate the compute-to-memory-bandwidth ratio, even vaguely. If you read more data than you compute on, your system will run at a fraction of capacity.
9. design data structures to maximize memory bandwidth; every other design decision is secondary to this. So as soon as you understand how to compute something, chuck all the code away and start again with the data structure first.
10. don’t be afraid to start over; you are likely to get it wrong a few times before you get it right. At every new iteration you gain insight into how to make something better. Experience will guide you.
11. you may not have a problem that can be solved with a predictable memory access pattern. In this case try to limit the damage and coalesce data as much as possible.
12. complexity grows when trying to maximize speed while also minimizing memory footprint across a variety of use cases. Try to stop before the system you design is prohibitively expensive to maintain.