Alejandro Piad Morffis Profile picture
Sep 17, 2020 β€’ 21 tweets β€’ 5 min read β€’ Read on X
Hey, guess what, today is #TheoryThursday 🧐!

A silly excuse I just invented to share with you random bits of theory from some dark corner of Computer Science and make it as beginner-friendly as possible πŸ‘‡
Today I want to talk about *Algorithmic Complexity *.

To get started, take a look at the following code. How long do you think it will take to run it?

Let's make that question more precise. How long do you think it will take to run it in the worst-case scenario?
We can see that the code will run slower if:

πŸ‘‰ your computer is older;
πŸ‘‰ the array is longer; or
πŸ‘‰ x happens to be further to back, or not present at all.

Can we turn these insights into an actual formula? We will have to get rid of ambiguous stuff like "old computers".
1️⃣ First, let's consider an abstract computer in which all "atomic" operations take exactly 1 unit of time.

πŸ€” Defining exactly what is an "atomic" operation is far from trivial. For now, assume it's things like arithmetic operations, indexing, invocation.
2️⃣ Second, we'll count the number of operations *with respect to* the size of an arbitrary array.

We will say something like "this will cost 2 units of time for each element of the array".
3️⃣ Finally, we will consider the worst-case scenario.

So we assume, in this example, that the element x is not in the array. More generally, we will always think about the maximum number of operations that could potentially happen.
With these ideas in mind, we are ready to define the *Algorithm Complexity* of this algorithm.

Let's count how many operations are performed in each step, assuming our array has length N:
Depending on how detailed you want to be counting, you could say we have something like πŸ”₯3*N+1πŸ”₯ operations in the ultimate worst-case scenario.

❓ Now, why do we care about this?

The reason is that we can now compare different algorithms.
For example, if your implementation takes πŸ”₯5*N+3πŸ”₯, then it is worse, right?

Well, it's not 😝

Here's the deal: we have been assuming that all "atomic" operations are equally costly, but this is not true...
Hence, it makes no sense to compare my implementation with your implementation by looking at those tiny differences. My 5 * N could be faster than your 3 * N if my "atomic" operations are simpler.

πŸ’‘ We want a complexity measure that smooths away all implementation details.
To achieve this, we will take away everything unimportant when N becomes very large. We will consider that:

πŸ‘‰ N+a and N+b are the same;
πŸ‘‰ a*N and b*N are the same;

... for any finite values a and b.
πŸ”‘ And instead of saying 3*N+4, we will say the asymptotic algorithmic complexity is O(N).

This is called big-O notation.

πŸ’‘ We call this *linear complexity* because the number of operations grows linearly *with respect to* the size of the array.
🧐 Formally, it means that your function's cost is something that is bounded by a linear function.

πŸ’‘ Intuitively, what this means is that in the long run, small differences like specific operations matter less than the capacity your algorithm has to *scale* with more data.
The reason is simple, an algorithm with a lower asymptotic complexity will eventually win. Take for example binary search.

en.wikipedia.org/wiki/Binary_se…

It takes a bit of thinking, but we can prove the asymptotic notation to be O(log N).
Binary search is doing much more work in each iteration than linear search. It could be 20 * log N vs 3 * N. Hence, with very small arrays, linear search could be better.

πŸ”‘ But there is always a value of N after which binary search will win, and in any hardware.
❀️ And that's it. We have just arrived at the intuitive notion of asymptotic complexity!

Calculating it can be daunting for some non-trivial algorithms, but here are some tips for estimating it:
1️⃣ Every nested for loop from beginning to end usually means another exponent.

For example, two nested loops usually mean O(N^2), three nested loops, O(N^3), and four nested loops means you really need to take a break and, afterwards, please refactor that code.
2️⃣ An invocation to function F inside a loop means you have to multiply N times the complexity of F.

For example, if we call binary search for each element of the array, the resulting algorithm is O(N log N)
3️⃣ In recursive methods, if you split at the middle and recurse down only one branch, that's O(log N). If you recurse down both branches, you usually have O(N log N).

These are special cases of a more general rule for recursive methods:

en.wikipedia.org/wiki/Master_th…
Finally, we have just scratched the surface in this thread. Algorithmic complexity is a fascinating topic that touches all fields in Computer Science.

The most important problem in all of CS comes from here, the infamous πŸ”₯P vs NPπŸ”₯.

But that's a story for another Thursday πŸ˜‰.
πŸ—¨οΈ If you like this idea, then get into the conversation! Steal this #TheoryThursday hashtag an tell me about your favourite piece of dark magic.

πŸ”” Just @ me and I'll make sure to weigh in.

πŸ—’οΈ Here is the full thread source:
github.com/apiad/apiad.gi…

β€’ β€’ β€’

Missing some Tweet in this thread? You can try to force a refresh
γ€€

Keep Current with Alejandro Piad Morffis

Alejandro Piad Morffis Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @alepiad

Sep 20
LLMs cannot reason.

Despite their impressive capabilities, all LLMs, including OpenAI o1, are still fundamentally limited by design constraints that make them incapable of true, open-ended reasoning.

Let's break it down. 🧡 (1/5)
Reason 1: Stochastic Sampling.

LLMs rely on probabilities to pick the next token. Even when you fix the temperature, randomness is still built into the language modeling paradigm.

But logic is anything but random.
Reason 2: Bounded Computation.

Each token processed requires constant computation. This means the total computational budget is determined by the input size.

But we know some problems (including logical reasoning) require an exponentially large computation.
Read 6 tweets
Jul 26, 2023
What is Machine Learning?

Here's a 3 min intuitive introduction explaining why this is the most powerful paradigm shift for conventional software development.

No math, no code, just intuitions. Let's dive in.
Conventional software is about automating stuff.

A client comes up with a problem, and to solve it, as a software developer, you must understand very precisely how a human would do it.

Then you can proceed to automate that process.
This is great. Computers are so much faster than humans for so many trivial tasks.

By leveraging that, we can turn a quantitative improvement into a qualitative one, solving in seconds problems that would take humans years.

But there's a catch...
Read 14 tweets
Jul 22, 2022
Clustering is a process for discovering relationships between objects, by placing them in different groups according to how similar they are with each other.

❓Given a set of objects, is there a natural, unbiased way to cluster them?

Meet the ugly duckling theorem.

🧡 1 of 20
Say we have three objects, two swans 🦒🦒 and an ugly duckling πŸ¦†.

Obviously, the natural way to cluster them is by placing the two swans together and the duckling in a different group, right?

Well, it depends on which features you choose to look into.

2 of 20
If we cluster them by colour, sure, but if we cluster them by size, maybe not.

So how about we consider *all the possible* features? Wouldn't that give us the most "natural" clustering?

As a start, let's say there are N boolean predicates that we can evaluate...

3 of 20
Read 20 tweets
Jun 29, 2022
I've spent the last couple of years disrupting traditional software companies with machine learning and data science ideas directly out of my group's core research.

I've found that most issues arise from three critical areas.

Here's what I've learned...

🧡 1/24
Most of the obstacles I've seen can be grouped in one of the following three categories:

1️⃣ the language
2️⃣ the development process
3️⃣ the expected results

Let's tackle them one by one.

2/24
1️⃣ The majority of clashes between academia and industry are due to a language barrier.

We talk about experiments, models, and hypotheses. They talk about functionality, business rules, and user experience.

3/24
Read 24 tweets
Apr 12, 2022
AutoML is a growing subfield of machine learning, that aims to automate some of the most boring and time-consuming parts of designing, training, and deploying a machine learning pipeline.

Here are 10 open source AutoML tools you can start using today:πŸ‘‡
βš™οΈauto-sklearn

Probably the most popular AutoML system, it sits on top of everyone's favourite ML framework, scikit-learn, and gives you a black-box AutoML wrapper that abstracts away most of scikit-learn's estimators.

πŸ”—github.com/automl/auto-sk…
βš™οΈAuto-WEKA

Another well-known AutoML framework based on another popular and well-loved machine learning framework, WEKA. Although the project is not in active development anymore, it is still used by the community.

πŸ”—github.com/automl/autoweka
Read 12 tweets
Oct 21, 2021
Have you heard about P vs NP?

It's probably the most important theoretical question in computer science, and it sounds weirdly abstract.
But deep down, it has a very intuitive explanation.

If you have heard of this and want to learn a bit more, read on...

πŸ§΅πŸ‘‡
Computer Science is all about finding clever ways to solve difficult problems.

We have found clever algorithms for a bunch of them: sorting stuff, finding shortest paths, solving equations, simulating physics...

But some problems seem to be way too hard πŸ‘‡
One example is the Travelling Salesman problem.

❓ Find a cycle starting in your city to visit all major cities in your country and return home with the least fuel cost.

This is the kind of problem we expect computers to solve easily, right? That's what computers are for!
Read 17 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(