Robert Graham (@ErrataRob)
Feb 16, 2022 · 17 tweets · 5 min read
So I did a thing.

A couple of years ago, people were rewriting the classic 'wc' program (word-count) in their favorite programming language to prove theirs could be as fast as C.

So I decided to rewrite it using my favorite algorithm instead: a "state machine parser". Image
The algorithm to count words (and lines and characters) is 3 lines long, the while(){} loop at line 25.

You are supposed to marvel at how this is absolutely NOT a word/line/char counting algorithm -- and yet, it produces the same results as 'wc'. Image
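For readers without the image, here is a minimal sketch of what such a loop can look like for ASCII input. It is not the code from the screenshot: the state names, `build_table()`, and `count_buffer()` are hypothetical, and the table layout is an assumption. The idea it illustrates is that every byte moves the machine to a new state, and the only "counting" is incrementing a counter indexed by that state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state names; the counters are indexed by state. */
enum { WAS_SPACE, NEW_LINE, NEW_WORD, WAS_WORD, STATE_COUNT };

static unsigned char table[STATE_COUNT][256];

/* Fill in table[state][byte] -> next state (ASCII whitespace only). */
void build_table(void)
{
    for (int s = 0; s < STATE_COUNT; s++) {
        int in_word = (s == NEW_WORD || s == WAS_WORD);
        for (int c = 0; c < 256; c++) {
            if (c == '\n')
                table[s][c] = NEW_LINE;
            else if (c == ' ' || (c >= '\t' && c <= '\r'))
                table[s][c] = WAS_SPACE;
            else
                table[s][c] = in_word ? WAS_WORD : NEW_WORD;
        }
    }
}

/* Run the machine over one buffer; returns the state to resume from. */
unsigned count_buffer(const unsigned char *buf, size_t length,
                      unsigned state, unsigned long counts[STATE_COUNT])
{
    assert(state < STATE_COUNT);
    while (length--) {
        state = table[state][*buf++];   /* get byte, do transition */
        counts[state]++;                /* do counts */
    }
    return state;
}
```

Starting from `WAS_SPACE`, the line count is `counts[NEW_LINE]`, the word count is `counts[NEW_WORD]`, and the character count is the sum of all four counters. Nothing in the loop tests for a space or a newline; all of that lives in the table.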
I implemented the same algorithm in JavaScript, and it ended up being faster than all those "I rewrote wc in my favorite language" examples. The reason isn't that JavaScript is faster than their language, but that the ALGORITHM is faster. It also JITs well. Image
The above does ASCII, which obviously is trivial. Everybody likes to cherry pick the easy problems that best demonstrate the superiority of their language/algorithm -- that then fails in the real world.

So let's do UTF8 instead. UTF8 has its own nasty parsing logic.
I pose this as a challenge to all those "rewrote wc in my favorite language" posts: now do UTF8 and see if your language is any better than C.
In the case of UTF8, my "state-machine parsing" algorithm doesn't change. Instead, the underlying state-machine just gets much bigger (MUCH bigger). I still use the same three lines:
1. get byte
2. do state machine transition
3. do counts Image
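To give a feel for how the table grows while the three steps stay the same, here is a sketch that handles only the UTF8 part: it counts well-formed characters and illegal bytes. It is an assumption-laden illustration, not the 'wc2' table: the state names and functions are hypothetical, the byte ranges follow the well-formed UTF8 sequences of RFC 3629, an illegal byte is simply consumed before resyncing, and classifying non-ASCII code points as whitespace (which a real word count needs) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states. U_OK doubles as the start state. */
enum { U_OK,                        /* a complete character was just consumed */
       U_BAD,                       /* the byte just consumed was illegal here */
       U_NEED1, U_NEED2, U_NEED3,   /* that many continuation bytes still due */
       U_E0, U_ED, U_F0, U_F4,      /* lead bytes with a restricted 2nd byte */
       U_STATES };

static unsigned char utf8_table[U_STATES][256];

static void set_range(int state, int lo, int hi, int next)
{
    for (int c = lo; c <= hi; c++)
        utf8_table[state][c] = (unsigned char)next;
}

void utf8_build_table(void)
{
    /* Default: every byte is illegal in every state. */
    for (int s = 0; s < U_STATES; s++)
        set_range(s, 0x00, 0xFF, U_BAD);

    /* U_OK and U_BAD both expect the first byte of a character, */
    /* so the machine resyncs on the byte after an illegal one.  */
    for (int s = U_OK; s <= U_BAD; s++) {
        set_range(s, 0x00, 0x7F, U_OK);
        set_range(s, 0xC2, 0xDF, U_NEED1);
        set_range(s, 0xE0, 0xE0, U_E0);
        set_range(s, 0xE1, 0xEC, U_NEED2);
        set_range(s, 0xED, 0xED, U_ED);
        set_range(s, 0xEE, 0xEF, U_NEED2);
        set_range(s, 0xF0, 0xF0, U_F0);
        set_range(s, 0xF1, 0xF3, U_NEED3);
        set_range(s, 0xF4, 0xF4, U_F4);
    }
    set_range(U_NEED1, 0x80, 0xBF, U_OK);
    set_range(U_NEED2, 0x80, 0xBF, U_NEED1);
    set_range(U_NEED3, 0x80, 0xBF, U_NEED2);
    set_range(U_E0, 0xA0, 0xBF, U_NEED1);   /* rejects overlong forms */
    set_range(U_ED, 0x80, 0x9F, U_NEED1);   /* rejects surrogates */
    set_range(U_F0, 0x90, 0xBF, U_NEED2);   /* rejects overlong forms */
    set_range(U_F4, 0x80, 0x8F, U_NEED2);   /* rejects > U+10FFFF */
}

/* The same three steps; returns the state to resume from. */
unsigned utf8_count(const unsigned char *buf, size_t length,
                    unsigned state, unsigned long counts[U_STATES])
{
    assert(state < U_STATES);
    while (length--) {
        state = utf8_table[state][*buf++];
        counts[state]++;
    }
    return state;
}
```

After a run, `counts[U_OK]` holds the number of complete characters and `counts[U_BAD]` the number of illegal bytes. Merging this with word and line states multiplies the number of states, but the loop itself does not change.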
So let's test this using a Unicode string. I think a fun example would be the Ogham script from 4th-century Ireland, which only exists today on around 400 tombstones/monuments, such as this:
᚛ᚋᚐᚊ ᚉᚓᚏᚐᚅᚔ ᚐᚃᚔ ᚐᚈᚆᚓᚉᚓᚈᚐᚔᚋᚔᚅ᚜
Despite not being used since the 4th century, today's tech recognizes it. For example, as you can see in Word, there are 4 separate words -- the spell checker has underlined each separate word. Thus, 'wc' should count 4 words. Image
As you can see, the original 'wc' program counts 1 line, 4 words, and 30 characters.

And so does the 'wc2' program using state-machine parsing. Image
So let's benchmark the UTF8 version against the original 'wc' program. I use 4 files:
1. a file containing illegal characters (like a previous edition of PoC||GTFO)
2. a file containing UTF8 sequences
3. ASCII words
4. 18 gigs of only the space character Image
Again, I don't want to cherry pick things. I want to show the worst case (files containing illegal stuff) and the best case (only the space character). The original 'wc' runs at different speeds for different input, while my 'wc2' algorithm runs at constant speed. Image
Note that while the JavaScript version of my algorithm is slower than the same algorithm in C, it's faster than the classic 'wc' program written in C.

(Note: my JavaScript coding skills are weak, maybe I could write it so it JITs better) Image
Anyway, "state-machine parsers" are a thing. They are something you should be learning in university courses. They can handle a surprising amount of complexity, and are very fast -- pretty much the fastest parsers that don't use SIMD.
In this thread, instead of cherry picking the easiest problem to solve, I went after a particularly hard problem to solve. I have to parse UTF8 before I can parse 'wc'.
Anyway, download your latest 'pocorgtfo21.pdf' from your favorite samizdat site today.
MD5 (pocorgtfo21.pdf) = 7f3d3b147a4ba2c099ea3d252ed4a0d7
BTW, there's a little formatting problem. You can MOSTLY copy/paste code directly from the pocorgtfo21.pdf. The quotes have gone awry, and a space was added, so you have to edit it a bit.

But the fact that you can still work with 4th century Ogham script is AMAZING. ImageImage
Damn, the formatting lost a line -- the 'counts[3]' declaration disappeared.

But the original source is contained within the PDF, which you can extract thusly. (Among the oddities of PoC||GTFO is the fact that the PDF is also a ZIP).

Image

