Latest Twitter Threads by @N8Programs on Thread Reader App

Mar 7 • 18 tweets • 7 min read

Recently, @awnihannun asserted that 'According to benchmarks Qwen3.5 4B is as good as GPT 4o.' This drew controversy: Is the 4B just benchmaxxed? How could a 4B be as good as GPT-4o? I tried to test this scientifically. The answer to the question is likely: yes, in most cases.

To test this, I wanted a set of 'in the wild' prompts that would reflect real world usage and not narrow code/STEM tasks - so I went to WildChat (the classic repo for this), grabbed one of the training parquet files, and chose 1000 random deduped prompts. I then ran these prompts through GPT-4o and Qwen3.5 4B at recommended sampling settings.
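The dedupe-then-sample step can be sketched as follows. This is an illustration only: the thread does not say what tooling or dedup criterion was used, so the exact-match-after-trim rule, the seeded PRNG, and the names `mulberry32` and `samplePrompts` are all assumptions, and loading the parquet file is left out.

```javascript
// Small seeded PRNG (mulberry32) so the sample is reproducible.
// Assumption: the thread does not mention seeding at all.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: dedupe prompts, then draw k of them uniformly at random.
function samplePrompts(prompts, k, seed = 0) {
  // Assumed dedup rule: exact match after trimming whitespace; empty prompts dropped.
  const unique = [...new Set(prompts.map((p) => p.trim()))].filter(
    (p) => p.length > 0
  );
  const rand = mulberry32(seed);
  const n = Math.min(k, unique.length);
  // Partial Fisher-Yates shuffle: only the first n slots need to be settled.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rand() * (unique.length - i));
    [unique[i], unique[j]] = [unique[j], unique[i]];
  }
  return unique.slice(0, n);
}
```

Calling `samplePrompts(allPrompts, 1000, 42)` would give a reproducible set of 1000 distinct prompts, provided at least that many unique prompts exist.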

Jan 15 • 15 tweets • 4 min read

I trained an AI model to de-slop AI-gen text by running excerpts from Gutenberg stories through gpt-4o-mini 10 times with a prompt to "rewrite but better", and training a model to go [slopped version] -> [original]. This fools @pangramlabs 25% of the time!

Thread below. 👇

The training process is simple, and only takes 12 hours on an M3 Max. A simple LoRA is applied to a quantized version of Qwen3-30B-A3B, which is trained to take in slopped stories and return humanlike outputs. I used 1000 training docs for this, for ~2.5M total tokens.

Feb 4, 2025 • 6 tweets • 2 min read

presenting: n8loom, a library for generating trees-of-thought - the kind used in MCTS, variants of majority-voting, etc. - efficiently by splitting the kvcache into fragments at each node, and dynamically concatenating the results together when generating.

(and a kickass experimental GUI for creating these trees).

the library lets you create, in-code, nodes that represent text processed or generated by the LLM, and then extend and 'grow' the loom with either preset text or more dynamic generation.
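The node-and-fragment idea can be sketched with plain strings standing in for cache fragments. This is a toy illustration, not n8loom's API: the class name `LoomNode` and its methods are hypothetical, and a real implementation would store a slice of the KV cache per node rather than text.

```javascript
// Hypothetical illustration only: these names are not n8loom's API.
class LoomNode {
  constructor(text, parent = null) {
    this.text = text; // fragment owned by this node only (stand-in for a KV-cache slice)
    this.parent = parent;
    this.children = [];
  }

  // Grow the tree with a child that holds only the newly added text.
  extend(text) {
    const child = new LoomNode(text, this);
    this.children.push(child);
    return child;
  }

  // Fragments from the root down to this node, in order.
  fragments() {
    const out = [];
    for (let n = this; n !== null; n = n.parent) out.push(n.text);
    return out.reverse();
  }

  // The full context is assembled on demand by concatenating fragments,
  // so sibling branches share their ancestors' fragments instead of copying them.
  context() {
    return this.fragments().join("");
  }
}

const root = new LoomNode("Q: what is 2+2? ");
const thought = root.extend("Let me think. ");
const branchA = thought.extend("The answer is 4.");
const branchB = thought.extend("The answer is 5.");
console.log(branchA.context());
console.log(branchB.context());
```

Both branches reuse the same `root` and `thought` nodes; only the final fragment differs, which is the property that makes branching cheap.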

Jan 28, 2025 • 4 tweets • 2 min read

reading a deepseek paper and stumbled upon a very beautiful formula where they unify SFT and MOST RL TYPES (DPO, PPO, GRPO, etc.) into ONE FORMULA*

*that requires additional reward functions to be defined.

But the fundamental insight - that all these training methods can be framed as gradient ascent on observed logprobs - is beautiful.

(i am math noob, so i won't try to explain this in a ton of depth) - but they make some really cool revelations - like showing how sft is just really simple RL:
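One way to write a unified gradient of this kind is sketched below. The notation is an assumption for illustration, not copied from the thread: $\mathcal{D}$ is the data source that (question, output) pairs are drawn from, $\pi_{rf}$ is a reward function, and $GC_{\mathcal{A}}$ is a per-token gradient coefficient that depends on the training method $\mathcal{A}$.

```latex
\nabla_\theta \mathcal{J}_{\mathcal{A}}(\theta)
  = \mathbb{E}_{(q,o)\sim\mathcal{D}}
    \left[
      \frac{1}{|o|} \sum_{t=1}^{|o|}
      GC_{\mathcal{A}}(q, o, t, \pi_{rf})\,
      \nabla_\theta \log \pi_\theta(o_t \mid q, o_{<t})
    \right]

% SFT as a special case (under the same assumed notation):
% D is a fixed, human-curated dataset and the coefficient is constant,
GC_{\mathrm{SFT}}(q, o, t, \pi_{rf}) = 1
```

With the coefficient fixed at 1, every observed token's logprob is pushed up equally, which is ordinary maximum-likelihood training. Under this framing, the RL variants differ in where $\mathcal{D}$ comes from and in how $GC_{\mathcal{A}}$ weights each token.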

Aug 15, 2023 • 9 tweets • 4 min read

Do you want to learn about antialiasing in three.js (MSAA, FXAA, SMAA), and the difference between all the types available to you?

Was talking a bit on discord about AA, and I ended up compiling some advice I have into one concrete thought process, so I'm going to make a thread™ about it.

🧵👇 (Cringe thread emojis)

First type of AA - MSAA: the one that's the default in three.js when you set {antialias: true} in your WebGLRenderer and has hardware support.

This AA type combats geometric aliasing by supersampling the triangle geometry test on a per-pixel basis. However, it only sometimes works w/ postprocessing, and only in veryyyyyyy select contexts: i.e. tonemapping and (sometimes) bloom. If your postprocessing effect relies on a DISCRETE per-pixel representation of the scene (depth, normals, etc.) then MSAA will lead to weird artifacts. Either way, MSAA can only help with geometric aliasing and will not help with any other type (except for alpha map aliasing, which can be addressed w/ alpha-to-coverage).
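A minimal browser-side sketch of the two settings mentioned above, assuming three.js is available as the `three` module. The texture path `leaf.png` and the variable names are hypothetical, and the snippet needs a browser with WebGL to run.

```javascript
import * as THREE from "three";

// MSAA is requested once, when the renderer (and its WebGL context) is created.
const renderer = new THREE.WebGLRenderer({ antialias: true });

// Hypothetical alpha-mapped texture, e.g. a leaf cutout.
const leafTexture = new THREE.TextureLoader().load("leaf.png");

// Alpha-to-coverage lets the MSAA samples smooth alpha-cutout edges.
// It only has an effect when the render target is multisampled.
const foliageMaterial = new THREE.MeshBasicMaterial({
  map: leafTexture,
  alphaToCoverage: true,
});
```

Note that `antialias` cannot be toggled after the renderer exists, so it has to be decided before any postprocessing setup.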

Apr 13, 2023 • 13 tweets • 3 min read

I've been doing a lot of CPU optimizations for JavaScript (V8, specifically) recently and thought I'd share a thread of some of my favorites. These optimizations primarily apply to operations you are doing millions of times CPU-side.
(look i even have cringe thread emojis:)
🧵👇 First one is to always use typed arrays - they operate as views to large blocks of contiguous memory. JavaScript Arrays have a lot of extra baggage. As a result, typed arrays are 10-30x faster to read from and write to.
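A small sketch for trying the comparison yourself. It fills a plain Array and a Float64Array with the same values and times a summing loop over each. The size `N` and the helper names `sum` and `timeMs` are arbitrary choices for illustration, and the measured ratio will depend on the engine, the workload, and warm-up.

```javascript
const N = 1_000_000;

// Plain Array: elements are generic JS values managed by the engine.
const plain = new Array(N);
for (let i = 0; i < N; i++) plain[i] = i * 0.5;

// Float64Array: a fixed-length view over one contiguous buffer of 64-bit floats.
const typed = new Float64Array(N);
for (let i = 0; i < N; i++) typed[i] = i * 0.5;

// Same hot loop for both containers.
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

// Time a single call; performance.now() is a global in browsers and modern Node.
function timeMs(fn) {
  const start = performance.now();
  const result = fn();
  return { result, ms: performance.now() - start };
}

const plainRun = timeMs(() => sum(plain));
const typedRun = timeMs(() => sum(typed));
console.log(`Array:        ${plainRun.ms.toFixed(2)} ms`);
console.log(`Float64Array: ${typedRun.ms.toFixed(2)} ms`);
```

A single timed call is noisy; repeating each measurement several times after a warm-up run gives a fairer picture.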
