More from @BlackHC

Andreas Kirsch 🇺🇦

@BlackHC

Jun 9

I'm late to review the "Illusion of Thinking" paper, so let me collect some of the best threads and critical takes by @scaling01 in one place and sprinkle in some of my own thoughts as well.

The paper is rather critical of reasoning LLMs (LRMs):

https://x.com/MFarajtabar/status/1930707591648493730

The paper explores four puzzle environments: Tower of Hanoi, Checkers Jumping, River Crossing, and Blocks World.

It finds some "surprising" behavior of LRMs: they can perform 100 correct steps on the Tower of Hanoi, but only 4 steps on River Crossing.

https://x.com/MFarajtabar/status/1930707624032653487

Somehow the authors were not aware of, or did not reflect on, the actual complexity of the games. As @scaling01 points out via o3, River Crossing is actually harder to solve because it has a large branching factor and a high chance of ending up in dead ends.

https://x.com/scaling01/status/1931854370716426246
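To make the complexity point concrete, here is a minimal sketch (my own illustration, not code from the paper or the thread) of why many correct Tower of Hanoi steps need not mean a hard puzzle: the standard recursion produces the optimal solution mechanically, with no search, and its length is 2^n - 1 moves. The function names and peg labels are assumptions chosen for this example.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move list for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then move the n-1 disks on top.
    return (
        hanoi_moves(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi_moves(n - 1, spare, target, source)
    )


def is_valid_solution(n, moves):
    """Replay the moves and check that no larger disk is ever placed on a smaller one."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in range(1, 8):
        moves = hanoi_moves(n)
        print(n, len(moves), 2**n - 1, is_valid_solution(n, moves))
```

With 7 disks the solution already has 127 moves, so a model can emit more than 100 correct steps just by following a fixed pattern. A long solution is a different kind of difficulty from a search problem with many dead ends.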

Read 25 tweets

Andreas Kirsch 🇺🇦

@BlackHC

Dec 4, 2021

Finally an information-theoretic deduction of Stirling's approximation for Binomial Coefficients 🥳🎉

We present a very different take on how to derive it. We only use basic probability theory and intuitions from information theory 🔥

blackhc.net/blog/2021/bino…

We show that the approximation is actually an upper bound and characterize the approximation error

We compare the approximation to the exact binomial coefficients and see that the approximation error is negligible and our simple estimate of the approximation error is surprisingly accurate
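As a quick numerical check, here is a minimal sketch that assumes the approximation in question is the familiar entropy form, log2 C(n, k) ≈ n · H(k/n), where H is the binary entropy in bits. The linked blog post is the authoritative source for the exact statement and the error estimate; the helper names below are made up for this example.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def log2_binom_exact(n, k):
    """Exact log2 of the binomial coefficient C(n, k)."""
    return math.log2(math.comb(n, k))


def log2_binom_entropy(n, k):
    """Entropy-based estimate n * H(k/n) of log2 C(n, k)."""
    return n * binary_entropy(k / n)


if __name__ == "__main__":
    n = 100
    for k in (1, 10, 25, 50):
        exact = log2_binom_exact(n, k)
        approx = log2_binom_entropy(n, k)
        print(f"k={k:3d}  exact={exact:8.3f}  entropy={approx:8.3f}  gap={approx - exact:6.3f}")
```

Under this assumption the estimate never falls below the exact value, because C(n, k) · p^k · (1 - p)^(n - k) ≤ 1 for p = k/n. That matches the upper-bound claim above.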

Read 4 tweets

Andreas Kirsch 🇺🇦

@BlackHC

Jun 20, 2020

Has anyone called out @UniofOxford and its colleges for how they bully and prevent students from returning to their accommodation and the atrocious advice they have given? @KelloggOx @StJohnsOx @LinacreCollege @ChCh_Oxford come to mind🙂

Students living in university or college accommodation at Oxford don't have tenancy agreements but "license" agreements, which give colleges lots of leeway and students practically no rights

Colleges have been preventing students from going back to their rooms. In some cases, students only get a 2-hour slot to get their stuff out now, and in others, colleges have refused to return deposits in a timely manner. Looking at you @KelloggOx

Read 17 tweets

Andreas Kirsch 🇺🇦

@BlackHC

Apr 20, 2020

🎉🎉Happy & proud to share some research into Information Bottlenecks from @yaringal, @clarelyle and me at @OATML_Oxford 🎉🎉

We provide intuition and practical IB objectives for modern DNN architectures, like ResNets.

Check it out on arXiv
👉arxiv.org/abs/2003.12537

Our paper "Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning" shows that well-known dropout regularization with standard cross-entropy loss and simple regularizers optimizes IB objectives in modern DNN architectures.

We use Information Diagrams to provide grounded intuitions and review existing variants of the IB objective.

From this, we rearrange the objective to focus on what we call the Decoder Uncertainty H[Y|Z] as the loss term and the Reverse Decoder Uncertainty H[Z|Y] as the regularization term.
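To make the two quantities tangible, here is a small sketch that computes H[Y|Z] and H[Z|Y] for a toy discrete joint distribution. It only illustrates the definitions of the two conditional entropies. It is not the paper's objective or estimator, and the table layout (`joint[y][z]`) and function names are assumptions for this example.

```python
import math


def entropy_rows_given_cols(joint):
    """Conditional entropy H[Row | Col] in bits for a table joint[row][col] = p(row, col)."""
    col_totals = [sum(col) for col in zip(*joint)]
    h = 0.0
    for row in joint:
        for p, p_col in zip(row, col_totals):
            if p > 0:
                h -= p * math.log2(p / p_col)
    return h


def transpose(joint):
    """Swap the roles of rows and columns."""
    return [list(col) for col in zip(*joint)]


def decoder_uncertainty(joint_yz):
    """H[Y|Z]: how uncertain the label Y remains once the latent Z is known."""
    return entropy_rows_given_cols(joint_yz)


def reverse_decoder_uncertainty(joint_yz):
    """H[Z|Y]: how much the latent Z still varies once the label Y is known."""
    return entropy_rows_given_cols(transpose(joint_yz))


if __name__ == "__main__":
    # Toy example: Z takes three values and determines Y, but carries extra detail for Y = 0.
    joint_yz = [
        [0.25, 0.25, 0.0],  # Y = 0
        [0.0, 0.0, 0.5],    # Y = 1
    ]
    print("H[Y|Z] =", decoder_uncertainty(joint_yz))
    print("H[Z|Y] =", reverse_decoder_uncertainty(joint_yz))
```

In the toy table Z predicts Y perfectly, so H[Y|Z] is zero, while H[Z|Y] is positive because Z keeps detail that Y does not need.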

Read 4 tweets

Andreas Kirsch 🇺🇦

@BlackHC

Apr 9, 2020

🔥 Has your PyTorch code ever crashed because it ran out of memory in CUDA, and you had to fiddle with batch sizes repeatedly? 🔥

What if we could just write code that adapted to the available memory instead of resorting to brittle hand-tuning? 🤯

👉 github.com/BlackHC/toma 🤗

toma (torch memory-adaptive algorithms) helps you write algorithms in PyTorch that adapt to the available (CUDA) memory.

It's hands-off and does not make assumptions about your code.

If your code fails, it halves the batch size and retries.

It tracks how much memory was available when you ran your code and remembers the batch size that worked.

Next time the code is called and that amount of memory is available, it will start with the last successful batch size to avoid wasting compute.

Simple but useful🙃
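The halve-and-retry idea can be sketched in a few lines. This is a generic illustration and not toma's actual API: `run_adaptive`, `step`, `cache`, and `key` are hypothetical names, and Python's built-in `MemoryError` stands in for the CUDA out-of-memory error that PyTorch would raise, so that the sketch runs without a GPU.

```python
def run_adaptive(step, total, initial_batch_size, cache=None, key=None):
    """Call step(start, end) over range(total), halving the chunk size on MemoryError.

    If `cache` already holds a batch size for `key` (for example, a bucket of
    available memory), start from that size instead of the initial one.
    Returns the batch size that finally worked.
    """
    batch_size = initial_batch_size
    if cache is not None and key in cache:
        batch_size = cache[key]

    start = 0
    while start < total:
        end = min(start + batch_size, total)
        try:
            step(start, end)
        except MemoryError:  # stand-in for a CUDA out-of-memory error
            if batch_size == 1:
                raise  # cannot shrink any further
            batch_size //= 2
            continue  # retry the same chunk with the smaller size
        start = end

    if cache is not None:
        cache[key] = batch_size
    return batch_size


if __name__ == "__main__":
    def fake_step(start, end):
        # Hypothetical workload that "runs out of memory" above 10 items per chunk.
        if end - start > 10:
            raise MemoryError

    cache = {}
    print(run_adaptive(fake_step, total=100, initial_batch_size=64, cache=cache, key="gpu0"))
    print(cache)
```

A second call with the same `cache` and `key` starts at the remembered size and skips the failed attempts, which is the compute-saving behaviour described above.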

Read 4 tweets

Andreas Kirsch 🇺🇦

@BlackHC

Mar 11, 2020

Please read this very detailed yet easy-to-understand analysis:

https://twitter.com/PeterKolchinsky/status/1237581727867924488

I'll wear my conspiracy hat for a second: given all we know, and all that stakeholders must have known earlier, is the current inaction on stricter containment gross negligence born of stupidity and recklessness by our leaders, or are some condoning the consequences:

More old people and people with pre-existing conditions are gonna die. These are people who are more vulnerable but who also put more pressure on our health care and welfare systems than young and healthy working-age folks. Is this happening at the moment?

Read 5 tweets
