Latest Twitter Threads by @boazbaraktcs on Thread Reader App

Sep 23 • 9 tweets • 3 min read

1/ Our paper on scheming with @apolloaievals is now on arXiv. A 🧵 with some of my takeaways from it.

2/ Deliberative alignment worked surprisingly well! We saw a ~30x reduction in the rate of covert actions, from 13% to 0.4%, in our test environments.

Note: These test environments are designed to elicit covert action, so 13% is *not* the baseline in normal production.
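As a quick check of the figures quoted above, the reduction factor is the ratio of the two covert-action rates (13% before, 0.4% after). The variable names are illustrative only.

```python
# Covert-action rates quoted in the thread, as fractions of test episodes
rate_before = 0.13
rate_after = 0.004

# Reduction factor = rate before deliberative alignment / rate after it
reduction_factor = rate_before / rate_after
print(f"{reduction_factor:.1f}x")  # prints 32.5x
```

The exact ratio of the quoted rates is about 32.5x, so "30x" is a round figure.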

Jul 15 • 4 tweets • 1 min read

I didn't want to post on Grok safety since I work at a competitor, but it's not about competition.

I appreciate the scientists and engineers at @xai but the way safety was handled is completely irresponsible. Thread below. I can't believe I'm saying it but "mechahitler" is the smallest problem:

* There is no system card, no information about any safety or dangerous capability evals.
* Unclear if any safety training was done. The model offers advice on chemical weapons, drugs, or suicide methods.
* The "companion mode" takes the worst issues we currently have with emotional dependencies and tries to amplify them.

lesswrong.com/posts/dqd54wpE…

Dec 21, 2024 • 5 tweets • 3 min read

1/5 Excited that our paper on "deliberative alignment" came out as part of the 12 days of @openai! By teaching reasoning models the text of our specifications, and how to reason about them in context, we obtain significantly better robustness while also reducing over-refusals. 🧵

2/5 Traditionally, AI models are just trained with (input, good response, bad response) data, but they are not taught to reason about *why* these responses are good or bad. This teaches good "system 1" instincts, but these can fail in new situations. "System 2" reasoning allows the model to adapt, e.g., when the input is encoded.

May 3, 2022 • 5 tweets • 2 min read

1/5 A blog post/book review on the history & philosophy of science, reviewing Weinberg's "To Explain the World" and Strevens' "The Knowledge Machine" windowsontheory.org/2022/05/03/phi…

Trigger warning: I compare science to the blockchain, and find positive aspects in the infamous "reviewer 2" 😀

2/5 I found both books fascinating, and recommend reading them. Both focus roughly on the history from Aristotle to Newton, and show that many "simple stories" are more complex than I, at least, had known.

Apr 5, 2022 • 6 tweets • 2 min read

1) Jo Boaler charges the Oxnard school district (100% minority, 86.9% economically disadvantaged) $5,000 per hour for (dubious, but that's another story) "professional development".

2) Jelani Nelson is outraged and points out that he has spent thousands of unpaid hours on minority education initiatives.

https://twitter.com/minilek/status/1511358525179453440

3) He tweets Boaler's public contract with a public school district, which is available on their website.

4) Boaler emails him claiming he is "sharing private details" and "spreading misinformation" about her. She tells him that this is "taken up by police and lawyers".

Jan 21, 2022 • 5 tweets • 2 min read

Worth reading. I don't know if "science" vs "principled ML" is the right terminology, but this does touch upon a real phenomenon.

In many areas of computer science (algorithms, crypto), theory is *ahead* of practice. E.g., consider secure multiparty computation, PCPs, etc. (🧵)

https://twitter.com/tomgoldsteincs/status/1484609273162309634

They were proposed in the 80s and 90s, considered wildly impractical, and only recently began to be implemented and used.

In contrast, in deep learning currently, practice is ahead of theory. Rather than having theoretical proposals that are too complex or inefficient to implement..

Dec 3, 2021 • 14 tweets • 6 min read

1/14 More than 150 scientists & educators signed an open letter raising the alarm about efforts to water down K-12 math education

scottaaronson.blog/?p=6146

Signers include Fields, Nobel & Turing laureates, and also founders of high-school STEM educational initiatives (e.g., @adrian_mims, @minilek).

2/14 Specifically, California proposed changes to its California Mathematics Framework (CMF) that encourage schools to drop algebra from middle school and put obstacles in the way of reaching calculus in high school. They also de-emphasize calculus & algebra in favor of shallow "data science" courses.

bit.ly/cmfanalysis

Oct 25, 2021 • 20 tweets • 8 min read

1/20 A 🧵 on public key cryptography, and its interaction with quantum computing. Spurred by a discussion w/ @jfitzsimons, @mattyhoban, @dabacon, @rdviii but more general.

There is a fundamental gulf between public and private key encryption.

2/20 Private key encryption often boils down to simply combining many non-local and non-linear operations. Making it efficient is challenging, but if you are willing to lose some efficiency, a monkey with a typewriter could probably construct a block cipher. See cambridge.org/core/journals/… by @wtgowers
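To make the private-key point concrete, here is a toy Feistel network. This is one standard way of turning an arbitrary non-linear "round function" into an invertible block cipher. It is a sketch for illustration only and is not taken from the thread. The round function, the 32-bit block size, and the round keys are all hypothetical choices, and the result is not secure.

```python
MASK16 = 0xFFFF  # each half of the 32-bit block is 16 bits


def rotl16(x: int, r: int) -> int:
    """Rotate a 16-bit value left by r bits, moving each bit to a new position."""
    return ((x << r) | (x >> (16 - r))) & MASK16


def round_function(half: int, round_key: int) -> int:
    """Hypothetical, insecure round function: key mixing, non-linear squaring, rotation."""
    x = half ^ round_key              # mix in the round key
    x = (x * x + 0x9E37) & MASK16     # squaring mod 2**16 is non-linear (carries)
    return rotl16(x, 5)               # spread bits so local changes propagate


def encrypt(block: int, round_keys: list[int]) -> int:
    left, right = (block >> 16) & MASK16, block & MASK16
    for k in round_keys:
        # Feistel step: invertible no matter what round_function computes
        left, right = right, left ^ round_function(right, k)
    return (left << 16) | right


def decrypt(block: int, round_keys: list[int]) -> int:
    left, right = (block >> 16) & MASK16, block & MASK16
    for k in reversed(round_keys):
        # Undo one Feistel step, recovering the previous (left, right) pair
        left, right = right ^ round_function(left, k), left
    return (left << 16) | right


if __name__ == "__main__":
    keys = [0x1234, 0xBEEF, 0x0F0F, 0xA5A5, 0x7C3D, 0x4E21, 0x9999, 0x0001]  # hypothetical round keys
    plaintext = 0xDEADBEEF
    ciphertext = encrypt(plaintext, keys)
    assert decrypt(ciphertext, keys) == plaintext
    print(hex(plaintext), "->", hex(ciphertext))
```

The design point is that the Feistel structure makes decryption possible even though `round_function` itself is not invertible (squaring mod 2^16 is not one-to-one). In a real cipher, security comes from carefully analyzed round functions and key schedules, not from an ad hoc choice like this one.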

Apr 10, 2021 • 12 tweets • 3 min read

1/12 I recently protested the Israeli education minister's decision to rescind Oded Goldreich's Israel Prize

https://twitter.com/boazbaraktcs/status/1380149213066096646?s=20

I also signed a letter protesting Jeff Ullman's Turing Award

csforinclusion.wordpress.com

In this 🧵, I'll try something crazy: nuance on Twitter.

2/12 One of my first exposures to theoretical computer science was through the Hopcroft-Ullman book, and I remain grateful to Ullman for it.

I appreciate Ullman's contributions to the field, and do not want to "cancel" him.

Mar 17, 2021 • 15 tweets • 6 min read

1/14 Yesterday I was asked if there was an experiment that changed my mind about the right theoretical questions to ask.

One such case is our paper with @whybansal & Kaplun arxiv.org/abs/2010.08508
The experiment is this GIF. This 🧵 is not about the results but about how it changed my thinking & open problems.

2/14 When I first looked at ML theory, I was focused on the "generalization problem": why over-parameterized networks generalize despite their ability (demonstrated by Zhang et al. cacm.acm.org/magazines/2021…) to perfectly fit training data.
