davidad 🎇
Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
Nov 13, 2023 7 tweets 1 min read
Deep neural networks, as you probably know, are sandwiches of linear regressions with elementwise nonlinearities between each layer.
The core contribution of “Attention is All You Need,” which led directly to the LLM/GPT explosion,
is to throw some *logistic* regressions in there

Credit is also due to @geoffreyhinton for dropout, @ChrSzegedy for activation normalization (batch norm), and @dpkingma for gradient normalization (Adam). The rest is commentary
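A minimal NumPy sketch of where that logistic regression sits (illustrative names and shapes, mine rather than the thread's): the attention weights are a softmax, i.e. a multinomial logistic link, applied to dot-product scores.

```python
import numpy as np

def softmax(z):
    # multinomial logistic link: the "*logistic* regression" in attention
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Q, K, V: (seq, d) arrays produced by the usual linear maps
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot products
    return softmax(scores) @ V                # logistic weights mix the values
```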
Aug 4, 2023 11 tweets 3 min read
with GPT-4 code interpreter, it finally became worthwhile for me to run the numbers myself on that lead-poisoning theory—that the 1971–2012 technological stagnation is a function of environmental cognitive impairment of the grad student and postdoc population—and uh:

be careful with that lead-apatite out there folks
Jun 29, 2023 14 tweets 3 min read
A thread about formal verification of LLMs.

I often find myself disclaiming that I do *not* propose to formally verify question-answering or assistant-type models, because I don’t think the specifications can be written down in a language with formal semantics.

But what if… 🧵

Scott Viteri suggested I consider the premise that LLMs “know what we mean” if we express specifications in natural language. I’m not convinced this premise is true, but if it is, we can go somewhere pretty interesting with it. 1/
Jun 20, 2023 10 tweets 2 min read
2020s Earth has an acutely unprecedented concentration of technological “dry powder”: existing machines & infrastructure, controlled by easily reprogrammable devices.

This broadly offense-dominant technology base is a critical factor in the extinction risk posed by AI. 🧵

If GPT-4’s Azure datacenter were plonked in 1820s Earth, it wouldn’t do much. After a few hours, the uninterruptible power supplies and other backup power sources would drain, and it *really* wouldn’t do much. The same is true of GPT-n for any n. Intelligence ⇏ causal power!
May 8, 2023 5 tweets 2 min read
I’m with LeCun on this one, actually.

What this argument misses is that it’s not (currently!) scalable to build a world-model that grounds legal entities in physical dynamics detailed enough to facilitate enforcement,
nor to verifiably plan within such a rich model.

But I have substantial hope about making this work:

lesswrong.com/posts/jRf4WENQ…
Dec 4, 2022 4 tweets 2 min read
@RatOrthodox tired: let's think step-by-step
wired: write a Seinfeld screenplay in which the gang reach the mathematically correct answer

@RatOrthodox (the exact prompt was:)
Dec 4, 2022 6 tweets 3 min read
ChatGPT can explain things in the form of a Seinfeld script. 🧵

“The Trolley Problem,” cont'd:
Nov 15, 2022 5 tweets 2 min read
If true, SBF and Caroline were evaluating the fraud option AS a double-or-nothing coin-flip: keep just FTX or ~1/2 chance of both FTX and Alameda.

To get this wrong, one must be wrong BOTH about how to evaluate fraud (don't do it) AND the St Petersburg paradox (also don't do it)

related: SBF was indeed wrong about the St Petersburg paradox
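A toy simulation of why that evaluation goes wrong (my numbers, not the thread's): under linear utility, repeated fair double-or-nothing flips keep expected wealth constant while sending the typical gambler to ruin, which is the same confusion the St Petersburg paradox punishes.

```python
import numpy as np

rng = np.random.default_rng(0)
wealth = np.ones(100_000)                 # many simulated gamblers, 1 unit each

for _ in range(10):                       # ten rounds of fair double-or-nothing
    win = rng.random(wealth.shape) < 0.5
    wealth = np.where(win, 2 * wealth, 0.0)

print(wealth.mean())      # ≈ 1: expected value is preserved every round
print(np.median(wealth))  # 0.0: almost everyone is ruined (P(survive) = 2⁻¹⁰)
```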
Jul 24, 2022 6 tweets 3 min read
Prompt engineering tip.

Instead of:
—Imagine the output you want
—Specify its attributes to the AI

Try:
—Imagine that you already live in a timeline where the output you want exists
—If you were using/quoting it in a blog post, what caption or context might you write for it?

Corollary: praise is useful, but specific or realistically-phrased praise is better than generic praise
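A hypothetical instance of the tip (my wording, not from the thread): instead of "Write an elegant proof that there are infinitely many primes," try "Below is the one-paragraph proof of the infinitude of primes that the anthology's editors singled out as the most elegant:". The second prompt is written as though the desired artifact already exists and is merely being quoted, so its attributes (elegant, one paragraph, editorially vetted) arrive as presuppositions rather than requests.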
Jul 7, 2022 4 tweets 1 min read
If you've ever wondered why so many of your favourite EAs prefer to self-identify as “EA-adjacent,” I think it's largely a workaround for the downsides of unqualified identification with full agent-neutral consequentialism which are pointed at in this thread.

for the record, I don't endorse Kerry's thread; the vitriol level is too high for my taste
Jul 6, 2022 11 tweets 4 min read
DeepMind is getting into mechanism design for public goods provision and I have Thoughts. 🧵
nature.com/articles/s4156…

1—Let's start with a positive. This is an important problem to work on! It falls squarely in section 8.2 of ARCHES (nature.com/articles/s4156…), Multi/single Instruction. Like all the multi-agent problems in ARCHES, it has received comparatively little attention within AI safety.
Jul 4, 2022 4 tweets 1 min read
professor: i have a math fact that may surprise you!
no one:
professor:

(to be clear, checking category-theoretic ideas with the Zulip is actually a very reasonable and good move. it's hilarious how much expertise in this area has become orthogonal to academic hierarchies, and that's not a bad thing!)
Jul 4, 2022 4 tweets 3 min read
Depending on your meta-belief about the data-generating process that maps Truth→Opinion for different kinds of observers (potentially including oneself), proper Bayesian updating could yield a bizarre kind of “slingshot deference” (case 3 in images below):

@CineraVerinia expressed strong suspicion of this slingshotting, and I think rightly so. In practice, even in the absence of deliberate contrarianism for its own sake, there will be a kind of winner's curse among expert opinions. I labeled this as case 4.
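One toy model of how the slingshot can arise (my guess at the mechanism, since the images are not reproduced here): if your meta-belief is that experts shrink their reported probabilities halfway toward 1/2, then inverting that reporting process updates you *past* the opinion you are deferring to.

```python
# Assumed reporting model (mine): r = 1/2 + (p - 1/2)/2, i.e. the expert
# states a probability shrunk halfway from the truth-calibrated p toward 1/2.
def infer_p(r):
    return 0.5 + 2.0 * (r - 0.5)   # invert the shrinkage

print(infer_p(0.7))  # 0.9: the update overshoots the expert's stated 0.7
```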
Apr 20, 2022 4 tweets 3 min read
@deguerre @ProfKinyon This confuses intuitionistic logic with minimal logic. Here is an intuitionistically valid proof:
⊥ ⊢ Q (⊥L)
(z∈∅)→⊥, z∈∅ ⊢ Q (→L)
(z∈∅)→⊥ ⊢ (z∈∅)→Q (→R)
∀x.(x∈∅)→⊥ ⊢ (z∈∅)→Q (∀L)
⊢ (z∈∅)→Q (MP; def. of ∅)
⊢ ∀x. (x∈∅)→Q (∀R)

@deguerre @ProfKinyon The very first rule (⊥L, aka Ex Falso Quodlibet) is not valid in minimal logic, but it is a rule of both classical *and* intuitionistic logic.
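The same proof mechanizes in a few lines; here is a minimal Lean 4 sketch (my formalization, taking ∅ to be the everywhere-false predicate per the "def. of ∅" step), which uses only ex falso and is therefore intuitionistically valid:

```lean
-- ∅ as the everywhere-false predicate (the "def. of ∅" used above).
def EmptySet (α : Type) : α → Prop := fun _ => False

-- ⊥L / Ex Falso Quodlibet (False.elim) is the only non-structural rule needed.
example (α : Type) (Q : Prop) : ∀ x : α, EmptySet α x → Q :=
  fun _x hx => False.elim hx
```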
Jan 13, 2021 14 tweets 3 min read
Hot take: the FP schism between Haskell and Lisp/Scheme is not *really* about static vs dynamic types, or purity vs effects; it's about extensional vs syntactic compositionality. 1/🧵

Extensionality means (to me, at least) that your expressions are to be seen as mere pointers into a Platonic "semantic domain" where all equivalent objects are truly the same (formally, a set quotiented by equivalence; even more formally, a 0-truncation of a higher groupoid) 2/🧵
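A small Python illustration of the distinction (my example; the thread is about Haskell and Lisp, but the contrast survives translation): two functions that are extensionally interchangeable, yet distinguishable by the kind of syntactic tooling Lisp macros get to use.

```python
import ast, inspect

def double_a(x): return x + x
def double_b(x): return 2 * x

# Extensionally: the same point in the semantic domain (equal on every input).
assert all(double_a(n) == double_b(n) for n in range(1000))

# Syntactically: a macro-style tool that inspects source can tell them apart,
# exactly the distinction extensional compositionality refuses to make.
src_a = ast.dump(ast.parse(inspect.getsource(double_a)))
src_b = ast.dump(ast.parse(inspect.getsource(double_b)))
print(src_a == src_b)  # False
```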
Sep 24, 2020 9 tweets 2 min read
1. The difference between discovery and invention is a matter of degree, not a difference in kind.
2. The degree in question is what @DavidDeutschOxf calls “being hard-to-vary”, not “having been there already”.

For example, when Wadler says “Most programming languages were invented, not discovered—*and you can tell*”, I think he means that the set of core features in, say, C++ is not hard-to-vary (at least not compared to the set of core features in simply-typed λ-calculus).
Aug 13, 2020 17 tweets 4 min read
In the words of Donald Knuth (on this very topic), "But no, no, ten thousand times no!" (source: math.uni-bielefeld.de/~sillke/NEWS/0…)

Let's talk about how primitive functions on ℝ even get defined. I'm using Terry Tao's Analysis I as a reference. 1/

ℝ is constructed step-by-step:
1. Obtain the naturals ℕ somehow. Popular choices include (a) taking Peano axioms as fundamental, (b) using Zermelo-Frankel axioms to construct finite ordinals, (c) defining an inductive type, or (d) the classic, accepting ℕ from God. (I kid.) 2/
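Option (c) is quick to exhibit concretely; a Lean 4 sketch (my illustration): the naturals as an inductive type, from which the Peano axioms come out as theorems (constructor injectivity, zero ≠ succ, and induction via the recursor) rather than as assumptions.

```lean
inductive N where
  | zero : N
  | succ : N → N

#check @N.rec   -- the induction principle, generated for free
```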
Aug 9, 2020 5 tweets 1 min read
Comparing the Cont & Dist monads: My operational interpretation of a continuation-passing-style value ∀R.(X→R)→R is “for any result type R, if you tell me how to compute one from an X, I’ll give you one,” and the only way to return an R is to obtain one from the continuation…

…because there are no other operations that return this arbitrary type R. Meanwhile Dist(X) is more like ∀(R:Convex).(X→R)→R. Now I can give the continuation a bunch of values in X and weighted-average the return values. So a Dist(X) is a bunch of X’s associated with weights.
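The operational readings translate directly into code; a Python sketch (untyped, so the ∀R quantifier and the convexity restriction on R survive only as comments):

```python
# Cont(X): given any continuation k : X -> R, produce an R. The only way to
# manufacture an R is to call k, since nothing else returns the arbitrary R.
def cont_of(x):
    return lambda k: k(x)

# Dist(X): same shape, but R is restricted to convex types, so we may call k
# on several X's and take a weighted average of the results.
def dist_of(weighted_xs):              # [(weight, x)] with weights summing to 1
    return lambda k: sum(w * k(x) for w, x in weighted_xs)

print(cont_of(21)(lambda x: 2 * x))    # 42
coin = dist_of([(0.5, 0.0), (0.5, 10.0)])
print(coin(lambda x: x))               # 5.0, i.e. the expectation of the coin
```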
Aug 7, 2020 4 tweets 1 min read
When continuous-valued probability distributions are taught, typically the density function (pdf) is introduced first, with some handwaving, and then the cdf is defined as an integral of the pdf.

This is backwards! A pdf ONLY EXISTS as the DERIVATIVE of the cdf.

I'm not saying that the Radon-Nikodym theorem or Lebesgue measure should be explicitly introduced before we can talk about Gaussians, but I think people comfortable with calculus would rather see d/dx P(X<x) than the usual handwaving about densities
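The cdf-first viewpoint is easy to check numerically; a quick sketch assuming SciPy's standard-normal helpers, recovering the pdf as a finite-difference derivative of the cdf:

```python
from scipy.stats import norm

# d/dx P(X < x), approximated by a central difference of the cdf:
x, h = 0.7, 1e-6
pdf_from_cdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

print(pdf_from_cdf)   # ≈ 0.3123
print(norm.pdf(x))    # the "primitive" density agrees to many decimal places
```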
Dec 27, 2019 5 tweets 1 min read
In ML, the term "manifold" generally refers to the image of a differentiable map from ℝᵏ→ℝⁿ. Sometimes assumed (explicitly or implicitly) to have a diff'ble inverse back to ℝᵏ (i.e. to be a diff'ble embedding). Either way, not the same idea of "manifold" used in math—(1/n)

With the invertibility assumption, this *is* a manifold, but an extremely special type of manifold: while manifold theory offers a lot of complexity around gluing together local Euclidean "charts", typical representation-learning learns just one *global* Euclidean chart! (2/n)
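A concrete instance of the one-global-chart picture (a toy example of mine with k=1, n=3):

```python
import numpy as np

# An ML-style "manifold": the image of a single differentiable map R^1 -> R^3.
def chart(t):
    # A helix; the last coordinate is t itself, so the differentiable inverse
    # (the "encoder") is just a projection: one global Euclidean chart.
    return np.stack([np.cos(t), np.sin(t), t], axis=-1)

latent = np.linspace(0.0, 10.0, 1000)     # the chart's domain: an interval in R
points = chart(latent)                    # (1000, 3) points on the manifold
assert np.allclose(points[:, 2], latent)  # invert by projecting back to R^1
```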
Dec 13, 2019 4 tweets 1 min read
Suppose I have a functor H : C^op × D → V (a profunctor; fine to assume V=Set), and endofunctors L : C→C and R : D→D, with

H(L(c),d) ≅ H(c,R(d))
natural in c,d.

Looks like an adjunction, but isn't directly—not even a relative adjunction. Does this situation have a name?

One characterization: endomorphisms in Chu(Cat,V). Explicitly, (L,R) : Endo_{Chu(Cat,V)}(C,D,H).
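Unpacking that characterization, as I understand the Chu construction (notation mine): a Chu(Cat,V) morphism is a pair of functors running in opposite directions, compatible with the pairings, and the endomorphism case is precisely the data in the question.

```latex
% A morphism (F,G) : (C,D,H) -> (C',D',H') in Chu(Cat,V) consists of
%   F : C -> C',   G : D' -> D,   with an isomorphism natural in c, d':
\[
  H'(Fc,\,d') \;\cong\; H(c,\,Gd').
\]
% Setting (C',D',H') = (C,D,H) yields an endomorphism: exactly
%   L = F : C -> C,   R = G : D -> D,   with
\[
  H(Lc,\,d) \;\cong\; H(c,\,Rd).
\]
```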