davidad 🎇 Profile picture
Aug 13, 2020 · 17 tweets · 4 min read
In the words of Donald Knuth (on this very topic), "But no, no, ten thousand times no!" (source: math.uni-bielefeld.de/~sillke/NEWS/0…)

Let's talk about how primitive functions on ℝ even get defined. I'm using Terry Tao's Analysis I as a reference. 1/
ℝ is constructed step-by-step:
1. Obtain the naturals ℕ somehow. Popular choices include (a) taking Peano axioms as fundamental, (b) using Zermelo-Fraenkel axioms to construct finite ordinals, (c) defining an inductive type, or (d) the classic, accepting ℕ from God. (I kid.) 2/
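Option (c) can be sketched directly in code. Here is a minimal, purely illustrative Python rendering of ℕ as an inductive type, with Peano addition defined by recursion (all names are my own, not from Tao):

```python
from dataclasses import dataclass
from typing import Union

# A Peano-style inductive type: a natural is either Zero or Succ(n).
@dataclass(frozen=True)
class Zero:
    pass

@dataclass(frozen=True)
class Succ:
    pred: "Nat"

Nat = Union[Zero, Succ]

def add(a: Nat, b: Nat) -> Nat:
    """Peano addition, by recursion on the second argument."""
    if isinstance(b, Zero):
        return a
    return Succ(add(a, b.pred))

def to_int(n: Nat) -> int:
    """Collapse the inductive representation back to a Python int."""
    return 0 if isinstance(n, Zero) else 1 + to_int(n.pred)

two = Succ(Succ(Zero()))
three = Succ(two)
assert to_int(add(two, three)) == 5
```

Every later construction in the thread bottoms out in recursions of exactly this shape.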
2. Construct ℤ as a quotient of ℕ×ℕ: specifically, pairs (a,b) where a∈ℕ and b∈ℕ, interpreted as purely formal expressions "a – b", with "a – b" and "c – d" considered equal iff a+d = c+b (where the + here is Peano addition, of type ℕ×ℕ→ℕ). 3/
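The equivalence relation driving this quotient is one line of code. A small sketch (representation and names are my own choice):

```python
# ℤ as formal differences "a - b" of naturals, with
# (a, b) ~ (c, d)  iff  a + d == c + b.
def int_eq(p, q):
    a, b = p
    c, d = q
    return a + d == c + b

# "2 - 5" and "0 - 3" both represent -3:
assert int_eq((2, 5), (0, 3))
assert not int_eq((2, 5), (3, 0))
```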
3. Construct ℚ as a subquotient of ℤ×ℤ. This is very similar to step 2, where we "closed over" subtraction (which is not always defined on ℕ, e.g. 2-5∉ℕ), but now we close over division. We declare that "a / b" and "c / d" are equal iff a⋅d = c⋅b. 4/
(The only new hitch with division that wasn't a problem with subtraction is that if we allow all such formal divisions, every "rational" would equal "0 / 0" since 0⋅d = 0 = c⋅0. So we restrict to the subset {(a,b) | a∈ℤ ∧ b∈ℤ ∧ b≠0} before taking the quotient.) 5/
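The analogous sketch for ℚ, including the b≠0 restriction that blocks the "everything equals 0/0" collapse (again, illustrative code of my own):

```python
# ℚ as formal quotients "a / b" with a, b ∈ ℤ and b ≠ 0,
# where (a, b) ~ (c, d)  iff  a·d == c·b.
def rat_eq(p, q):
    a, b = p
    c, d = q
    # The subset restriction taken before quotienting; without it,
    # a·d == c·b would make every pair equivalent to (0, 0).
    assert b != 0 and d != 0
    return a * d == c * b

assert rat_eq((1, 2), (3, 6))      # 1/2 = 3/6
assert not rat_eq((1, 2), (2, 1))
```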
4. Finally ℝ is defined as the Cauchy completion of ℚ. This is quite a bit more complicated than closing over " – " or " / ", but it's really the same flavor of construction: we're closing over formal limits of Cauchy sequences, considering them elements of the new field ℝ. 6/
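A full Cauchy completion won't fit in a tweet-sized sketch, but its raw material — a Cauchy sequence of rationals converging to an irrational limit — is easy to exhibit. The Newton iteration below is an illustrative choice of mine, not part of Tao's development:

```python
from fractions import Fraction

def sqrt2(k: int) -> Fraction:
    """Rational Newton approximations to √2, starting from 1.
    Newton converges quadratically, so a few iterations suffice."""
    x = Fraction(1)
    for _ in range(k.bit_length() + 2):
        x = (x + 2 / x) / 2
    return x

# Successive terms cluster together: a Cauchy sequence in ℚ whose
# "limit" is the new element √2 ∈ ℝ.
assert abs(sqrt2(10) - sqrt2(1000)) < Fraction(1, 10)
assert abs(sqrt2(10) ** 2 - 2) < Fraction(1, 10)
```

Note that every term is an exact rational; only the limit lies outside ℚ.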
Now we have ℝ; how do we define operations of type ℝ×ℝ→ℝ? We recapitulate the construction of ℝ:
1. First we define the operation on ℕ.
2. Then we extend to ℤ and show it respects equivalence.
3. Similarly, extend to ℚ.
4. Finally, extend to formal limits, hence ℝ. 7/
Defining primitive operations on ℕ from scratch is already a little challenging: ℕ is infinite, so we can't just write down a function table. The only way is with an induction argument, or a recursive function (which are the same thing, via the Curry-Howard correspondence). 8/
In the case at hand, x^y, we start with x∈ℕ and y∈ℕ, proceeding by induction on y:
Base case. If y=0, x^y is defined as the multiplicative unit, 1.
Induction case. If y=succ(n), x^y is defined as x⋅(x^n).
(You might notice 0^0 is already determinate at this early stage!) 9/
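The inductive definition transcribes almost verbatim into code — a sketch (`pow_nat` is my own name):

```python
from fractions import Fraction

def pow_nat(x, y: int):
    """x^y for y ∈ ℕ, defined by induction on y."""
    assert isinstance(y, int) and y >= 0
    if y == 0:                       # base case: the multiplicative unit
        return 1
    return x * pow_nat(x, y - 1)     # induction case: y = succ(n)

# 0^0 is already determined: the base case never inspects x.
assert pow_nat(0, 0) == 1
assert pow_nat(2, 10) == 1024
assert pow_nat(Fraction(1, 2), 3) == Fraction(1, 8)
```

Since x appears only as a multiplicand, the same code works unchanged for any x that supports multiplication — which is exactly the observation in the next tweet.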
Because the definition above uses x only as a multiplicand, and because multiplication is developed separately as a primitive (following the same pattern, ℕ⇒ℤ⇒ℚ⇒ℝ), that exact same definition remains valid for x∈ℝ and y∈ℕ. Next we can generalize y. 10/
If y∈ℤ then either y equals a natural “n - 0” (in which case we use the previous definition) or y is the negative of a natural “0 - n”, in which case if x≠0 we can declare x^y as the multiplicative inverse of x^n. 11/
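A sketch of this extension to y∈ℤ, reusing the ℕ case and inverting when y is negative (`pow_int` is an illustrative name):

```python
from fractions import Fraction

def pow_int(x, y: int):
    """x^y for y ∈ ℤ: the ℕ case when y ≥ 0, the multiplicative
    inverse of x^(-y) when y < 0 (which requires x ≠ 0)."""
    def pow_nat(x, n):
        return 1 if n == 0 else x * pow_nat(x, n - 1)
    if y >= 0:
        return pow_nat(x, y)          # y is a natural "n - 0"
    assert x != 0, "x^y undefined for y < 0 when x = 0"
    return 1 / pow_nat(x, -y)         # y is "0 - n"

assert pow_int(Fraction(2), -3) == Fraction(1, 8)
assert pow_int(5, 0) == 1
```

The x≠0 guard is exactly where the domain of definition first shrinks.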
For y∈ℚ, we use the auxiliary notion of an nth root (for n∈ℕ, n≥1). When x≥0 the set {z∈ℝ | z≥0 ∧ z^n≤x} is nonempty and bounded above, so its supremum exists, and we use it to define x^y when y = “1 / n”. When y = “k / n” with k∈ℕ, x^y:=(x^k)^(1/n), and this respects =. 12/
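The supremum in the definition of x^(1/n) can be approximated by bisection. This is a numerical illustration of the existence argument, not the definition itself (names and tolerance are my own):

```python
from fractions import Fraction

def nth_root(x: Fraction, n: int, tol=Fraction(1, 10**6)):
    """Approximate sup{z ≥ 0 : z^n ≤ x} by bisection (x ≥ 0, n ≥ 1)."""
    assert x >= 0 and n >= 1
    lo, hi = Fraction(0), max(x, Fraction(1))  # hi bounds the sup from above
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** n <= x:
            lo = mid        # mid is in the set, so the sup is at least mid
        else:
            hi = mid
    return lo

def pow_rat(x: Fraction, k: int, n: int):
    """x^(k/n) for x ≥ 0 and k ∈ ℕ, via (x^k)^(1/n)."""
    return nth_root(x ** k, n)

assert abs(pow_rat(Fraction(2), 1, 2) ** 2 - 2) < Fraction(1, 100)
```

Every intermediate value here is an exact rational; only the supremum itself may be irrational.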
Finally, for yᵢ∈ℚ, if x∈ℝ and x>0, it’s a theorem that if (yᵢ) is a Cauchy sequence, then (x^yᵢ) is also Cauchy, and its limit respects equivalence of real numbers, so we can extend to y∈ℝ if x>0. 13/
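To see the theorem at work numerically — rational truncations of an irrational exponent give a Cauchy sequence of powers (floats stand in for exact rational arithmetic in this sketch):

```python
# Decimal truncations y_i → √2 form a Cauchy sequence in ℚ, and
# for x = 2 > 0 the sequence x^{y_i} is Cauchy too; its limit
# defines 2^√2.
ys = [1, 1.4, 1.41, 1.414, 1.4142, 1.41421]
powers = [2 ** y for y in ys]

# Successive gaps shrink, consistent with a Cauchy sequence.
gaps = [abs(b - a) for a, b in zip(powers, powers[1:])]
assert all(g2 < g1 for g1, g2 in zip(gaps, gaps[1:]))
assert abs(powers[-1] - 2 ** (2 ** 0.5)) < 1e-3
```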
To sum up, we have a well-defined value for x^y in the following cases:
1. x∈ℝ and y∈ℕ
2. x∈ℝ, x≠0, and y∈ℤ
3. x∈ℝ, x≥0, y∈ℚ, and y≥0
4. x∈ℝ, x>0, and y∈ℝ
Each new case is a conservative extension: where the domains of definition overlap, the defined values agree. 14/
But the domains of definition do not form an increasing chain. Indeed, 0^0 is not handled by case 2, nor by case 4, but it is handled by cases 1 and 3. Confusion arises if we imagine that the limit-based case 4 is the final, most general definition just because both x and y range over ℝ. 15/
Exponentiation is, at its core, a model of repeated multiplication, with properties like x^(a+b) = x^a⋅x^b. Coherent extensions let us generalize it to expressions like e^π. But if we end up *undefining* expressions that were defined in the simplest case, something has gone wrong! 16/16

