Ben Golub 🇺🇦 Profile picture
Apr 24, 2021 18 tweets 6 min read Read on X
David Blackwell would be turning 102 today.

He's best known for the Blackwell information ordering, the way to formalize when some signals give you more information than other signals.

A thread on Blackwell's lovely theorem and a simple proof you might not have seen.

1/
Blackwell was interested in how a rational decision-maker uses information to make decisions, in a very general sense. Here's a standard formalization of a single-agent decision and an information structure.

2/
One way to formalize that one info structure, φ, dominates another, φ', is that ANY decision-maker, no matter what their actions A and payoffs u, prefers to have the better information structure.

While φ seems clearly better, is it definitely MORE information?

3/
Blackwell found out a way to say that it is. That's what his theorem is about. Most of us, if we learned it, remember some possibly confusing stuff about matrices. This is a distraction: here I discuss a lovely proof due to de Oliveira that distills everything to its essence.

4/
We need a little notation and setup to describe Blackwell's discovery: that the worse info structure is always a *garbling* of a better one.

Let's start by defining some notation for the agent's strategy, which is an instance of a stochastic map -- an idea we'll be using a lot.
Stochastic maps are nice animals. You can take compositions of them and they behave as you would expect.

Here I just formalize the idea that you can naturally extend an 𝛼 to a map defined on all of Δ(X). And that makes it easy to compose it with other stochastic maps.
Okay! That was really all the groundwork we needed.

Now we can define Blackwell's OTHER notion of what it means for φ to dominate φ'.

It's simpler: it just says that if you have φ you can cook up φ' without getting any other information.

7/
Blackwell's theorem is that these two definitions (the "any decision-maker" and the "garbling" one) actually give you the same partial ordering of information structures.

"Everyone likes φ better" is equivalent to "you get φ' from φ by running it through a garbling machine 𝛾."
To state and prove the theorem, we need one more definition, which is the set of all things you can do with an info structure φ.

The set 𝓓(φ) just describes all distributions of behavior you could achieve (conditional on the state ω) by using some strategy.
Now we can state the theorem. We've discussed (1) and (3) already. Point (2) is an important device for linking them, and says that anything you can achieve with the information structure φ', you can achieve with φ.

10/
de Oliveira's insight is that, once you cast things in these terms, the proof is three trivialities and one application of a separation theorem.

So let's dive in!
…oliveira588309899.files.wordpress.com/2020/12/blackw…
Here we show that (1) ⟺ (2).

(1) ⟹ (2). If φ' garbles φ and you HAVE φ, then just do the garbling yourself and get the same distribution.

(2) ⟹ (1). On the other hand, if φ can achieve whatever φ' can, it can achieve "drawing according to φ'(ω)," which makes you the garbling
(2) ⟹ (3) says if 𝓓(φ) contains 𝓓(φ') then you can do at least as well knowing φ': the easiest step.

Note that the agent's payoff depends only on the conditional distribution behavior given the state. Since all distributions in 𝓓(φ') are available w/ φ, agent can't do worse.
(3) ⟹ (2) is the step that's not unwrapping definitions.

Suppose (2) were false: then you could get some distribution 𝐝' with φ' that you can't get with φ. The set 𝓓(φ) of ones you can get with φ is convex and compact, so .... separation theorem! Separate 𝐝' from it.

14/
If we state what "separation" means in symbols, it gives us (*) below. But that tells us exactly how to cook up a utility function so that any distribution in 𝓓(φ), one of those achievable with φ, does worse than our 𝐝'. That's exactly what (3) rules out.

15/
That's it!

Happy birthday David Blackwell, and thanks Henrique de Oliveira. Though I am the world's biggest fan of Markov matrices, there's no need to use them for Blackwell orderings once you know this way of looking at things, which gets at the heart of the matter.

16/16
typo! that red arrow label should just be 𝛾
PS/ Tagging in @smorgasborb, who I didn't know was on Twitter and whose fault this all is.

A few typos above that I hope didn't interfere too much w/ exposition of his argument: In 6, the red label was wrong - fixed here. In 13, the first φ' should be φ.

🙏

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Ben Golub 🇺🇦

Ben Golub 🇺🇦 Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @ben_golub

Apr 27
Small observation on research trajectory in economic theory.

When you first try to do it, most projects you propose are
(i) ridiculous/uninteresting to experts or
(ii) trivial
or, often, both!

This phase is very difficult and it's a big thrill to escape it.

1/
To escape, you have to sense

(i) what directions other scholars consider substantively interesting/promising;
(ii) what experts consider non-obvious.

Judging (i) and (ii) and doing well by these standards is pragmatically what makes young theorists successful
Succeeding at these is a fun, challenging sport.

So fun and challenging that it permanently reshapes how we direct our research.

I think perhaps because failing on these metrics is so painful in youth, once you can do well on them, you really prioritize that.

3/
Read 7 tweets
Jan 28
The notion that amazing papers should not get rejected is an odd one.

Any genuinely important idea is more likely to be strongly disliked. (Some reasons below in a short thread.)

To publish important work, editors have to be bold and overrule some negative experts.
Non-exhaustive list of reasons

1. The first technical work in a new paradigm is often crude and simple relative to the sophisticated and elaborate papers written late in a paradigm, when methods are being polished by a large community of experts in those methods.
2. Good ideas are often counterintuitive and have obvious drawbacks. They often prove valuable mainly through their later consequences.

Experts see the reasons not to pursue the counterintuitive paths: often, they have thought about and rejected these paths themselves before.
Read 7 tweets
Nov 23, 2023
I generally recommend

1. Constructing an n-by-m matrix whose rows are people and columns are issues (or dimensions of issues).

2. Finding the largest few and smallest few singular values.

3. Looking at the corresponding singular vectors in issue space.

(cont.)

1/
The top few singular vectors in issue space will tell you about "bundles" of issues along which there are considerable distances in the group.

(If these have high singular values, that corresponds to those differences explaining a lot of the group's variation in opinions.)
For example, you might find an axis of disagreement between Buttigieg and Warren-inclined democrats, visible across issues.

Buttigieg people would be in favor of things like market-rate housing, while the Warren people would focus on poking Jeff Bezos with a sharp stick.

3/
Read 11 tweets
Nov 23, 2023
Talking to GPT4 about the Sylvester-Gallai Theorem and formalizing it

1/ Image
2/ Image
3/ Image
Read 7 tweets
Oct 22, 2023
Some new progress in math makes me hopeful about finally making progress on a big open question about opinion dynamics in social networks.

The question is: in simple models where people update opinions by averaging in friends' opinions, how long can polarization persist?

1/
In a 2012 QJE paper, Matt Jackson and I

(i) studied "time to consensus" in such learning by adapting the standard EIGENVALUE analysis of convergence times for reversible Markov matrices

(ii) showed how to approximate the answer knowing only "GROUP-level" linking data.

2/


Image
(The second part matters because group-level or "average patterns" linking data is often MUCH, MUCH easier to collect than a complete network census.)



2b/ web.stanford.edu/~arungc/BCMP.p…
Image
Read 13 tweets
Jun 17, 2023
For those who (like me) were interested in the "GPT can ace MIT" paper,

Here's a great short writeup by three MIT EECS seniors explaining the many things wrong with analysis.

dub.sh/gptsucksatmit
A few quick notes.

Chowdhuri, Deshmukh, and Koplow (from now on CDK) point out some things about the methods that would probably be surprising to those who excitedly retweeted the flashiest claims.

First, GPT-4 was often fed the same problems that it was asked to solve. Image
Finding identical problems in the training set is bad news.

But I disagree with CDK that the method would be "all right" otherwise -- priming with "additional context" (template problems and solutions) is very different from what a student is faced with on an exam.
Read 10 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(