Nice! Here we have an interesting paper using genetic ancestry to classify race/ethnicity in modern data and algorithms. Let's take a look at what this paper found: 🧵
First, I don't want to get too hung up on language, but TCB's tweet starts talking about "ethnicity", then shifts to "continental ancestries", and then entirely omits the largest ethnic group in the US: Hispanics. These terms have distinct definitions (). nap.nationalacademies.org/catalog/26902/…
Anyway, how well can this paper actually impute ethnicity from genetic ancestry in a large cancer population ()? ~17% of the time it gets the Hispanic classification completely wrong or returns a no-call! worldscientific.com/doi/10.1142/97…
But even this is an overstatement, because the majority of participants either didn't list race/ethnicity or provided one that didn't fall into an established category. And the ML algorithm is *terrible* at classifying these unlabeled/partially labeled people as no calls.
This creates an interesting paradox where the algorithm can be made to look more accurate over time, when in reality participants are simply drifting into new, unlabeled regions of the social construct.
I was also intrigued by the claim that ethnicity is perhaps the least socially constructed variable in social science because an algorithm can classify some of the labels with some accuracy. Is this really true?
Language is a social construct, but AI is able to do a pretty good job at classifying languages.
Religion is a social construct, but AI can do a pretty good job classifying those too, even from a cartoony illustration.
Race is a social construct, but I bet you could easily classify thes-- wait, where was I going with this?
Even more interesting, you can explain a social construct like "money" to an AI and it will figure out the natural divisions within the construct based on visual details.
Can we do the same for race/ethnicity and ancestry? Let's play a game ...
Here's a basic ancestry plot, where each point is a person. Do the green and purple dots reveal two racial groups?
Nope. The green and purple points are sampled from the same population but the purple dots just came from one family in that population.
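This is easy to reproduce yourself. Here's a toy sketch (my own simulation, not the code behind the plot above): one homogeneous population plus a single sibship, and the relatives pop out as their own "cluster":

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_snps = 2000
p = rng.uniform(0.1, 0.9, n_snps)  # allele frequencies for ONE population

# 180 unrelated individuals: genotypes ~ Binomial(2, p)
unrelated = rng.binomial(2, p, size=(180, n_snps))

# 20 full siblings: each inherits one haplotype from mom and one from dad
mom = rng.binomial(1, p, size=(2, n_snps))
dad = rng.binomial(1, p, size=(2, n_snps))
cols = np.arange(n_snps)
sibs = np.array([mom[rng.integers(2, size=n_snps), cols] +
                 dad[rng.integers(2, size=n_snps), cols]
                 for _ in range(20)])

G = np.vstack([unrelated, sibs]).astype(float)
Gc = G - G.mean(axis=0)            # center each SNP
pcs = PCA(n_components=2).fit_transform(Gc)

# relatives look ~0.5-correlated even though everyone is one population
def mean_pair_corr(X):
    C = np.corrcoef(X)
    return C[np.triu_indices_from(C, k=1)].mean()

sib_corr = mean_pair_corr(Gc[180:])     # full siblings: high correlation
unrel_corr = mean_pair_corr(Gc[:180])   # unrelated: near zero
```

Plot `pcs` colored by the last 20 rows and you get exactly this picture: a crisp "cluster" that is just a family.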
Okay but that was simulated data. Here's another one, using real data from a large-scale biobank this time. Are these ten different racial groups? Surely the pink-ish groups are a different race from the greens at least?
Nope! These are all Chinese participants of the Kadoorie biobank, color-coded by the cities they were recruited from. Ancestry inference can be extremely sensitive with enough data.
() pubmed.ncbi.nlm.nih.gov/37601966/
Ok, maybe it's unfair to use such closely related populations. Let's look at data from continental groups and use a model-based clustering approach. Surely the two orange/tan clusters here are different races or continents:
Nope! The two groups being distinguished here are Melanesians and ... the rest of the world. Asian, Middle Eastern, European, and African participants all get clustered together because of the sampling of the data. () pmc.ncbi.nlm.nih.gov/articles/PMC60…
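The sampling effect is easy to demonstrate with a toy simulation (my sketch, using K-means on top PCs as a simple stand-in for model-based clustering like STRUCTURE/ADMIXTURE): four closely related populations plus one strongly drifted one, and K=2 splits "drifted vs. everyone else":

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_snps = 2000
p = rng.uniform(0.2, 0.8, n_snps)  # ancestral allele frequencies

def drifted_freqs(p, fst, rng):
    # Balding-Nichols model: population frequencies drifted around p
    a = p * (1 - fst) / fst
    return rng.beta(a, (1 - p) * (1 - fst) / fst)

# four closely related populations (Fst ~ 0.01) ...
freqs = [drifted_freqs(p, 0.01, rng) for _ in range(4)]
# ... plus one strongly drifted population (Fst ~ 0.25)
freqs.append(drifted_freqs(p, 0.25, rng))

G = np.vstack([rng.binomial(2, f, size=(40, n_snps))
               for f in freqs]).astype(float)
G -= G.mean(axis=0)

pcs = PCA(n_components=2).fit_transform(G)
k2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)
# K=2 splits the drifted population (last 40 rows) from everyone else,
# not any "natural" grouping of the other four populations
```

The "clusters" you get depend entirely on who was sampled and how drifted they are, not on any intrinsic group boundary.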
I'm making it too hard. Maybe we need more drifted populations and tree-based clustering instead? Look at the deep divergences across these populations, surely *these* must be different races?
Nope. These are all participants from Native American tribes within a single linguistic group. Some of the most diverged populations in the world get lumped together into one socially constructed box.
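Deep divergence in a tree is just a function of drift, not "race". Another toy sketch (my simulation, not the actual tribal data): five small, strongly drifted populations produce deep, clean branches under hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
n_snps = 2000
p = rng.uniform(0.2, 0.8, n_snps)

def drifted_freqs(p, fst, rng):
    # Balding-Nichols drift around the ancestral frequencies
    a = p * (1 - fst) / fst
    return rng.beta(a, (1 - p) * (1 - fst) / fst)

# five small, isolated, strongly drifted populations (Fst ~ 0.2 each)
freqs = [drifted_freqs(p, 0.2, rng) for _ in range(5)]
G = np.vstack([rng.binomial(2, f, size=(10, n_snps))
               for f in freqs]).astype(float)

# UPGMA-style tree on genotype distances: five deep branches
Z = linkage(G, method="average")
clusters = fcluster(Z, t=5, criterion="maxclust")
```

Cutting the tree at five clusters recovers the five populations, but nothing about those deep branches tells you where a socially constructed boundary "should" go: isolation and drift generate divergence within any box you draw.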
TLDR: When people say a construct is "the least constructed / best / most replicable in social science", maybe they are telling you more about the quality of the social sciences than the validity of the construct. /x
Always a red flag when people cite a 20-year-old paper for a question we have much better data on today. On the left is the race/ancestry clustering from Tang et al. 2005, and on the right is the race/ancestry clustering from modern biobanks collected over the past few years.
Of course you can also find similar looking patterns of structure within self-reported race/ethnic groups: (a) white Europeans, (b) white Brits, (2b,c) China, (2f) Canada, (2e) Japan.
Or even in this simulation of a completely homogeneous population that includes some relatives.
I wrote about how population stratification in genetic analyses led to a decade of false findings and almost certainly continues to bias emerging results. But we are starting to have statistical tools to sniff it out. A 🧵:
First, stratification = genetic structure + environmental structure. If two populations have some genetic variation (e.g. due to drift) and differing environmental influences on a trait, that will induce a false/non-causal correlation between genes and the trait.
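A minimal sketch of that induction (my toy numbers, not from any real study): a SNP with zero causal effect, drifted allele frequencies, and an environmental mean shift yields a clear pooled association that vanishes within populations:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000  # individuals per population

# one SNP with NO causal effect, but drifted allele frequencies
g1 = rng.binomial(2, 0.3, n).astype(float)  # population 1
g2 = rng.binomial(2, 0.6, n).astype(float)  # population 2

# trait is pure environment: population 2 sits 0.5 SD higher
y1 = rng.normal(0.0, 1.0, n)
y2 = rng.normal(0.5, 1.0, n)

# pooled (stratified) analysis: spurious gene-trait correlation
naive_r = np.corrcoef(np.concatenate([g1, g2]),
                      np.concatenate([y1, y2]))[0, 1]

# within-population analysis: the "association" disappears
within_r = 0.5 * (np.corrcoef(g1, y1)[0, 1] + np.corrcoef(g2, y2)[0, 1])
```

Condition on population (or family) and the signal evaporates, because it was never genetic to begin with.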
When such false correlations are further aggregated into polygenic scores, they can accumulate into very large *apparent* genetic differences between even closely related populations. And these false differences will mirror the environment: environment looking like genes.
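Continuing the toy simulation (my illustrative numbers): build a polygenic score from naive pooled GWAS betas on hundreds of non-causal, slightly drifted SNPs, and the per-SNP biases aggregate into a large apparent "genetic" gap that is pure environment:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 2000, 500  # individuals per population, non-causal SNPs

# slightly drifted allele frequencies between two populations
p = rng.uniform(0.2, 0.8, m)
shift = rng.normal(0.0, 0.03, m)
g1 = rng.binomial(2, p, size=(n, m)).astype(float)
g2 = rng.binomial(2, np.clip(p + shift, 0.01, 0.99),
                  size=(n, m)).astype(float)

# trait is pure environment: population 2 sits 0.5 SD higher
y = np.concatenate([rng.normal(0, 1, n), rng.normal(0.5, 1, n)])
G = np.vstack([g1, g2])

# naive per-SNP GWAS betas on the pooled (stratified) sample
Gc = G - G.mean(axis=0)
betas = Gc.T @ (y - y.mean()) / (Gc ** 2).sum(axis=0)

# the polygenic score accumulates the tiny per-SNP biases
pgs = G @ betas
gap = pgs[n:].mean() - pgs[:n].mean()  # large apparent group difference
```

No SNP here does anything causal, yet the PGS gap tracks the 0.5 SD environmental shift: environment looking like genes.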
This is a good example of how pointless a lot of the "data oriented" conversations on X are. DataRepublican, a DOGE analyst, makes a bold claim that 0/60,000 sampled government contracts had outlays < potential award ...
Judd Legum, a journalist, points out that having outlays lower than the potential award amount happens frequently, explains why, and highlights a number of specific examples. Seems like a pretty basic error, should be easy to acknowledge right?
Wrong. DataRepublican first responds with a bizarre claim that they hadn't sampled enough contracts because "hard drive overheated", but that the methodology is sound. Then notes in passing that there was a bug, but follows it up with a brand *new* analysis.
So it turns out the person running this account and accusing mainstream behavioral geneticists of fraud was actually one of the authors of the discredited Pesta et al. paper that was being criticized. Pretending to be an objective third party so they could sling mud.
FWIW I don't have a problem with anon accounts and enjoy interacting with many on here. I understand that people may want to partition their online/IRL lives. But setting up a sock puppet persona so you can aggro out on colleagues who disagree with you is pathetic.
And using a pseudonym so you can self-cite and email your own preprints to other researchers for them to cite is just sad.
It's been interesting seeing Murray become an Ibram X. Kendi figure but for the right. Everyone knows his "analyses" in Human Diversity -- like comparing non-causal allele frequencies between populations -- are completely bogus. Razib knows this too.
But Murray says the things that are politically correct and pleasing to that audience's ego so he regularly gets trotted out for softball interviews and never needs to exhibit any rigor.
This happens over and over. Here's AEI hosting a debate between Murray and Princeton Professor Dalton Conley. Conley explains that the claims made by Murray about genetic differences are unsupported by the data and often gross misinterpretations.
This thread and especially the underlying LessWrong post are a good demonstration of the IQ super-baby conspiracy theory that seems to be gripping Silicon Valley. Here's how it works ...
First, claim that we already have the knowledge of how DNA affects college graduation rates but no one is interested in applying it. This is false: we almost never know *which* genetic variant is actually causal nor *how* it actually influences the associated trait.
This is also a challenge the field is very interested in understanding, including large-scale NIH-funded consortium efforts like IGVF (). Claiming that we already have the knowledge also undermines such efforts. genome.gov/Funded-Program…