Sasha Gusev
Jan 26 · 18 tweets · 10 min read
I've written up a "crash course" on population genetics parameters useful for thinking about recent selection, heritability, and group differences (as part of a longer write-up on these concepts).



I'll summarize the key points here 🧵: gusevlab.org/projects/hsq/#…
A preface: if you're interested in population genetics more generally, it's better to learn from first principles, and I've linked some useful resources to that end (many of them free). In particular (spoiler), recent evolution leaves out some of the more interesting concepts and personalities.
But one downside of the general approach is that it can be hard to get a feel for real time scales (for example, when populations are modeled in terms of $4N_e\mu$). Here we'll fix three parameters based on data: time (t = 65k years), effective population size ($N_e$ = 10k), and selection ($s \approx 10^{-4}$).
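To connect these fixed parameters to the scaled units that models use, here is a minimal sketch. It assumes a generation time of 25 years and a per-base mutation rate of $\mu = 1.25 \times 10^{-8}$ per generation (both are our illustrative assumptions, not values from the thread), and it uses $4N_e s$ as the population-scaled selection coefficient, which is one common diploid convention.

```python
# Sketch: convert the thread's fixed parameters into model units.
# ASSUMPTIONS (illustrative, not from the thread): 25-year generations and
# a mutation rate of 1.25e-8 per base pair per generation.

GENERATION_YEARS = 25.0   # assumed generation time
MU_PER_BP = 1.25e-8       # assumed per-bp mutation rate

T_YEARS = 65_000          # time scale fixed in the thread
NE = 10_000               # effective population size fixed in the thread
S = 1e-4                  # typical selection coefficient fixed in the thread


def years_to_generations(years: float, gen_time: float = GENERATION_YEARS) -> float:
    """Number of generations elapsed in `years`."""
    return years / gen_time


def theta(ne: float, mu: float) -> float:
    """Population-scaled mutation rate 4 * Ne * mu (per bp)."""
    return 4 * ne * mu


def scaled_selection(ne: float, s: float) -> float:
    """Population-scaled selection coefficient 4 * Ne * s (diploid convention)."""
    return 4 * ne * s


if __name__ == "__main__":
    print(f"generations in {T_YEARS} years: {years_to_generations(T_YEARS):.0f}")
    print(f"theta per bp (4*Ne*mu): {theta(NE, MU_PER_BP):.1e}")
    print(f"scaled selection (4*Ne*s): {scaled_selection(NE, S):.1f}")
```

Under these assumptions, 65k years is about 2,600 generations and $4N_e s = 4$, i.e. selection of this size is only modestly stronger than drift.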
We can start by modeling how genetic variants move under neutral drift: very slowly! In 120k years, an allele at 5% frequency is expected to accumulate just ~1% of drift variance. We can also think in terms of allele "age": common variants are VERY old (mostly pre-migration).
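Both ideas can be sketched with two standard neutral results: the Wright-Fisher expectation $\mathrm{Var}(p_t) = p_0(1-p_0)\left[1 - (1 - \tfrac{1}{2N_e})^t\right]$, and Kimura & Ohta's expected age of a neutral allele now at frequency $p$, $-4N_e\,p\ln p/(1-p)$ generations. The 25-year generation time is our assumption; $N_e$ = 10k is the thread's value. This is a sketch of textbook formulas, not the thread's own code.

```python
import math

GENERATION_YEARS = 25.0  # ASSUMPTION: generation time, not from the thread
NE = 10_000              # effective population size used in the thread


def drift_variance(p0: float, generations: float, ne: float = NE) -> float:
    """Expected variance of allele frequency after `generations` of
    Wright-Fisher drift, starting from frequency p0."""
    f = 1 - (1 - 1 / (2 * ne)) ** generations  # accumulated drift coefficient
    return p0 * (1 - p0) * f


def expected_age_generations(p: float, ne: float = NE) -> float:
    """Kimura & Ohta (1973) expected age of a neutral allele now at frequency p."""
    return -4 * ne * p * math.log(p) / (1 - p)


if __name__ == "__main__":
    gens = 120_000 / GENERATION_YEARS
    print(f"Var(p) for a 5% allele after 120k years: {drift_variance(0.05, gens):.4f}")
    age_years = expected_age_generations(0.05) * GENERATION_YEARS
    print(f"expected age of a 5% allele: ~{age_years / 1000:.0f}k years")
```

Under these assumptions the drift variance comes out close to 0.01, and the expected age of a 5% allele is on the order of 150k years, well beyond the 65k-year window.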
Now let's add selection. Under the weak selection coefficients we see in real data, selection acts very slowly: most common variants under negative selection will stay common, and new variants under positive selection will stay rare. It would take ~300k years for an allele at 95% frequency to fall to 1%.
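A minimal deterministic sketch (no drift) of how such a time scale can be computed. It assumes haploid (genic) selection with fitness $1 - s$ for the selected-against allele and a 25-year generation time; both are our simplifying assumptions. Under this recursion the allele's odds $p/(1-p)$ shrink by a factor $(1 - s)$ every generation, so the time from $p_0$ to $p_1$ has a closed form.

```python
import math

GENERATION_YEARS = 25.0  # ASSUMPTION: generation time, not from the thread


def _odds(p: float) -> float:
    return p / (1 - p)


def generations_closed_form(p0: float, p1: float, s: float) -> float:
    """Generations for an allele with fitness 1 - s to fall from p0 to p1
    under deterministic haploid selection (odds multiply by 1 - s each step)."""
    return math.log(_odds(p1) / _odds(p0)) / math.log(1 - s)


def generations_by_iteration(p0: float, p1: float, s: float) -> int:
    """Same quantity, counted by iterating p' = p(1 - s) / (1 - s p)."""
    p, gens = p0, 0
    while p > p1:
        p = p * (1 - s) / (1 - s * p)  # mean fitness is 1 - s p
        gens += 1
    return gens


if __name__ == "__main__":
    for s in (1e-4, 1e-3):
        g = generations_closed_form(0.95, 0.01, s)
        print(f"s={s:g}: {g:,.0f} generations (~{g * GENERATION_YEARS / 1000:,.0f}k years)")
```

The time scales inversely with $s$, and dominance, drift, and the assumed generation time all shift the numbers, so treat the output as order-of-magnitude intuition rather than a reproduction of the thread's simulations.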
These shifts are even slower under stabilizing selection, where traits move toward a fitness optimum rather than directionally up or down. This is likely how populations have adapted to changing environments (we'll come back to this later).
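One way to see why single variants move even more slowly here: when a polygenic trait sits at its optimum, a commonly used approximation treats each additive variant as if under weak underdominance, $\Delta p \approx -s_e\,p(1-p)(\tfrac{1}{2} - p)$, where $s_e$ scales with the variant's squared effect. Constants differ between derivations, so this sketch compares only the shape against directional selection $\Delta p = s\,p(1-p)$, assuming equal coefficients in both (a simplification of ours).

```python
def delta_p_directional(p: float, s: float) -> float:
    """Per-generation change under weak directional (genic) selection."""
    return s * p * (1 - p)


def delta_p_stabilizing(p: float, s_e: float) -> float:
    """Per-generation change for one additive variant when the trait is at its
    optimum: selection acts like underdominance and pushes p away from 1/2.
    s_e is an effective coefficient (ASSUMED equal to s for comparison)."""
    return -s_e * p * (1 - p) * (0.5 - p)


if __name__ == "__main__":
    s = 1e-4
    for p in (0.05, 0.25, 0.45):
        ratio = abs(delta_p_stabilizing(p, s)) / abs(delta_p_directional(p, s))
        print(f"p={p:.2f}: stabilizing/directional speed = {ratio:.2f}")
```

With equal coefficients, the stabilizing change is smaller by a factor of $|1/2 - p|$ (never more than one half) and vanishes at intermediate frequency.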
Now that we have a model for selection and drift, we can test whether variants are under selection. It turns out this test is very powerful when selection is strong: even 100 samples are enough. In the "nearly neutral" range, by contrast, selection is effectively undetectable.
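This is not the detection test from the thread, but a standard way to see where "nearly neutral" begins is Kimura's diffusion approximation for the fixation probability of an allele at frequency $p$ under additive (genic) selection, $u(p) = (1 - e^{-4N_e s p})/(1 - e^{-4N_e s})$. When $|4N_e s|$ is well below 1, $u$ is nearly the neutral value $p$. $N_e$ = 10k follows the thread; the grid of $s$ values is illustrative.

```python
import math

NE = 10_000  # effective population size used in the thread


def fixation_probability(p: float, s: float, ne: float = NE) -> float:
    """Kimura's diffusion approximation for the probability that an allele at
    frequency p eventually fixes, with additive selection coefficient s."""
    if s == 0:
        return p  # neutral case: fixation probability equals current frequency
    x = 4 * ne * s
    # expm1 keeps precision when x * p is tiny
    return math.expm1(-x * p) / math.expm1(-x)


if __name__ == "__main__":
    p_new = 1 / (2 * NE)  # frequency of a brand-new mutation
    for s in (1e-6, 1e-5, 1e-4, 1e-3):
        ratio = fixation_probability(p_new, s) / p_new
        print(f"s={s:g} (4*Ne*s={4 * NE * s:g}): {ratio:.2f}x the neutral probability")
```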
We can quantify how genetic variance is partitioned using F_ST, a fundamental (and often misunderstood) measure of within- versus between-population correlation. Part of the confusion is that there are two derivations, Nei's and Hudson's, and they can give meaningfully different values.
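A minimal sketch of the two estimators for two equally weighted populations, written as ratios of averages across loci and assuming population allele frequencies are known (no sample-size corrections). The Hudson form follows the formulation popularized by Bhatia et al. (2013). In this form Nei's value can never exceed Hudson's, because its per-locus denominator is larger by the mean within-population heterozygosity.

```python
def nei_fst(p1: list[float], p2: list[float]) -> float:
    """Nei-style F_ST (G_ST), ratio of averages: sum(H_T - H_S) / sum(H_T)."""
    num = den = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)    # heterozygosity of the pooled population
        h_s = a * (1 - a) + b * (1 - b)  # mean of 2a(1-a) and 2b(1-b)
        num += h_t - h_s
        den += h_t
    return num / den


def hudson_fst(p1: list[float], p2: list[float]) -> float:
    """Hudson F_ST, ratio of averages:
    sum (p1 - p2)^2 / sum [p1(1 - p2) + p2(1 - p1)]."""
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2
        den += a * (1 - b) + b * (1 - a)
    return num / den


if __name__ == "__main__":
    # HYPOTHETICAL single locus at 10% vs 50%
    print(f"Nei:    {nei_fst([0.1], [0.5]):.3f}")
    print(f"Hudson: {hudson_fst([0.1], [0.5]):.3f}")
```

For this hypothetical locus the two give roughly 0.19 and 0.32, which is the kind of gap that causes confusion when values from different estimators are compared.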
Under strong assumptions, F_ST can be related to population size and migration, but it is compatible with many different population dynamics, in ways that can be non-linear and unintuitive.
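A small sketch of that non-identifiability, using two textbook idealizations (our choice, not the thread's models): Wright's infinite-island equilibrium, $F_{ST} = 1/(1 + 4N_e m)$, and a clean split with no migration, $F = 1 - (1 - \tfrac{1}{2N_e})^t$. Very different histories land on the same value.

```python
import math


def island_model_fst(ne_m: float) -> float:
    """Equilibrium F_ST in Wright's infinite-island model: 1 / (1 + 4 Ne m)."""
    return 1 / (1 + 4 * ne_m)


def split_fst(generations: float, ne: float) -> float:
    """F_ST-like drift coefficient after a clean split with no migration."""
    return 1 - (1 - 1 / (2 * ne)) ** generations


def split_time_for_fst(fst: float, ne: float) -> float:
    """Generations of isolation needed to reach a given F_ST under pure drift."""
    return math.log(1 - fst) / math.log(1 - 1 / (2 * ne))


if __name__ == "__main__":
    target = 0.1
    ne_m = (1 / target - 1) / 4  # migrants per generation giving F_ST = target
    print(f"island model: Ne*m = {ne_m:.2f} gives F_ST = {island_model_fst(ne_m):.3f}")
    for ne in (1_000, 10_000):
        t = split_time_for_fst(target, ne)
        print(f"split with Ne={ne}: {t:,.0f} generations gives F_ST = {split_fst(t, ne):.3f}")
```

Ongoing gene flow at equilibrium, a recent split in a small population, and an older split in a larger one can all produce F_ST = 0.1 here, so the value alone does not pin down the history.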
Moreover, F_ST is highly dependent on *which variants* are used to estimate it, and this can lead to very unintuitive results: for example, the apparently higher F_ST within chimps than between chimps and humans, an artifact of how sub-populations are tested.
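This sketch does not reproduce the chimp/human example; it only shows the mechanism. With a hypothetical panel where pairs of loci have identical absolute frequency differences, Hudson's F_ST (ratio of averages) still changes several-fold depending on whether the panel keeps variants that are rare or common in population 1.

```python
def hudson_fst(loci):
    """Hudson F_ST as a ratio of averages over (p1, p2) frequency pairs."""
    num = sum((a - b) ** 2 for a, b in loci)
    den = sum(a * (1 - b) + b * (1 - a) for a, b in loci)
    return num / den


def maf(p):
    """Minor allele frequency."""
    return min(p, 1 - p)


# HYPOTHETICAL panel: two pairs of loci with identical absolute frequency
# differences (0.05 and 0.08), one member rare and one common in population 1.
LOCI = [(0.01, 0.06), (0.45, 0.50), (0.02, 0.10), (0.40, 0.48)]

common = [(a, b) for a, b in LOCI if maf(a) >= 0.05]
rare = [(a, b) for a, b in LOCI if maf(a) < 0.05]

if __name__ == "__main__":
    print(f"all loci:         F_ST = {hudson_fst(LOCI):.4f}")
    print(f"common in pop 1:  F_ST = {hudson_fst(common):.4f}")
    print(f"rare in pop 1:    F_ST = {hudson_fst(rare):.4f}")
```

The frequency differences are the same in each pair, yet the estimate moves by roughly a factor of five between the two ascertained subsets.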
A useful derivation is that Hudson's F_ST bounds the difference in trait mean between populations under neutral drift, which we can confirm in simulations. For a typical ~10% heritable trait, the (African/European) population difference is at most 1.5% (in either direction).
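A minimal sketch of one closely related calculation, under our own assumptions: additive per-allele effects drawn i.i.d. with mean zero and independent of allele frequency (i.e. neutral), Hardy-Weinberg proportions, and Hudson's F_ST as a ratio of averages. Under these assumptions, the expected squared difference $D$ in mean genetic value between two populations, relative to the average within-population additive variance $V_A$, is $4F_{ST}/(1 - F_{ST})$; multiplying by $h^2$ expresses it in units of phenotypic variance. The allele frequencies are hypothetical, and this is not necessarily the exact derivation behind the thread's bound.

```python
import numpy as np


def hudson_fst(p1: np.ndarray, p2: np.ndarray) -> float:
    """Hudson F_ST as a ratio of averages across loci."""
    return float(np.sum((p1 - p2) ** 2) / np.sum(p1 * (1 - p2) + p2 * (1 - p1)))


def predicted_ratio(p1: np.ndarray, p2: np.ndarray) -> float:
    """E[D^2] / mean within-population V_A = 4F / (1 - F), assuming i.i.d.
    mean-zero effects that are independent of allele frequency."""
    f = hudson_fst(p1, p2)
    return 4 * f / (1 - f)


def simulated_ratio(p1: np.ndarray, p2: np.ndarray, n_traits: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo check: draw many random additive traits on the same loci."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=(n_traits, p1.size))  # per-allele effects, one row per trait
    d = 2 * beta @ (p1 - p2)                     # difference in mean genetic value
    v1 = 2 * beta**2 @ (p1 * (1 - p1))           # additive variance within pop 1 (HWE)
    v2 = 2 * beta**2 @ (p2 * (1 - p2))           # additive variance within pop 2 (HWE)
    return float(np.mean(d**2) / np.mean((v1 + v2) / 2))


# HYPOTHETICAL allele frequencies for two populations at 200 loci.
_rng = np.random.default_rng(1)
P1 = _rng.uniform(0.05, 0.95, size=200)
P2 = np.clip(P1 + _rng.normal(0.0, 0.1, size=200), 0.01, 0.99)

if __name__ == "__main__":
    print(f"Hudson F_ST = {hudson_fst(P1, P2):.3f}")
    print(f"predicted E[D^2]/V_A = {predicted_ratio(P1, P2):.4f}")
    print(f"simulated E[D^2]/V_A = {simulated_ratio(P1, P2):.4f}")
    h2 = 0.1  # heritability used in the thread's example
    print(f"in units of V_P (h^2 = {h2}): {h2 * predicted_ratio(P1, P2):.4f}")
```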
That's under neutrality; under stabilizing selection, things are constrained even further, but in complicated ways. After a shift in the fitness optimum, genetic variation is first rapidly selected on and then gradually (and mostly arbitrarily) purged from the population.
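The first, rapid phase can be sketched with a Lande-style response to Gaussian stabilizing selection, $\Delta\bar z = V_A(\theta - \bar z)/(\omega^2 + V_P)$, where $\theta$ is the new optimum and $\omega^2$ the width of the fitness function. The sketch holds $V_A$ fixed, so it says nothing about the later, slower purging of variation; $V_P = 1$, $h^2 = 0.1$, and $\omega^2 = 20$ are hypothetical values.

```python
import math


def mean_trajectory(gap0: float, v_a: float, v_p: float, omega2: float, generations: int) -> list[float]:
    """Distance of the trait mean from a newly shifted optimum over time,
    assuming V_A stays constant (an idealization)."""
    shrink = 1 - v_a / (omega2 + v_p)  # fraction of the gap remaining each generation
    gaps = [gap0]
    for _ in range(generations):
        gaps.append(gaps[-1] * shrink)
    return gaps


def generations_to_close(fraction: float, v_a: float, v_p: float, omega2: float) -> float:
    """Generations until `fraction` of the initial gap has been closed."""
    return math.log(1 - fraction) / math.log(1 - v_a / (omega2 + v_p))


if __name__ == "__main__":
    # HYPOTHETICAL values: phenotypic variance 1, h^2 = 0.1, omega^2 = 20.
    v_p, h2, omega2 = 1.0, 0.1, 20.0
    t90 = generations_to_close(0.9, h2 * v_p, v_p, omega2)
    print(f"~{t90:.0f} generations to close 90% of the gap")
```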
Between populations with the same fitness optimum, the mean trait value will be more constrained than under neutrality. But it will also look as if genetic variation has changed MORE substantially (e.g. as measured by F_ST). Interpretation is even more complicated when the environment shifts.
Finally, this brings us to the Breeder's Equation, which connects heritability to the response to selection under a fixed environment. In controlled breeding experiments (e.g. maize), the response can be stable for many generations (consistent with polygenicity and new mutations).
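The Breeder's Equation itself is $R = h^2 S$. A minimal sketch, assuming truncation selection on a normally distributed trait so that the selection differential is $S = i\,\sigma_P$ with intensity $i = \varphi(x)/q$ for truncation point $x$ and proportion selected $q$. Holding $h^2$, $\sigma_P$, and the environment constant across generations is an idealization of ours.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """Standardized selection differential i = phi(x) / q for truncation
    selection on a normal trait, where x is the truncation point."""
    std = NormalDist()
    x = std.inv_cdf(1 - proportion_selected)
    return std.pdf(x) / proportion_selected


def breeders_response(h2: float, s_diff: float) -> float:
    """Breeder's equation: response R = h^2 * S."""
    return h2 * s_diff


def cumulative_response(h2: float, proportion_selected: float, sd_p: float, generations: int) -> float:
    """Total change in the mean after repeated truncation selection, assuming
    h^2, the phenotypic SD, and the environment all stay constant."""
    s_diff = selection_intensity(proportion_selected) * sd_p
    return generations * breeders_response(h2, s_diff)
```

For example, selecting the top 20% ($i \approx 1.40$) on a trait with $h^2 = 0.3$ moves the mean by about 0.42 phenotypic SDs per generation in this idealized setting.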
But in natural populations, the response often appears static or even negative (aka the "stasis paradox")! I highlight some examples compiled by Walsh & Lynch: bias in heritability estimates, indirect/environmental confounding, and shifts in the environment are all at play.
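The environmental-confounding explanation can be made concrete with Robertson's secondary theorem, which says the genetic response equals the covariance between breeding value and relative fitness. In this hypothetical setup (our assumption, not data from Walsh & Lynch), fitness depends only on the environmental component of the phenotype. The breeder's equation then predicts a response from the phenotypic selection differential, while the realized genetic response is essentially zero.

```python
import numpy as np


def predicted_vs_actual(h2: float = 0.3, env_effect_on_fitness: float = 0.2,
                        n: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Return (breeder's-equation prediction h^2 * S, realized change in mean
    breeding value) when fitness is driven by the environment only."""
    rng = np.random.default_rng(seed)
    g = rng.normal(0, np.sqrt(h2), n)       # additive genetic values
    e = rng.normal(0, np.sqrt(1 - h2), n)   # environmental deviations
    z = g + e                               # phenotype with V_P = 1
    w = np.exp(env_effect_on_fitness * e)   # HYPOTHETICAL fitness: depends on e only
    w_rel = w / w.mean()                    # relative fitness
    s_diff = np.cov(z, w_rel)[0, 1]         # selection differential S = Cov(z, w)
    predicted = h2 * s_diff                 # breeder's equation
    actual = np.cov(g, w_rel)[0, 1]         # Robertson: R = Cov(g, w)
    return float(predicted), float(actual)


if __name__ == "__main__":
    predicted, actual = predicted_vs_actual()
    print(f"breeder's equation predicts R = {predicted:.3f} phenotypic SDs")
    print(f"realized genetic response    = {actual:.3f}")
```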
This "missing response" echoes the debate around "missing heritability", where molecular methods often produce lower estimates and identify environmental confounders. I wonder: are humans more like maize under controlled breeding, or like naturally evolving populations? /fin
By the way, all the figures and simulations are pretty simple, but I've put the code here in case it's useful: github.com/gusevlab/hsq_a… Let me know if you spot an error.