Yohan
Sep 3 · 13 tweets · 4 min read
Turns out there are some pretty big issues with DHS data.

A new study finds massive subnational differences in data quality across 35 African countries.

Here's the breakdown:
A new study in Nature Communications analyses geocoded DHS data at 5 km resolution.

It highlights serious concerns for health and development policymaking:
The researchers focus on three types of data errors:

• Incomplete age (missing birth month or year)
• Age heaping (ages ending in 0 or 5)
• Flagged HAZ (missing or implausible child height data)

These are widely used indicators of data quality.
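To make the three indicators concrete, here is a minimal Python sketch that computes each one as a simple share of records. It is not the study's code. The `ChildRecord` fields are hypothetical names (not DHS variable names), and the ±6 cutoff for an implausible HAZ (height-for-age z-score) is an assumption based on the commonly used WHO flagging range. For reference, if reported ages had uniformly distributed final digits, about 20% would end in 0 or 5, so shares well above that suggest heaping.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChildRecord:
    """One hypothetical survey record (field names are illustrative only)."""
    birth_month: Optional[int]
    birth_year: Optional[int]
    haz: Optional[float]  # height-for-age z-score; None if height is missing


def incomplete_age_share(records: Sequence[ChildRecord]) -> float:
    """Share of records missing the birth month or the birth year."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for r in records if r.birth_month is None or r.birth_year is None)
    return missing / len(records)


def age_heaping_share(ages: Sequence[int]) -> float:
    """Share of reported ages whose final digit is 0 or 5."""
    if not ages:
        raise ValueError("ages must not be empty")
    heaped = sum(1 for age in ages if age % 10 in (0, 5))
    return heaped / len(ages)


def flagged_haz_share(records: Sequence[ChildRecord], limit: float = 6.0) -> float:
    """Share of records with a missing HAZ or one outside [-limit, +limit].

    The default limit of 6 is an assumption, not a value taken from the thread.
    """
    if not records:
        raise ValueError("records must not be empty")
    flagged = sum(1 for r in records if r.haz is None or abs(r.haz) > limit)
    return flagged / len(records)
```

Each function returns a proportion between 0 and 1, which is the kind of quantity that can then be mapped or compared across districts.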
Using geostatistical models, they mapped these indicators at high spatial resolution.

They then aggregated the results to district and national levels, weighted by population.
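The population-weighted aggregation step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: each grid cell is represented as a hypothetical `(district, population, value)` tuple, where `value` is the modelled error rate for that cell.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (district name, population in the cell, modelled indicator value in the cell)
Cell = Tuple[str, float, float]


def population_weighted_mean(cells: Iterable[Cell]) -> Dict[str, float]:
    """Aggregate cell-level values to districts, weighting each cell by its population."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    population_sum: Dict[str, float] = defaultdict(float)
    for district, population, value in cells:
        weighted_sum[district] += population * value
        population_sum[district] += population
    # Districts with zero total population have no defined weighted mean, so skip them.
    return {
        district: weighted_sum[district] / population_sum[district]
        for district in population_sum
        if population_sum[district] > 0
    }
```

Weighting by population means a sparsely populated cell with very poor data quality moves the district figure less than a densely populated one, so the result reflects the data quality experienced by the typical resident rather than the typical square kilometre.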
Findings show extreme within-country variation.

For example:

• Nigeria’s age heaping ranged from 25% to over 60% across districts
• In Chad, missing age data varied from 8% to over 90% between regions
A major discovery: data quality deteriorates the further you get from settlements.

In rural and remote areas, missing data, imprecise measurements, and other errors become much more common.
This remoteness penalty was found across all three indicators, and across nearly all countries.

It was particularly strong in West Africa, and slightly weaker in Central and Southern Africa.
Another key finding: poor data quality is only weakly correlated with standard sampling uncertainty.

This means there are two separate problems, which may or may not overlap in a given place:
• Small sample sizes
• Systematic measurement errors

Both threaten the reliability of local data.
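Why these are two different problems can be seen in a small sketch. It assumes simple random sampling (real DHS surveys use a clustered design, so actual standard errors differ) and a hypothetical fixed measurement bias. The sampling error shrinks as the sample grows, but the systematic error does not, so the total error levels off at the size of the bias.

```python
import math


def sampling_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1.0 - p) / n)


def root_mean_squared_error(p: float, n: int, bias: float) -> float:
    """Total error combining sampling variance and a fixed systematic bias.

    Uses the identity MSE = variance + bias**2.
    """
    se = sampling_standard_error(p, n)
    return math.sqrt(se ** 2 + bias ** 2)


if __name__ == "__main__":
    true_share = 0.30   # hypothetical true prevalence
    bias = 0.05         # hypothetical systematic error of 5 percentage points
    for n in (50, 500, 5000):
        se = sampling_standard_error(true_share, n)
        rmse = root_mean_squared_error(true_share, n, bias)
        print(f"n={n}: sampling SE={se:.4f}, total RMSE={rmse:.4f}")
```

A larger sample fixes only the first term, which is why a district can have a comfortable sample size and still produce unreliable numbers.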
For instance, parts of Madagascar and Niger showed both high sampling uncertainty and high systematic errors.

In contrast, some districts in Angola and Senegal had very high data quality.
Poor-quality data can mislead policymakers, lead to misallocated resources, and fail to capture the true needs of remote populations.

Without knowing where these problems are, interventions may miss their targets.
The researchers provide an online visualization tool to explore data quality across Africa.

This will help data users assess risks and adjust their analyses accordingly.

Link: apps.worldpop.org/SSA/data_quali…
The takeaway:

Even “gold standard” household surveys like DHS aren't uniformly reliable.
If you're interested in subnational development data, check this post out:



And give us a follow @yohaniddawela for more breakdowns on geospatial topics.

