Yohan
Aug 2, 2025
Google DeepMind just released one of the most important tools in geospatial data science.

It’s called AlphaEarth Foundations.

I want to break it down for you in intuitive terms:
We have petabytes of satellite images.

But it’s still hard to answer questions like:

• What’s in this image?
• How has it changed?
• What kind of crop or forest is this?

AlphaEarth helps answer these questions, even in places with limited data.
AlphaEarth is a foundation model for Earth Observation.

It turns raw satellite data into compact numerical representations, called embeddings.
𝗦𝗼, 𝘄𝗵𝗮𝘁 𝗮𝗿𝗲 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀?

To understand embeddings, it probably helps to work through an example.

Imagine a graph with just two dimensions: a ‘greenness’ dimension and a ‘treeness’ dimension.

[Image: graph with a ‘greenness’ axis and a ‘treeness’ axis]
Now imagine we take a satellite image and break it down into small grid cells.

Let’s focus on one specific cell, i.e. the one highlighted in orange:

[Image: satellite image divided into grid cells, with one cell highlighted in orange]
If we took this orange cell and placed it on our graph (or ‘embedding space’), it would sit high on both dimensions, since it’s both pretty green and pretty tree-like.

[Image: the orange cell plotted in the greenness/treeness space]
Congratulations, if you’ve followed this basic example, you’ve got the idea of what an embedding is.
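Here’s a rough sketch of that two-dimensional example in Python. The cell names and the greenness/treeness scores are made up purely for illustration; they are not AlphaEarth outputs, and `nearest` is a hypothetical helper.

```python
import math

# Hypothetical 2D "embeddings": (greenness, treeness), each scored from 0 to 1.
# These numbers are invented for illustration only.
cells = {
    "orange_cell": (0.8, 0.7),  # pretty green, pretty tree-like
    "forest": (0.9, 0.9),
    "grass_field": (0.9, 0.1),  # green, but not tree-like
    "car_park": (0.1, 0.0),
}

def distance(a, b):
    """Euclidean distance between two points in the embedding space."""
    return math.dist(a, b)

def nearest(name, cells):
    """Return the name of the cell that sits closest to `name`."""
    others = (other for other in cells if other != name)
    return min(others, key=lambda other: distance(cells[name], cells[other]))

print(nearest("orange_cell", cells))  # forest
```

Cells that look alike end up close together, so "which cell is most like this one?" becomes a simple distance calculation.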
Now imagine that instead of just two dimensions, we have dozens.

E.g. they may cover anything from ‘blueness’ to ‘urbanness’ to ‘smokiness’, etc.

And then, instead of placing just one grid cell from one satellite image, we place every grid cell from billions of satellite images.

Then we'd have a LOT of embeddings.
You can think of embeddings as a way to do two things:

𝟭. 𝗖𝗼𝗺𝗽𝗿𝗲𝘀𝘀 a large amount of information from images into a small set of numbers/coordinates, and

𝟮. 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻 similar parts of an image close together in a multi-dimensional space, so that the model can understand relationships between them.
Here’s a visual example that helps make it more concrete.

It shows how about 1,000 words are represented using embeddings across 200 dimensions.

Here the word “Albania” ends up near other related words like “Albanian”.

The model has learned that they often appear in similar contexts.

[Image: visualisation of word embeddings]
Once you’ve got good embeddings, you can:

• Detect change over time
• Find places with similar landscapes
• Classify land cover using less training data
• Fill gaps where no data exists

This saves time and resources, and improves accuracy.
What makes AlphaEarth stand out?

• It uses 𝗱𝗶𝘃𝗲𝗿𝘀𝗲 𝗱𝗮𝘁𝗮: satellites, field measurements, and climate records
• It works 𝗮𝗰𝗿𝗼𝘀𝘀 𝘁𝗶𝗺𝗲, not just on static images
• It’s 𝗵𝗶𝗴𝗵-𝗿𝗲𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻: 10 m × 10 m pixels
• It 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝗲𝘀 well, even with limited labels
• It’s 𝗰𝗼𝗺𝗽𝗮𝗰𝘁: just 64 bytes per pixel
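To get a feel for what that compactness means, here’s a back-of-the-envelope calculation. It assumes each pixel covers a 10 m × 10 m square and carries one 64-byte embedding; actual storage will differ with compression and file format, and the 1,000 km² region is just a hypothetical example.

```python
PIXEL_SIDE_M = 10      # assumed pixel side length, in metres
BYTES_PER_PIXEL = 64   # embedding size quoted in the thread

def embedding_bytes(area_km2):
    """Raw bytes needed to store one embedding per pixel over `area_km2`."""
    pixels_per_km2 = (1000 // PIXEL_SIDE_M) ** 2  # 100 x 100 = 10,000 pixels
    return int(area_km2 * pixels_per_km2 * BYTES_PER_PIXEL)

print(embedding_bytes(1))      # 640000 bytes, i.e. 640 kB per square kilometre
print(embedding_bytes(1000))   # 640000000 bytes, i.e. 640 MB for 1,000 km²
```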
There are four key innovations:

𝟭. 𝗧𝗶𝗺𝗲-𝗮𝘄𝗮𝗿𝗲 𝗺𝗼𝗱𝗲𝗹𝗹𝗶𝗻𝗴: learns from sparse or irregular data
𝟮. 𝗦𝗽𝗮𝗰𝗲-𝗧𝗶𝗺𝗲 𝗣𝗿𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: uses attention layers and convolutions
𝟯. 𝗧𝗲𝘅𝘁 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁: links image data with descriptions like “soybean field”
𝟰. 𝗨𝗻𝗶𝗳𝗼𝗿𝗺 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲: avoids redundant outputs
𝗛𝗼𝘄 𝘄𝗲𝗹𝗹 𝗱𝗼𝗲𝘀 𝗶𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺?

Across 15 tasks, AlphaEarth outperformed other models.

It did best in:
• Crop type mapping
• Tree species classification
• Evapotranspiration estimation
• Land use change detection
Interestingly, AlphaEarth Foundations (AEF) also seems able to estimate biophysical variables that are continuous rather than categorical.

Among the models compared, AEF is the only one that meaningfully predicted evapotranspiration, a key variable for farming, water planning, and climate work.
You can access AlphaEarth embeddings for free in Google Earth Engine.

No need to host the model or download anything.

Just plug it into your existing GEE workflows.
Here’s what you can do with it:

1. Similarity search

Pick any point on Earth, and find all locations with similar environmental conditions.

2. Change detection

Compare embeddings over time to track changes like wildfires or urban growth.
3. Clustering

Group areas with similar features, with no labels required.

This is great for identifying forest types, soil regions, or urban patterns.

4. Low-shot classification

Train accurate maps with far fewer labelled points.
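The four uses above can be sketched with nothing more than dot products. This is a toy illustration in plain Python, not the Earth Engine API: the "forest" and "urban" vectors are synthetic, the 64-number length and the 0.5 change threshold are assumptions, and `two_means` and `classify` are hypothetical helpers.

```python
import math
import random

def normalise(v):
    """Scale a vector to unit length, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity(a, b):
    """Dot product of two unit vectors: near 1 = very similar, near 0 = unrelated."""
    return sum(x * y for x, y in zip(a, b))

def noisy_copy(v, rng, scale=0.02):
    """A slightly perturbed unit vector, standing in for a similar pixel."""
    return normalise([x + rng.gauss(0, scale) for x in v])

rng = random.Random(0)
# Two made-up "landscape types" in a 64-number embedding space (synthetic data).
forest = normalise([1.0] * 32 + [0.0] * 32)
urban = normalise([0.0] * 32 + [1.0] * 32)
pixels = [noisy_copy(forest, rng) for _ in range(5)] + [noisy_copy(urban, rng) for _ in range(5)]

# 1. Similarity search: rank all pixels by similarity to a query pixel.
query = pixels[0]
ranked = sorted(range(len(pixels)), key=lambda i: similarity(query, pixels[i]), reverse=True)

# 2. Change detection: a low similarity between two dates suggests the place changed.
before, after = pixels[1], noisy_copy(urban, rng)  # a forest pixel that later looks urban
changed = similarity(before, after) < 0.5          # 0.5 is an arbitrary threshold

# 3. Clustering: group pixels without labels (tiny k-means with k=2).
def two_means(vectors, rounds=5):
    first = vectors[0]
    second = min(vectors, key=lambda v: similarity(first, v))  # least similar to the first
    centroids = [first, second]
    labels = []
    for _ in range(rounds):
        labels = [max((0, 1), key=lambda k: similarity(v, centroids[k])) for v in vectors]
        for k in (0, 1):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = normalise([sum(column) for column in zip(*members)])
    return labels

clusters = two_means(pixels)

# 4. Low-shot classification: label every pixel from just one example per class.
examples = {"forest": pixels[0], "urban": pixels[5]}

def classify(v):
    return max(examples, key=lambda name: similarity(v, examples[name]))

predictions = [classify(v) for v in pixels]

print(ranked[:5], changed, clusters, predictions)
```

With real embeddings the vectors would come from the dataset rather than `noisy_copy`, and you would tune the threshold and the number of clusters for your own area of interest.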
Google is also offering grants (up to $5,000) to test new use-cases for these embeddings.

You can apply here: docs.google.com/forms/d/e/1FAI…
The takeaway:

AlphaEarth Foundations is a big release for the geospatial field.

It’s fast, accurate, easy to use, and available now.

Expect even more models like this in future.
If you liked this, you might enjoy this post on foundation models:

And give us a follow @yohaniddawela for more breakdowns on geospatial topics.
Interested in getting a short overview of the latest geospatial papers and datasets each week?

Subscribe to the Spatial Edge newsletter: yohan.so
