@morganherlocker


"Unique in the Crowd", the famous paper that re-id'd 95% of individuals in a mobility dataset using just four GPS points, did so with half the temporal resolution and half the spatial resolution I am seeing in most city's formal anonymization policies for handling mobility data.

The same paper also found that "the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution." In other words, these policies should expect even worse re-id protection than the oft-quoted figures suggest. nature.com/articles/srep0…
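
To get a feel for how slow a 1/10-power decay is, here is a rough back-of-the-envelope sketch. It reads the quoted sentence literally as a pure power law (uniqueness proportional to coarsening factor to the power -1/10), which is a simplifying assumption on my part and not the paper's exact fitted model. The function name is made up for illustration.

```python
def relative_uniqueness(coarsening_factor: float, exponent: float = -0.1) -> float:
    """Uniqueness relative to the original data after making the resolution
    `coarsening_factor` times coarser, ASSUMING a pure power law with the
    given exponent (a literal reading of the "1/10 power" quote)."""
    return coarsening_factor ** exponent


for factor in (2, 10, 100):
    share = relative_uniqueness(factor)
    print(f"{factor}x coarser -> {share:.2f} of the original uniqueness")
```

Under that assumption, even data that is 100x coarser keeps well over half of its uniqueness, which is why rounding alone buys so little.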

If your city's mobility anonymization policy is something like "we round the timestamp to the nearest X minutes and longitude, latitude to the nearest Y digits", it likely provides next to zero benefit from a privacy perspective. An exception is NYC, which now uses census tracts.

The "4 points to re-id" paper used ~1000 (!!) meter spatial resolution (guesstimate, since the data was crappy cell tower triangulation circa 2010) and 1 hour temporal resolution. This is far less detail than the mobility data I see being published today!

In many cases, we are talking about wide-open feeds with real-time, 1 minute temporal resolution and 5-10 meter spatial resolution. Policies tend to fuzz this to some degree, but it is exceedingly rare that two people follow the same path above the fuzz resolution.

To measure this, take a feed with reduced resolution and aggregate it, grouping on longitude, latitude, and time. Compute the ratio of groups where COUNT = 1 to all groups. Policies of this form will have exceedingly high ratios of unique trips.
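
A minimal sketch of that measurement, assuming records are `(timestamp_seconds, longitude, latitude)` tuples. The function names, the 5 minute bucket, and the 3 decimal digits are placeholders I picked for illustration, not values from any actual policy. Timestamps are floored into buckets here instead of rounded to the nearest bucket, which does not change the idea.

```python
from collections import Counter


def fuzz(record, time_bucket_s=300, coord_digits=3):
    """Reduce one (timestamp_s, lon, lat) record to policy-style resolution."""
    ts, lon, lat = record
    bucket = ts // time_bucket_s * time_bucket_s  # floor to the time bucket
    return (bucket, round(lon, coord_digits), round(lat, coord_digits))


def unique_ratio(records, **fuzz_params):
    """Share of (time, lon, lat) groups that contain exactly one record."""
    groups = Counter(fuzz(r, **fuzz_params) for r in records)
    if not groups:
        return 0.0
    singletons = sum(1 for count in groups.values() if count == 1)
    return singletons / len(groups)


# Toy data: the first two records collapse into one group, the others stay alone.
sample = [
    (0, -122.41941, 37.77493),
    (30, -122.41948, 37.77489),
    (400, -122.40000, 37.78000),
    (900, -122.43000, 37.76000),
]
print(unique_ratio(sample))
```

On a real feed, sweep `time_bucket_s` and `coord_digits` over the values a policy proposes and watch how close the ratio stays to 1.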

What's a better solution? We have a few options. Super low precision geo coordinates like census tracts are one good-ish option from a privacy perspective, but research shows this is not THAT effective, and it destroys the useful information embedded in the data in the process.

Another oft-maligned option is aggregation. Specifically, aggregation where you drop groups with a count below K, where K is whatever value greater than 1 policy makers feel comfortable with. People bash this because they mix up sufficient aggregation resolution with sufficient fuzz resolution.

This approach is called k-anonymization. What's great about it is that you can adjust the spatial and temporal resolution to maximize the amount of data that is preserved. This allows arbitrarily high resolution while providing strong privacy guarantees regardless of parameters.
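
Roughly, the drop-below-K step could look like the sketch below. It assumes each record has already been mapped to a hashable cell such as `(hour, zone)` and that every record comes from a different person or trip; if one person can contribute several records to a cell, you would count distinct people instead. The names and `k=5` are illustrative only.

```python
from collections import Counter


def k_anonymous_counts(cells, k=5):
    """Count records per cell and suppress every cell with fewer than k records."""
    counts = Counter(cells)
    return {cell: n for cell, n in counts.items() if n >= k}


def retained_fraction(cells, k=5):
    """Share of input records that survive suppression at this resolution."""
    cells = list(cells)
    if not cells:
        return 0.0
    kept = k_anonymous_counts(cells, k)
    return sum(kept.values()) / len(cells)


# Toy data: one cell with 6 records, one with 2, one with 5.
cells = [("08:00", "A")] * 6 + [("08:00", "B")] * 2 + [("09:00", "A")] * 5
print(k_anonymous_counts(cells, k=5))
print(retained_fraction(cells, k=5))
```

`retained_fraction` is the knob for tuning: try different cell definitions (finer or coarser space and time) and keep the one that preserves the most records at your chosen K.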

This is the solution proposed by the "Unique in the Crowd" paper and it is both 1. legit and 2. easy. In my humble, open-to-better-ideas opinion, virtually every city and DOT should adopt something like this as policy, since it is effective and straightforward to implement.

For the ambitious, and perhaps risk-tolerant, differential privacy is the new kid on the block. Differential privacy provides query-time dynamic resolution. This allows for max-resolution-short-of-screwing-over-users accuracy. It's risky not because it's bad, but because it's hard.
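
For intuition only, here is a toy version of the textbook Laplace mechanism applied to a single count. This is one common building block of differential privacy, not a description of any particular library. It assumes each person changes the count by at most 1 (sensitivity 1), and the function names are made up. Naive floating-point noise like this is known to be attackable, which is part of the "hard".

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Noisy count; smaller epsilon means more noise and more privacy.

    Assumes one person changes the count by at most `sensitivity`.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only so the demo is repeatable
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, round(dp_count(100, epsilon, rng), 2))
```

Each query spends privacy budget, so a real deployment also has to track the total epsilon across queries; this sketch does not.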

That said, just this week, Google open-sourced their differential privacy C++ library, along with a PostgreSQL extension to ease implementation. This protects against most of the tricky statistical issues you might encounter, since it's very well tested. github.com/google/differe…

The critical bit here is that individual trips are nearly impossible to anonymize without destroying all the useful information. We want more open data, not less, so just pick an aggregation method that maximizes data sharing without enabling stalkers or *don't collect at all*.
