What we have learned about the Census Bureau’s implementation of differential privacy.

In September 2020, the Census Bureau announced new confidentiality standards that mark a “sea change for the way that official statistics are produced and published.” 1/
The new system, known as Differential Privacy (DP), will be applied first to the 2020 census, and “will then be adapted to protect publications from the American Community Survey and eventually all of our statistical releases.” 2/
I am increasingly convinced that DP will degrade the quality of data available about the population, and will make scientifically useful public use microdata impossible. 3/
I also believe that the DP approach is inconsistent with the statutory obligations, history, and core mission of the Census Bureau. 4/
It is important to understand that DP is not a method of disclosure control. Rather, it is just a formal definition of privacy (shown below) 5/
DP yields a measure called Epsilon (ϵ) that defines the level of “privacy” in a dataset. A small ϵ means high “privacy” as it is defined by the equation. 6/
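For reference, the standard textbook statement of ϵ-differential privacy can be written as follows. The notation is the conventional one and is assumed here: M is a randomized release mechanism, D and D′ are any two datasets that differ in one person's record, and S is any set of possible outputs.

```latex
\Pr\bigl[\,M(D) \in S\,\bigr] \;\le\; e^{\epsilon}\,\Pr\bigl[\,M(D') \in S\,\bigr]
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S
```

When ϵ is close to 0 the two probabilities must be nearly equal, so the published output can depend only slightly on any one person's record. Nothing in the inequality refers to identities or to the probability of re-identification.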
This is very different from census law, which focuses on the disclosure of identities. The Census Bureau cannot reveal “the identity of the respondent to whom the information applies.” (Title 5 U.S.C. §502 (4)) 7/
For the past six decades, the Census Bureau’s disclosure control program has focused on protection of identities. 8/
The Census Bureau’s disclosure control efforts have been highly successful: There are no documented instances in which the identity of anyone in the decennial census or the ACS has been determined by anyone outside the Census Bureau. 9/
It is important to understand that DP does not measure disclosure risk. According to McClure and Reiter (2012), disclosure risk can be very small even when ϵ is large. 10/
semanticscholar.org/paper/Differen…
DP prohibits revealing characteristics of an individual even if the identity of that individual is effectively concealed. This is a radical departure from established census law and precedent. 11/
Under this re-interpretation, all microdata based on real responses to censuses or surveys has always been illegal, even if identities were effectively protected. 12/
The new disclosure rules were motivated by the threat of “database reconstruction.”

According to Abowd (2017), database reconstruction “is the death knell for public-use detailed tabulations and microdata sets as they have been traditionally prepared.” 13/
I have argued elsewhere that the risks of database reconstruction have been exaggerated. 14/
assets.ipums.org/_files/mpc/wp2…
The Census Bureau showed that an attacker could use database reconstruction to guess race or Hispanic origin, but the attacker would usually be wrong and would have no means of determining whether or not they were correct. 15/
Ron Jarmin, Deputy Director of the Census Bureau, agrees: 16/
On June 6 the Census Bureau released noise-infused data from the 1940 census. 1940 was used because the data are publicly available through IPUMS, so experimentation will not compromise confidentiality. 17/
IPUMS staff led by @dcvanriper with the assistance of @j_p_schroeder , @momentinthepark, and @josedpacas have been looking into the impact of DP on census results, and the results are startling. 18/
This graph shows the impact of noise when ϵ = 0.25, the same noise level as used in the 2018 dress rehearsal. Each dot is a Minnesota enumeration district, about the size of a block group.

Vertical axis: % adult in noise-infused data
Horizontal axis: % adult in real 1940 data
19/
Many of the noise-infused districts have 100% or 0% adults (100% children!).
These data would not be usable for drawing school district boundaries. 20/
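A minimal sketch of how noise can push a percentage to 0% or 100%, assuming independent Laplace noise with scale 1/ϵ is added to the adult count and the child count, and that negative results are clipped to zero. This is a simplified stand-in, not the Census Bureau's actual algorithm, which is more elaborate. The function names and the district counts are hypothetical and chosen only for illustration.

```python
import random

def laplace_noise(epsilon, rng):
    """Draw Laplace noise with scale 1/epsilon (difference of two exponentials)."""
    return rng.expovariate(epsilon) - rng.expovariate(epsilon)

def share_extreme(adults, children, epsilon, trials=10_000, seed=0):
    """Fraction of noisy draws in which the cell looks 0% or 100% adult."""
    rng = random.Random(seed)
    extreme = 0
    valid = 0
    for _ in range(trials):
        # Assumed post-processing: counts cannot be negative.
        a = max(0.0, adults + laplace_noise(epsilon, rng))
        c = max(0.0, children + laplace_noise(epsilon, rng))
        if a + c == 0.0:
            continue  # empty cell: the percentage is undefined
        valid += 1
        if a == 0.0 or c == 0.0:
            extreme += 1
    return extreme / valid

# Hypothetical counts: one very small cell and one large cell.
for adults, children in [(6, 3), (600, 300)]:
    print(adults, children, share_extreme(adults, children, epsilon=0.25))
```

Under these assumptions the small cell frequently ends up with no adults or no children, while the large cell is essentially unaffected. The noise has the same absolute size in both cases, so its relative effect depends on how small the true counts are.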
The Census Bureau provided noise-infused data for eight levels of epsilon. Only the highest ones would be usable, but it is not clear that they would offer as much disclosure protection as current Census methods (see McClure and Reiter). 21/
Here is another example. This is an index of racial diversity at the eight levels of ϵ. Clearly, you do not want to use noise-infused data to study residential segregation. 22/
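The sensitivity of a diversity measure to noise can be sketched the same way. The thread does not say which index was plotted, so the Simpson index (1 minus the sum of squared group shares) is assumed here, along with simple Laplace noise of scale 1/ϵ on each group count and clipping at zero. The counts and function names are hypothetical.

```python
import random

def simpson_diversity(counts):
    """Simpson diversity index: 1 - sum of squared group shares."""
    total = sum(counts)
    if total <= 0:
        return None  # undefined for an empty district
    return 1.0 - sum((c / total) ** 2 for c in counts)

def mean_index_error(true_counts, epsilon, trials=5_000, seed=0):
    """Average absolute change in the index after adding noise to each count."""
    rng = random.Random(seed)
    truth = simpson_diversity(true_counts)
    errors = []
    for _ in range(trials):
        # Laplace noise with scale 1/epsilon, clipped so counts stay non-negative.
        noisy = [
            max(0.0, c + rng.expovariate(epsilon) - rng.expovariate(epsilon))
            for c in true_counts
        ]
        value = simpson_diversity(noisy)
        if value is not None:
            errors.append(abs(value - truth))
    return sum(errors) / len(errors)

# Hypothetical district: 40 residents in one group, 2 and 1 in two others.
district = [40, 2, 1]
for eps in (0.25, 1.0, 8.0):
    print(eps, round(mean_index_error(district, eps), 3))
```

In this sketch the error in the index shrinks as ϵ grows, because the two small groups that drive the index are the counts most distorted by noise at low ϵ.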
This is all about summary files. The situation for microdata is much worse. Microdata representing real individual-level responses is inconsistent with differential privacy. 23/
Conclusions
DP may make tabular data unusable for most applications of small-area data.
DP is not appropriate or feasible for ACS microdata. 24/
Recommendation: Methods that focus on disclosure control rather than an arbitrary definition of “privacy” are needed if we want to optimize the trade-off between risk and usability. This requires more research, and cannot be achieved in time for 2020. /FIN
Correction of date:

In September 2018, the Census Bureau announced new confidentiality standards that mark a “sea change for the way that official statistics are produced and published.”