1. I prepared a report for the Plaintiffs in the Alabama v. Department of Commerce lawsuit over differential privacy in the census, available here: users.hist.umn.edu/~ruggles/censi…
2. I argue that the database reconstruction experiment did not demonstrate a convincing threat to confidentiality, because the results reported by the Census Bureau can be largely explained by chance.
3. Any randomly-chosen age-sex combination would be expected to be found on any given block more than 50% of the time.
4. Therefore the match rates reported for the database experiment are greatly exaggerated. This is the evidence the Census Bureau used to justify adoption of differential privacy.
5. This finding is important because differential privacy is likely to substantially reduce the usability of census data for social and economic research. users.hist.umn.edu/~ruggles/Artic…
6. According to the Census Bureau’s own account, the “reconstructed” data is usually false, an intruder would have no means of determining if any inference was true, and an intruder would lack the data needed to estimate the probability that a re-identification succeeded.
7. Acting Director of the Census Bureau Jarmin wrote “The accuracy of the data our researchers obtained from this study is limited, and confirmation of re-identified responses requires access to confidential internal Census Bureau information …
9. My analysis further reveals that most of the matches between reconstructed data and real data reported by the Census Bureau would occur purely by chance.
The database reconstruction experiment therefore poses no risk to the Census Bureau’s confidentiality guarantee.
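To make the chance-match argument in tweet 3 concrete, here is a minimal sketch of how often a randomly chosen age-sex combination would turn up on a block by luck alone. It is an illustration under loud assumptions, not the calculation from the report: the names `chance_match_probability` and `simulate_match_rate` are hypothetical, the 200 combinations (100 single years of age times 2 sexes) and the block sizes are made up, and every combination is treated as equally likely, which real age distributions are not.

```python
import random


def chance_match_probability(block_size: int, n_combos: int) -> float:
    """Probability that one given combination appears at least once on a block.

    Assumption (hypothetical): each resident's age-sex combination is drawn
    independently and uniformly from n_combos possibilities.
    """
    return 1.0 - (1.0 - 1.0 / n_combos) ** block_size


def simulate_match_rate(block_size: int, n_combos: int,
                        trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo check of the formula above, under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Pick a combination at random, then build a random block.
        target = rng.randrange(n_combos)
        block = {rng.randrange(n_combos) for _ in range(block_size)}
        # Count a "match" if the guessed combination exists on the block.
        hits += target in block
    return hits / trials


if __name__ == "__main__":
    # Hypothetical block sizes; 200 = 100 ages x 2 sexes (an assumption).
    for size in (10, 50, 150, 400):
        p = chance_match_probability(size, 200)
        print(f"block size {size:>3}: chance match probability {p:.3f}")
```

Under these simplified assumptions the chance match rate rises quickly with block size, which is the mechanism the argument relies on. The "more than 50%" figure in the thread comes from the report's own data, not from this sketch.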
What we have learned about the Census Bureau’s implementation of differential privacy.
In September 2020, the Census Bureau announced new confidentiality standards that mark a “sea change for the way that official statistics are produced and published.” 1/
The new system, known as Differential Privacy (DP), will be applied first to 2020, and “will then be adapted to protect publications from the American Community Survey and eventually all of our statistical releases.” 2/
I am increasingly convinced that DP will degrade the quality of data available about the population, and will make scientifically useful public use microdata impossible. 3/