Back in December 2018, Mark Hansen wrote an article on differential privacy in the NYT. To explain how the Census protected identities in 2010, he cited the case of the only two residents of Liberty Island in New York Harbor, who oversee the national monument./1
Liberty Island is considered a block by the Census Bureau, even though it only has two residents. The actual residents of the island, a married couple interviewed by Hansen, were aged 59 and 49 and both identified as white./2
To protect their identities, the Census Bureau “swapped” the couple with a different couple residing nearby, a 63-year-old man and a 58-year-old woman who identified themselves as Asian./3
My colleague @dcvanriper looked into what happened to Liberty Island after it was subjected to the Census Bureau’s new method of disclosure control, “differential privacy.” The results are stunning./4
The table below shows the population characteristics of Liberty Island as published in the 2010 census and after “noise infusion” to protect privacy. The April E12 file is the one that (until today) they said they were going to use, and the April E4 has a higher level of noise./5
The population of Liberty Island was increased from 2 to 48 in the E12 version of the data and to 72 in the E4 version. Not sure where all those people would fit on that little island. Maybe they could move into the statue./6
The big winners were the people listed with “two or more races,” who represent the great majority of these vastly inflated populations./7
After the demographic, planning, and redistricting communities asserted that the E12 and E4 files were unfit for use, the Census Bureau today announced an E17 data product, which they claim will be substantially more accurate. /8
Unfortunately, there will be no way for the user community to evaluate the new data until September, and the 2020 census redistricting file will be released in August, so the opportunity to provide feedback is gone./9
When the E17 data product is released, the first statistics I would like to see are the population characteristics of Liberty Island./10
Here is the link to the 2018 Hansen article in the NYT:…
Errors in this thread, by tweet number:
2. I spelled Hansen Hanson
4. is was ->it was
5. .. as published in the 2010 census (not 2020)
Note: E12 and E4 are 2010 demonstration files produced by Census so users could evaluate disclosure control
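For readers unfamiliar with "noise infusion," here is a toy sketch of why a file like E4 would be noisier than E12. It assumes, and the thread does not state this, that the labels refer to a privacy-loss budget (epsilon) of roughly 12 and 4, and it uses the textbook Laplace mechanism on a single count. It is not the Census Bureau's algorithm, and all function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng):
    """Textbook Laplace mechanism for a count query (sensitivity 1).

    ASSUMPTION: epsilon stands in for the number in the file label
    (12 for "E12", 4 for "E4"); the thread does not confirm this.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, true_count=2, draws=20000, seed=0):
    """Average absolute distortion of the published count over many draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / draws


if __name__ == "__main__":
    for eps in (12.0, 4.0):
        print(f"epsilon={eps}: mean |error| = {mean_abs_error(eps):.3f}")
```

Under these assumptions a smaller epsilon means a larger noise scale, which matches the statement that E4 is noisier than E12. The toy does not reproduce the Bureau's actual procedure, so it says nothing about why Liberty Island grew from 2 to 48 or 72.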
