Last up in Privacy Tech for #enigma2021, @xchatty speaking about "IMPLEMENTING DIFFERENTIAL PRIVACY FOR THE 2020 CENSUS"

usenix.org/conference/eni…
Differential privacy was invented in 2006. Seems like a long time but it's not a long time since a fundamental scientific invention. It took longer than that between the invention of public key cryptography and even the first version of SSL.
But even in 2020, we still can't meet user expectations.
* Data users expect consistent data releases
* Some people call synthetic data "fake data" like
"fake news"
* It's not clear what "quality assurance" and "data exploration" means in a DP framework
We just did the 2020 US census
* required to collect it by the constitution
* but required to maintain privacy by law
But that's hard! What if there were 10 people on the block and all the same sex and age? If you posted something like that, then you would know what everyone's sex and age was on the block.
Previously used a method called "swapping" with secret parameters
* differential privacy is open and we can talk about privacy loss/accuracy tradeoff
* swapping assumed limitations of the attackers (e.g. limited computational power)
Needed to design the algorithms to get the accuracy we need it and tune the privacy loss based on that.

Change in the meaning of "privacy" as relative -- it requires a lot of explanation and overcoming organizational barriers.
By 2017 thought they had a good understanding of how differential privacy would fit -- just use the new algorithm where the old one was used, to create the "micodata detail file".
Surprises:
* different groups at the Census thought that meant different things
* before, states were processed as they came in. Differential privacy requires everything be computed on at once
* required a lot more computing power
* differential privacy system has to be developed with real data; can't use simulated data to do this because the algorithms in the literature weren't designed for dats anything like as complex as the real data (multiracial people, different kinds of households, etc)
* to understand the privacy/accuracy trade-off requires a lot of runs, representing a *lot* of computer time
Census bureau was 100% behind the move
* initial implementation was by Dan Kiefer, who took a sabbatical
* expanded team to with Simson and others
* 2018 end to end test
* original development was on an on-prem Linux cluster
* then got to move to AWS Elastic compute... but the monitoring wasn't good enough and had to create their own dashboard to track execution
* it wasn't a small amount of compute
* republished the 2010 census data using the differentially private algorithm and then had a conference to talk about it
* ... it wasn't well-received by the data users who thought there was too much error
For example: if we add a random value to a child's age, we might get a negative value, which probably won't happen to a child's age.

If you avoid that, you might add bias to the data. How to avoid that? Let some data users get access to the measurement files [I don't follow]
In summary, this is retrofitting the longest-running statistical program in the country with differential privacy. Data users have had some concerns, but believe it will all come out.
Code is up on github and papers are up online. (@xchatty have some links?)

[end of talk]

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Lea Kissner

Lea Kissner Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @LeaKissner

3 Feb
Next up at #enigma2021, Alex Gaynor from @LazyFishBarrel (satirical security company) will be talking about "QUANTIFYING MEMORY UNSAFETY AND REACTIONS TO IT"

usenix.org/conference/eni…
Look for places where there are a lot of security issues being handled one-off rather than fixing the underlying issue Image
We tried to fix credential phishing mostly by telling people to be smarter, rather than fixing the root cause: people being able to use phished credential.

2-factor auth just ... fixes the problem. ImageImage
Read 15 tweets
3 Feb
It's time to talk about @zoom_us security over @zoom_us at #enigma2021 by Merry Ember Mou with the talk "BUILDING E2EE AND USER IDENTITY"

usenix.org/conference/eni…
Zoom's launched end-to-end encryption 5 months after the white paper was published
* prevents eavesdroppers between users who are speaking to each other
* protection against compromised servers Image
[ here's the E2EE whitepaper from Zoom]

github.com/zoom/zoom-e2e-…
Read 20 tweets
3 Feb
@carmelatroncoso is speaking about "CONTACT TRACING APPS: ENGINEERING PRIVACY IN QUICKSAND" at #enigma2021

usenix.org/conference/eni…
Engineering contact-tracing apps has been a marathon

Why make them?
* manual contact-tracing became totally overwhelmed with covid cases
* can we supplement with technology? Image
Constraints: security and privacy
* protect from misuse: surveillance, target marginalized individuals, etc.
* purpose limitation by default
* hide user's identity, location, behaviour
* preserve system integrity
Read 18 tweets
3 Feb
In more pandemic talks at #enigma2021, Mark Funk is here to talk about "DESIGNING VERIFIABLE HEALTH SOLUTIONS FOR GLOBAL PANDEMIC"

usenix.org/conference/eni…
This is about work done with a nonprofit to try to find a way to prevent infected people from entering a location in a privacy-preserving way.

(Stopped this work when it became clear that this was being built for a world which wouldn't exist any time soon.) Image
Right now, we ask people to self-diagnose, which requires on diagnosis and truthfulness

There are stronger mechanisms like PCR tests Image
Read 25 tweets
3 Feb
Last day of #enigma2021 and we're kicking off with @cooperq from @EFF talking about "DETECTING FAKE 4G LTE BASE STATIONS IN REAL TIME"

usenix.org/conference/eni…
Focus on tech which targets at-risk people (e.g. activists, rights defenders, sex workers) Image
What is a cell site simulator?

*transmitter or receiver which intercepts metadata from cell phones, often by pretending to be a legit cell tower Image
Read 21 tweets
2 Feb
Last talk at #enigma2021 today is @iMeluny speaking about "DA DA: WHAT SHARK CONSERVATION TEACHES US ABOUT EMOTIONALITY AND EFFECTIVE COMMUNICATION STRATEGIES FOR SECURITY AND PRIVACY"

usenix.org/conference/eni…
I dreamt of being a shark scientist and worked my ass off to get a scholarship to one of the top programs. My career took a loop, but to this day I find lessons from sharks for security and privacy.
Lessons:
Incidents are emotional
* Risks will never be zero
* Public is ill-informed and fear is common
* science-based policy is not the norn Image
Read 20 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!