Irenes (many) · Jun 23
Okay! #pepr22 is about to come back from lunch. We all just had the "birds of a feather" breakout sessions, which were a ton of fun.
This next block of talks is on consent.
Oh - it's consent, and a panel on DNS :)
Next talk: "Consent on the Fly: Developing Ethical Verbal Consent for Voice Assistants" by William Seymour. #pepr22
(we're really excited by this, this is a need we tried to get people at Google interested in years ago, but couldn't)
voice assistants such as Siri talk to users verbally, but for agreeing to terms of service and stuff they send you to your phone
there's a slide with an Alexa conversation transcript. when the app (a "skill") asks for permission, it's the OS talking, not the app ... but just for two lines, and the user can't hear the difference
dialogue is also very short compared to legal texts, and doesn't answer key questions such as what the data is for
the speaker says a lot of Alexa apps are totally missing their privacy policies
another challenge is that doing this verbally means you haven't shown users how to find the setting again to change it later
finally, doing it verbally creates a sense of time pressure
one approach might be to change the voice the OS uses to talk, compared to apps, but ... users don't understand the distinction
research has shown that users don't even distinguish between first-party and third-party apps on voice assistants. (the speaker is using the official name, "skills", but we think that's unnecessarily confusing...)
the speaker also sees some hidden opportunities here, and would like to explore them. make consent less of a Hobson's Choice. disentangle legal and ethical consent.
automated consent decisions? make consent less adversarial than it is today?
consent as a legal mechanism plays a different role from consent as an ethical best practice, and they don't always need to be the same
Q: how do we make sure the context of a permission actually makes sense if it's freeform text?
A: this is tricky. human review is the only thing we have. the main focus is on other things, such as reviewing privacy policies and making sure they point somewhere...
Q: how do you handle adversarial cases? the user is in a crowd, the user is a secondary member of the household?
A: modern voice assistants can recognize people's voices but you don't necessarily want to depend on it...
somebody in Slack commented "or the user is a small child!"
Q: how do you handle fine-grained consent? the user consents for one query, but what about secondary use such as analytics?
A: a lot of the information you need to give is very detailed. as a counterpoint, this isn't really presented anyway, you wouldn't normally see it. (what a wonderful frank discussion this is)
need to get expert opinions on what information is most important to present
there's lots more to this Q&A happening on Slack
next talk: Lorrie Cranor, "Informing the Design of Cookie Consent Interfaces with Research" #pepr22
presenting from her kitchen :D seems pre-recorded
kudos for having a plate of cookies
what makes a consent interface usable? her CMU research team has identified seven properties.

1. address user needs, make sure to choose the options people actually want.
2. require minimal user effort, it should be easy to find and select the choice options. the interface is easy to find - it's a popup that gets in your way! - but sometimes choosing the options you want requires doing it one at a time...
3. make users aware of what choices exist and where to find them
... and more :) the slide grew quickly
looking at an example popup. you can click the big green button to accept; if you want to do anything else it's harder to find and requires an unknown amount of work
another example has a dark pattern with confusing buttons that don't convey what they do. what happens if you click the X, which settings do you get?
dark pattern: confirm shaming. this banner talks about the "quality organic ingredients" of their cookies. sounds great but this has nothing to do with this website's use of cookies!
an Austrian organization, NOYB ("none of your business") has been reaching out to companies whose banners violate GDPR
most common dark pattern: unequal paths.
the CMU students realized there were a lot of design parameters that showed up frequently, probably due to shared libraries
the students put together a site to experiment with different consent dialogs
(we got nerd sniped by termsandconditions.game - go play it!)
the talk is really good though, the students evaluated lots of design patterns. they are working to put together a proposal for standard cookie language that makes more sense to users.
recommendation: if you aren't offering any choices, don't bother users with a banner! the laws don't require you to offer a notice if you're not offering a choice
recommendation: don't bury choices behind an additional link
instead, make all the choices visible from the start (the DHL website is a good example)
another issue: should you have a fully blocking cookie banner? what about a less conspicuous button? most users will never interact with it... but that's only okay if the default is to set the bare minimum cookies
we need to move to automated decisions to reduce user burden. P3P and Do Not Track were early attempts at this, but failed to get industry support.
time for Q&A
(we only just realized, but even in-person questions are submitted via Slack, which seems like a great way to do it)
Q: did people understand the terms and conditions?

A: no, not at all. only 16% in our experiment had the right answer.
Q: even with a well designed prompt, will users be uneasy because they don't understand the consequences?

A: yes we find evidence of that. people don't know what is a necessary cookie, why is it necessary?
more Q&A on Slack
now for the panel discussion: "DNS Privacy Vs." is the title. Shivan Kaul Sahib from Brave Software; Mallory Knodel, Center for Democracy & Technology; and... one more whose name is not on the website so we didn't catch it. #pepr22
oh the third participant is the moderator. amazingly and bravely, the two panelists and moderator are in three different locations! yay for virtual conferencing
Shivan is going to give background on private DNS methods, then there's going to be discussion of public interest concerns which might conflict with these new protocols
we misstated that, Mallory is moderating, and we're not sure who the other panelist is. alas.
Shivan: typically what you're trying to hide with DNS is who queried for what. despite the DNS records being public, the fact that a particular client is querying for a particular website is quite sensitive. when people talk about DNS privacy this is what they mean.
two older and more famous protocols in the space: DNS over TLS and DNS over HTTPS. main difference: DNS over TLS runs on its own port, 853, which is very simple for a network admin to block.
DNS over HTTPS, in contrast, blends in with other web traffic. most browsers ship with this support, but differ widely in how aggressively they turn it on and discover resolvers.
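(a quick aside from us: if you've never seen a DoH lookup up close, here's a minimal sketch using Google's public JSON API at dns.google, which is real and documented; the helper function itself is just our illustration)

```python
# minimal DNS-over-HTTPS lookup via Google's public JSON API.
# on the wire this is just an ordinary HTTPS request, which is
# the point: it blends in with regular web traffic.
import requests

def doh_lookup(name: str, record_type: str = "A") -> list[str]:
    resp = requests.get(
        "https://dns.google/resolve",
        params={"name": name, "type": record_type},
        timeout=5,
    )
    resp.raise_for_status()
    # "Answer" is absent when the name doesn't resolve
    return [answer["data"] for answer in resp.json().get("Answer", [])]

print(doh_lookup("example.com"))  # e.g. ['93.184.216.34']
```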
there's also DNS over QUIC, which uses QUIC, the new encrypted transport protocol. it also runs over a dedicated port, so it's easy to block.
in these protocols, the resolver you're talking to learns who's querying and what for.
"oblivious DNS over HTTPS", ODoH, was invented to address this. the resolver publishes a public key; the client encrypts its query with that key and sends it via a proxy, which forwards it to the resolver. the proxy sees who's asking but not what; the resolver sees the query but not who sent it.
this means there are now two parties who would have to collaborate to identify who's querying what
you are still hoping these two entities are not coerced into colluding, e.g. by a government
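(our aside: this is not the real ODoH wire format, which uses HPKE and a specific message structure; just a toy sketch with PyNaCl sealed boxes to show why each party only learns half the picture)

```python
# toy sketch of the oblivious-DNS split-trust idea, NOT real ODoH.
# the proxy learns who is asking but not what; the resolver learns
# the query but not who sent it.
from nacl.public import PrivateKey, SealedBox

resolver_key = PrivateKey.generate()   # resolver publishes the public half

# client: encrypt the query so only the resolver can read it
query = b"example.com. IN A"
blob = SealedBox(resolver_key.public_key).encrypt(query)

# proxy: forwards opaque bytes; it sees the client's IP, nothing else
forwarded = blob

# resolver: decrypts the query; it sees only the proxy's IP
assert SealedBox(resolver_key).decrypt(forwarded) == query
```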
that was Shivan's intro. next Q: what is it about these techniques that upset people?
Q continued: now other parts of the tech stack can't see domain lookup data. lots of things have depended on it - some bad, such as censorship, but some good too
criticisms have come from many directions
some of the critique claims to be about the public interest but it's hard to know that's true
there's anticompetitive concerns. there's concerns about making abuse mitigation harder.
there are third party tools that act within user agents for accessibility and UX improvements, which can be disrupted by this
"privacy and censorship are often two sides of the same coin" - if you can't tell who someone is you can't stop them from accessing information. so sufficiently motivated censors take even more severe measures.
back to Shivan. in general, it's interesting to examine how much we depend on consolidation for privacy benefits.
it's good that centralized platforms can roll things out and everyone benefits, but then they get all the data.
hypothetically, if everyone trusted their ISP, you wouldn't need these mechanisms

(we have personal thoughts on this... we think the concept of trust needs to be examined, and we don't think the idea of trusting an ISP even makes sense, but that's a big topic)
there's a tension. as protocol designers we should make it so we don't just de facto rely on consolidation to ensure privacy.
it's important to think about UX when discussing DNS. browsers push this on the user: choose your own resolver. great, but it's hard to get meaningful consent that way.
is there a metaphor we can use, or some user education we can do, to make this choice more meaningful? or should we give up and move on?
if it turns out that DNS doesn't contribute to perceived latency, then we could make something even more private, like DNS over Tor
okay, now for the other panelist. apologies if they see this, we aren't able to type this name without seeing it written.
they have an IETF background and have thoughts on DNS privacy and censorship
user traffic patterns have been sold by ISPs, and the link to the government is also important. your government has access to this information, directly and indirectly, and can see the web services that you've used.
plaintext DNS is not just a privacy matter; it also directly affects freedom of expression and the right to access information. anything visible to ISPs can be used to block access, and the same goes for state-controlled infrastructure.
there's a tendency for high-censorship countries to block users who use privacy technologies.
one criticism of DNS over HTTPS is that it might be easy to fingerprint these connections. no specific evidence this is happening, just a theoretical risk.
blocking specific endpoints: even with DoH, if you're using Cloudflare's or Google's resolver, the endpoint is well known, and there are privacy holes in other protocols, such as plaintext SNI in HTTPS, that can be used to block it
we've seen evidence of censorship being applied with SNI
certain governments might take disproportionate steps. even in that calculation, there is tension, there is a need to limit negative press.
if you look at the history of online censorship, it isn't always the ISP. ISPs operate in local jurisdictions so they're more willing to comply with local law, but governments also go to big platforms such as Facebook and ask to remove content at the source.
back to Mallory now. lean in to the fact that there is a tension, think about how to resolve it. we haven't offered solutions but the first step is to acknowledge there's a real problem.
Mallory's Qs for the third panelist: regarding SNI (the HTTPS Server Name Indication)... the server name is sent in plaintext. how do you encrypt that? it seems like it might not go well? we've seen censors block TLS 1.3 with encrypted SNI. why did that happen?
Shivan: for encrypted SNI, the blocking was almost expected. two problems were called out. first, "do not stick out": how do we make this privacy extension not look different from existing traffic?
there was evidence from China that any TLS traffic with encrypted SNI was dropped, because "do not stick out" was violated.

second problem: what is infeasible for a censor to block? this is about critical mass
TLS 1.3 was rolled out so quickly that few countries blocked it; there was so much momentum that blocking it was infeasible
Mallory: we call out "too big to block"
Mallory Q for Shivan: rollout of DoH and DoT. it was interesting to see what ISPs said, pushing back. solution: they all just adopted it and made themselves DoH resolvers. in the paper we discuss the difference between trusted resolvers (e.g. Firefox's) and other browsers...
what are some ways we can tackle this rollout question? what about pluralism as a way to defeat consolidation?
third panelist: Mozilla has a trusted resolver program. as long as an ISP offers an encrypted DNS service and agrees to the principles (won't store data, etc)...
... then Mozilla will include the resolver among the available options. the browser first checks the local ISP to see if it offers a service.
Brave is considering different experiments, but has to be careful... turning it on by default is hard to tackle
what regions do you turn it on by default in? do you risk breaking websites when you do it?
Mallory: it matters, it isn't just the protocol, it's the implementation. most of this discussion is not about the design of the protocol.
now for Q&A. Mallory will moderate. questions will be asked via Slack.
this convo is moving fast, it's hard to keep up with
prediction: private DNS servers will start receiving censorship requests
with oblivious DNS, who will own or control the proxy?
it's crucial that it be a different organization or the whole thing collapses
when all applications set their own DNS resolvers, it gets messy
in general, using DoT at the OS level would give users the most control and leverage over their entire environment - you'd only have to set it once
other applications will always be able to override that, so asking about user control is thorny - keen question.
next question: how does iCloud Private Relay impact this discussion?
it protects your IP address and obfuscates your DNS queries before they reach an external resolver
okay! panel's done, there's now a break until 4:05 Pacific.
and we're back, for the last block of talks for the day! next up is Jack Gardner and Akshath Jain, with "Helping Mobile App Developers Create Accurate Privacy Labels" #pepr22
CMU researchers designed concepts for this, then Apple and Google did their own versions which are now widely deployed. the talk is going to dig into challenges for developers, and discuss relevant prior work.
they have developed a tool called Privacy Label Wiz, which does static analysis. they will discuss the limitations of this approach.
here's an example Apple privacy label - the one for the Facebook app. there are 17 categories of data disclosed, and if you click "see details" it gets even more granular.
"data linked to you" - Apple uses this narrowly, data linked to a unique identifier such as a device ID. developers may think it refers to anything linkable to users in real life.
(why does Apple use it narrowly? that seems very limiting... ah well)
we need tools to make this easier
we missed a bit, but now it's talking about static analysis. among other things they detect iOS permissions and map these to Apple's data types.
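(we don't know the internals of their analyzer, but to give a flavor of the permission-mapping step: iOS apps declare usage-description strings in Info.plist for each permission they request. a toy sketch; the key-to-data-type mapping here is our own illustration, not the tool's actual tables)

```python
# toy sketch: scan an Info.plist for permission usage-description keys
# and suggest privacy-label data types. the mapping is illustrative.
import plistlib

PERMISSION_TO_DATA_TYPE = {
    "NSLocationWhenInUseUsageDescription": "Location",
    "NSCameraUsageDescription": "Photos or Videos",
    "NSMicrophoneUsageDescription": "Audio Data",
    "NSContactsUsageDescription": "Contacts",
}

def suggest_label_types(info_plist_path: str) -> set[str]:
    with open(info_plist_path, "rb") as f:
        info = plistlib.load(f)
    return {data_type for key, data_type in PERMISSION_TO_DATA_TYPE.items()
            if key in info}

# e.g. suggest_label_types("MyApp/Info.plist") -> {"Location", "Contacts"}
```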
they evaluated their tool by interviewing developers about it
they identified several fundamental shortcomings. need to introduce the topic. need to prompt developers to think about third-party data flows.
so they refined it to explain that stuff better, and modified the UI to resemble Apple's web form
the intro they added also explains to developers which things the tool can tell them vs. which things rely on what they tell it
developers click through the wizard and add data, then get a summary that gives them a chance to review it
this is a pretty detail-heavy overview of each step in the wizard, we're not going to try to capture all of it
lots of Q&A on Slack
next talk: "Creating Effective Labels" by Shakthi Gopavaram.
today, users are solely responsible for protecting their own privacy. this is a big task!
suppose you want to install a meditation app. the app store has many options, and it gives you metrics for rating and number of downloads, but no information about privacy. how do you make a privacy-preserving choice?
privacy labels are needed to help users make informed decisions. what does it take?

address information asymmetry. reduce cognitive burden. address psychological biases. personalize it.
information asymmetry happens when the buyer can't distinguish between high quality and low quality. the used car market is a classic example.
presenting relevant information can help, but this increases cognitive load.
one researcher estimated that the cost of reading privacy policies is billions of dollars.
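(the back-of-envelope version of that estimate is easy to reproduce. every number below is our own assumption for illustration, not the published study's inputs)

```python
# back-of-envelope reconstruction of the "billions of dollars" claim.
# all four inputs are assumptions, chosen only to be plausible.
internet_users = 200e6      # assumed: adults online in one country
policies_per_year = 100     # assumed: distinct sites visited per year
minutes_per_policy = 10     # assumed: time to actually read one policy
hourly_value = 25.0         # assumed: dollar value of an hour of time

hours = internet_users * policies_per_year * minutes_per_policy / 60
print(f"${hours * hourly_value / 1e9:.0f} billion per year")  # -> $83 billion
```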
labels should convey privacy instantly, and enable quick comparisons between products.
(one problem we see is that there are few meaningful differences between apps' privacy practices, right now. they're all terrible!)
(most users don't want to hear that everything is terrible and they should never download any apps. they want to use their expensive devices to do cool things.)
(our own work also focuses on this, so we have a lot of thoughts on it)
psychological biases. positive framing is more effective than negative. visual example with rating 1 to 5 "padlocks" vs 1 to 5 creepy glowing red eyes.
(we're gonna have to remember that. which framing is more effective depends on your goals...)
endowment effect. a professor gave students a free RSA tee shirt, even if they didn't want it. a week later she tried to buy them back... nobody would sell
this illustrates the point that people consider something more valuable if it's something they have
they did an experiment with internet-of-things devices (IoTMarketplace). control group, willingness-to-pay, willingness-to-accept. the first two groups behaved the same; the usefulness of the label was negated by psychological biases.
defaults matter. status quo bias states people are more likely to stick to the default.
privacy preferences vary from person to person, so the labels should too. ML can help!
(personally, the very last thing we want is an ML model that can accurately predict which things worry us. that sounds really dangerous.)
(good intentions...)
Q: how was that choice provided to users?
A: through interface design, no text. the published paper shows the details.
Q: the terminology is important in reducing the cognitive burden. how does this research bridge the gap?
A: you have users who work in privacy; you also have users with no technical experience. if you can design a label that can be understood by the latter group, that will help others also.
next talk: "Three Years of Crowdsourcing Network Traffic from Smart Homes", by Danny Yuxing Huang. this is part of a session on smart homes.
the speaker is an NYU assistant professor
they collected data from thousands of smart homes
these are very heterogeneous. some devices "could be listening to you", some "could be watching", some "could track your viewing habits". some could be running old versions that have security holes, and are compromised.
there are no barriers between devices. are they passing sensitive information to each other? is malware spreading from one to another?
most users don't know what's happening on their home network and don't know how to run analysis tools such as Wireshark.
most researchers can't afford thousands of devices for the lab, or get data from companies.
soooo.... the approach: help users visualize the network activity of their smart home devices. their tool, IoT Inspector, produces a realtime network analysis chart.
in a video demo the researcher is passively watching TV and the TV is communicating with several different advertisers, including Google
okay... now, you might have a dozen devices. in an analysis like this, how do you help users understand which is which?
you can try to do analysis based on MAC, but if it's a generic chip, that will only tell you the chip manufacturer
sometimes you can tell from its DHCP name, or from its HTTP user agent, or from mDNS, or from what sites it tries to visit...
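(to make that concrete, a toy sketch of that kind of heuristic identification. the OUI table is abbreviated to one entry; a real tool ships the full IEEE registry)

```python
# toy sketch of heuristic device identification from passive signals.
# the first 3 bytes of a MAC address (the OUI) identify the hardware
# maker; for generic chips that's the chip vendor, not the product.
OUI_TO_VENDOR = {
    "b8:27:eb": "Raspberry Pi Foundation",  # well-known Pi prefix
}

def identify(mac: str, dhcp_name: str | None = None,
             user_agent: str | None = None) -> str:
    if dhcp_name:                  # often set by the vendor
        return f"self-reported: {dhcp_name}"
    if user_agent:                 # visible if the device speaks plain HTTP
        return f"user agent: {user_agent}"
    vendor = OUI_TO_VENDOR.get(mac.lower()[:8])
    return f"vendor: {vendor}" if vendor else "unknown"

print(identify("B8:27:EB:12:34:56"))  # -> vendor: Raspberry Pi Foundation
```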
they asked their crowdsourcers to contribute device names
sometimes the hostnames are spooky! one smart device communicated with a navy.mil domain! this turned out to be an NTP time server
they scanned over 209,000 devices. users inspected about 63,000.
this was about 6400 users, who ran the inspector for a median time of about 40 minutes.
only about 15,000 devices were labeled, and only about 2,900 users helped with labels
can they get more labels? what are the incentives? they're not paying people, so they need to make a more attractive product. they are partnering with Consumer Reports.
Q: what is your privacy policy?

(how shocking that a bunch of privacy people asked this :D)
A: we have a user ID which we generate at the time of install. for each user ID we only inspect traffic from devices that users explicitly tell us to.
they get the first 3 bytes of the MAC address; also hostnames, IP addresses, and SNI.
Q: do you have any idea how users felt about the results?
A: in the works! they hope to be able to answer that soon
Q: how could better smart home network setup flows help solve this at a structural level?
A: yes better isolation for sure. many ISPs provide routers with guest wifi networks, on which devices can't talk to each other... but there are tradeoffs because if you put all your devices on there it's hard to interoperate. you want your phone to talk to your devices.
Q: after doing the study did you change your mind about buying a smart TV?
A: I'd be careful what user account to use. apps on smart TVs send "interesting" information to third parties.
next talk! "Informing the Design of Privacy Awareness Mechanisms for Users and Bystanders in Smart Homes", Yaxing Yao from U Maryland
in a smart home setting, data collection does not stop at users. your roommate's microphone or camera may record you without your awareness. if you go to a party, your friend's device may collect your data.
they define "users" as those who own the devices, and "bystanders" as those who don't own the devices but may be subjected to collection
bystanders may want the device to be off to stop data collection, while users may want to keep it running for security reasons
privacy dashboard. they created this tool, apparently. benefits for users and bystanders: provides detailed information. benefit for bystanders: centralized source.
drawback for users and bystanders: lack of control. drawback for bystanders: violates social norms to use someone else's device.
data app, on smartphones. benefits: accessible from private devices (U&B), detailed information (U&B). drawbacks: security concerns (U), invades user privacy (B).
ambient light to disclose privacy information - green/yellow/red. benefits: easy to understand (U&B), unobtrusive (U&B). drawbacks: not informative (U&B), psychological burden (B).
video demo. really cool! they put the light bulb in front of a television....
(as people are observing in Slack, it wasn't particularly unobtrusive)
privacy speaker: it makes a scary buzz sound...
video demo of this one too. wow it sure does not like TV commercials.
users called it annoying and intimidating
users prioritize device utility. bystanders tend to consider social factors such as relationships and power dynamics.
how to move forward? try to provide easy and equal access. try to provide unobtrusive modality.
Q: can I buy this speaker?
A: when we have the product!
Q: something about equal access...
A: not really about controlling the device, more about equal access to privacy information
Q: would an option to provide a periodic summary of network traffic, such as daily or weekly, be more informative?
A: possibly!
last talk of the day! "Privacy-Preserving Protocols for Smart Camera Systems and Other IoT Devices", Yohan Beugin from Penn State.
smart cameras such as the Ring are all over the US, per this chart. they provide livestreams...
but providers' incentives were initially obscured, until it was revealed that Ring partnered with police. some providers have proven untrustworthy: cameras that strangers can access, or security workers viewing people's private lives
so, how to return control to users? they'd like to use end-to-end encryption
however, this makes it challenging to reproduce the functionality of commercial devices. how do they establish a root of trust? how do they manage keys?
they place a printed QR code on the physical device, and have users scan it to create a secure channel.
they start with a seed key and derive additional keys in a binary tree, using hash-based key derivation to produce a family of 8 related keys.
this binary key tree structure is convenient for them because the devices only need to negotiate the seed key and can then derive the others without further communication
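(the talk didn't spell out the exact KDF, so here's a generic sketch of a depth-3 hash-derived key tree: one shared seed, eight leaf keys, derivable offline by both sides. SHA-256 and the L/R labels are our assumptions)

```python
# generic sketch of a binary key tree: hash each key with an "L" and
# an "R" label to get its children; depth 3 yields 2**3 = 8 leaves.
# both parties derive identical leaves from the seed, offline.
import hashlib

def child(key: bytes, label: bytes) -> bytes:
    return hashlib.sha256(key + label).digest()

def leaf_keys(seed: bytes, depth: int = 3) -> list[bytes]:
    level = [seed]
    for _ in range(depth):
        level = [child(k, label) for k in level for label in (b"L", b"R")]
    return level

keys = leaf_keys(b"negotiated seed key")
assert len(keys) == 8
```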
they show several other kinds of smart devices, not all of which have sensors that can enable this flow
so they are making sure all the devices they work with have a sensor of some sort
Q: the Nextdoor problem. there is a desire among users to not share camera feeds with law enforcement. how does that affect the design?
A: the paper doesn't address this bystander issue, but it's important.
Q: consumers resell devices. how to deal with that?
A: the system now supports factory reset
Q: talk more about what users do with the physical motion sensor to pair devices
A: it's for making sure you're connected to the right device. the phone might ask them to do some specific task.
like put your hand in front of the sensor
and that's it! those of you attending in person get to go to a reception, have fun there! the rest of you, say hi in Slack :)
