Paul-Olivier Dehaye

May 25 • 15 tweets • 5 min read

As #CPDP2022 was about to start, I visited the conference location to try to better understand how Google (and others!) might be tracking participants

As an "aside", Google is a Platinum sponsor at the conference

https://twitter.com/markscott82/status/1529020465402060804?s=20&t=4OwrvRpCgWE2nmZTlsUGGw

1/n

So, let's dig in! The day before, May 20th, I went near the venue with Semantic Location History activated. That gives me a glimpse at the infrastructure of surveillance Google is leveraging.

2/n

When I export my Google data through Takeout (and visualize it a bit more with tools developed during the #digipower investigation), I see that Google knows I was there, with 70% confidence and for 565 seconds.

Furthermore, Google knows how I arrived (where from, with which transportation mechanism) and how I left (where to), based mostly on activity sensors (gyroscopes, accelerometers, and a layer of machine learning).

It also knows I searched for the destination 20 minutes earlier.
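
For readers who want to poke at their own export, here is a minimal sketch of reading one visit record. The field names (timelineObjects, placeVisit, locationConfidence, startTimestamp, endTimestamp) are assumptions about the Takeout JSON layout, which may differ between export versions. The venue name and timestamps are made up so that the duration matches the 565 seconds mentioned above.

```python
import json
from datetime import datetime

# Hypothetical excerpt shaped like a Semantic Location History export.
# Field names are assumptions; name and timestamps are invented, while the
# confidence (70) and the duration (565 s) echo the figures in the thread.
SAMPLE = """
{
  "timelineObjects": [
    {
      "placeVisit": {
        "location": {"name": "Example venue", "locationConfidence": 70.0},
        "duration": {
          "startTimestamp": "2022-05-20T14:00:00Z",
          "endTimestamp": "2022-05-20T14:09:25Z"
        }
      }
    }
  ]
}
"""


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_visits(raw: str):
    """Return (name, confidence, duration in seconds) for each placeVisit."""
    visits = []
    for obj in json.loads(raw).get("timelineObjects", []):
        visit = obj.get("placeVisit")
        if visit is None:
            continue  # skip other record types, e.g. movement segments
        start = parse_ts(visit["duration"]["startTimestamp"])
        end = parse_ts(visit["duration"]["endTimestamp"])
        location = visit["location"]
        visits.append((
            location.get("name"),
            location.get("locationConfidence"),
            int((end - start).total_seconds()),
        ))
    return visits


print(summarize_visits(SAMPLE))
```

To use it on a real export, replace SAMPLE with the contents of one of the monthly JSON files and adjust the field names if your export is laid out differently.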

How does Google deduce where I am? Most people assume it happens through GPS. However, that is not the case...

Instead, Google uses (among other signals) the MAC addresses of routers whose location it knows, which my phone heard and to which I DID NOT connect. Here they are, centered at the average location where they were heard during those 10 minutes.
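
The "average location where they were heard" can be sketched as a per-MAC mean of the positions at which each router was observed. This is only an illustration of how such a map could be drawn, not Google's method. The scan records, MAC addresses and coordinates below are hypothetical, and a plain arithmetic mean of latitude and longitude is only reasonable over a small area like a single street.

```python
from collections import defaultdict

# Hypothetical scan records: (router MAC address, latitude, longitude) of the
# phone at the moment the router was heard. All values are made up.
SCANS = [
    ("aa:bb:cc:00:00:01", 50.8500, 4.3500),
    ("aa:bb:cc:00:00:01", 50.8502, 4.3504),
    ("aa:bb:cc:00:00:02", 50.8510, 4.3490),
]


def average_positions(scans):
    """Map each MAC address to the mean (lat, lon) at which it was heard."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat total, lon total, count
    for mac, lat, lon in scans:
        acc = sums[mac]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {mac: (s[0] / s[2], s[1] / s[2]) for mac, s in sums.items()}


for mac, (lat, lon) in average_positions(SCANS).items():
    print(mac, round(lat, 5), round(lon, 5))
```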

Note the MAC addresses in plain text, and the fact that no one has been asked for consent by Google for this type of tracking (what happens for MAC addresses associated with hotspotting phones?).

Also, is this part of the data collected by roaming cars?

Supposedly, you can opt out by appending the "_nomap" suffix to your router's network name (SSID), a mechanism unilaterally decided by Google.

I am told even then this data is uploaded to Google servers.
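
As a small sketch of that opt-out convention, assuming the suffix applies to the Wi-Fi network name (SSID): the helper names below are hypothetical, and the check only reflects the naming rule, not what actually happens to the data on Google's side.

```python
NOMAP_SUFFIX = "_nomap"
MAX_SSID_BYTES = 32  # SSIDs are limited to 32 bytes


def is_opted_out(ssid: str) -> bool:
    """True if the network name ends with the opt-out suffix."""
    return ssid.endswith(NOMAP_SUFFIX)


def with_opt_out(ssid: str) -> str:
    """Return the SSID with the suffix appended, if it is not already there."""
    if is_opted_out(ssid):
        return ssid
    renamed = ssid + NOMAP_SUFFIX
    if len(renamed.encode("utf-8")) > MAX_SSID_BYTES:
        raise ValueError("SSID would exceed 32 bytes; shorten the name first")
    return renamed


print(with_opt_out("HomeWifi"))
```

Note that the suffix has to come at the very end of the name, so a name like "nomap_HomeWifi" would not count.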

More interesting: G tells me how it reached its conclusion on where I was.

Highlighted in red is "Les Halles" (I stayed in front for 10 min), with probability 85%. But I could be at restaurant Ane Vert right in front, with 2.5% probability.

This is based on triangulation of wifi signals, but as indicated earlier other signals are also used.
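
To make the candidate list concrete, here is a minimal sketch that ranks candidate places by probability and shows how much probability is left for places not named here. The two entries use the figures quoted above; the dictionary layout and function names are assumptions for illustration, not the export's actual structure.

```python
# Candidate places with the probabilities (in percent) quoted in the thread.
CANDIDATES = [
    {"name": "Ane Vert", "probability": 2.5},
    {"name": "Les Halles", "probability": 85.0},
]


def rank_candidates(candidates):
    """Sort candidates from most to least probable."""
    return sorted(candidates, key=lambda c: c["probability"], reverse=True)


def remaining_probability(candidates):
    """Percentage left over for candidates that are not in the list."""
    return 100.0 - sum(c["probability"] for c in candidates)


for place in rank_candidates(CANDIDATES):
    print(place["name"], place["probability"])
print("other candidates:", remaining_probability(CANDIDATES))
```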

Here is an example from Lausanne.

We indicate with the green circle that the restaurant has been searched, which helps G figure it all out.

Here is an even more interesting example, taken from @markscott82 's data in Berlin.

We see that he has searched for a place, and that this affects the probability calculation of where he was (indicated by the thickness of the line to the red dot, which is still the most probable according to G).

We think this is very interesting from a regulatory perspective, to better understand how different signals are used by G, for different purposes (and associated gain).

We also think it shows the utility of transparency... to some extent.

We activated Semantic Location History in order to get the data back showing the inferences made by G. But we have close to 0 transparency on what G does with raw signals when it doesn't have that additional layer of consent.

And G is ahead of the curve of others for transparency...

for some facets of the collection, certainly compared to other #CPDP2022 sponsors!

Are you interested in doing this for yourself?

Instructions here:
digipower.hestialabs.org/google#load-da…

Are you interested in digging in all this with us or students?

Contact me!

More from @podehaye

Paul-Olivier Dehaye

@podehaye

May 1, 2021

"important contribution", read the article or follow the thread below... 1/n

👇

https://twitter.com/DominikMenges/status/1388522481275871233

This table is from the paper. "Ratio of persons with + test result after app notification per all SARS-CoV-2 positive cases" ranges from 0.6% to 1.8% in Zurich and 0.2% (!) to 0.6% in Switzerland as a whole, during various periods including the beginning of the 2nd wave. 2/n

Bear in mind these numbers are very likely to be overestimates.

The paper doesn't account for 3 (overlapping) effects:
- household contact w/ notification;
- non-household but already-p2p-traced contact
(see pdehaye.medium.com/lies-damn-lies… );

3/n

Read 11 tweets

Paul-Olivier Dehaye

@podehaye

Dec 31, 2020

In 2007, I participated in an Oxford vaccine trial. This was the 1st time chimpanzee adenoviruses were tested on humans. I was the second to be injected w/ this stuff.

The Oxford/AstraZeneca COVID vaccine is directly based on that tech, with tremendous hope beyond COVID

👇

2007 was a touchy time to participate in a vaccine trial. The year before a trial for Theralizumab had resulted in 6 volunteers out of 8 suffering a cytokine storm. One of the 6 lost toes and fingers. The last two? They had only received placebos!

en.wikipedia.org/wiki/Theralizu…

As a consequence, the Oxford Jenner Institute had trouble recruiting volunteers for their trials. One of my fellow @MertonCollege Fellows explained to me the science and the goals for their 2007 campaign and convinced me to participate.

Read 21 tweets

Paul-Olivier Dehaye

@podehaye

Sep 4, 2020

This work really makes a plethora of basic mistakes.

I will point at some below.

https://twitter.com/marcelsalathe/status/1301965936019283971

1/ Think SwissCovid leads to many calls to the hotline? That's good, right?

Well, you better know what those calls are for. Building a GAEN app is like building on the quicksand of the OS you are using.

Case in point, Apple:

https://twitter.com/podehaye/status/1301972803638431747?s=20

2/ Think #SwissCovid efficacy is comparable to manual contact tracing? One paper cited is from May 2020, the other July but concerns data collected up until March 2020. Ages ago. We have learned a lot since about the virus, e.g. the heterogeneity of infectiousness

Read 9 tweets

Paul-Olivier Dehaye

@podehaye

Mar 13, 2020

When Cummings sent his job ad, I wrote a thread on how revealing it was of his world view, and particularly that he would name check a journalist like @carolecadwalla

https://twitter.com/podehaye/status/1212909164151922688?s=20

Indeed, Cummings understands the systemic level very well. But as I said then, he fails to understand how others build meaning and why that is important.

The UK virus strategy is to #FlattenTheCurve, like everywhere else, but with a twist. The epidemiologists operate under the assumption that the behavioral scientists ("nudge unit") are correct, and it is impossible to ask for strong efforts for more than a week.

Read 13 tweets

Paul-Olivier Dehaye

@podehaye

Jan 3, 2020

@carolecadwalla The fact that Cummings felt compelled to refer to you in this job ad is actually hugely relevant. He understands the systemic level very well, but fails at understanding how others make sense of the systemic, and why that is important.

@carolecadwalla He also fails to understand there is circularity in the particular context of elections (i.e. the process by which the average person makes sense of what's best, given tons of influence). That circularity gives more legitimacy to your stories than his calculations as a way to build meaning

@carolecadwalla His job ad is largely obsessed with "causation" and "counterfactuals", which is a very narrow view of what "meaning" is. Meaning here refers to explanations that others can relate to, in the sense that they can come up with their own plan of how to relate to it.

Read 7 tweets

Paul-Olivier Dehaye

@podehaye

Jan 25, 2019

There is ABSOLUTELY a story here, documenting Facebook's resistance to the "Download your History" feature (yet their use of this White Whale for PR purposes right now, even yesterday at Davos by Sandberg) 1/n

https://twitter.com/DaveLeeBBC/status/1088715635176030213

This is not a frivolous request. The reason to ask is that this feature would make a lot of v interesting research much easier and more potent (+granular, personal feedback). E.g. this research on Twitter would transfer over completely

https://twitter.com/grinbergnir/status/1088711020657471488

I have known this for a while. In fact, as I was working in 2016 w/ @HNSGR on his @CamAnalytica uncovering, I became convinced neither journalism nor research alone would cut it. User-centric data, in the spirit of @mydataorg, could bridge the two motherboard.vice.com/en_us/article/…

Read 21 tweets
