As #CPDP2022 was about to start, I visited the conference location to try to better understand how Google (and others!) might be tracking participants

As an "aside", Google is a Platinum sponsor at the conference



1/n
So, let's dig in! The day before, May 20th, I went near the venue with Semantic Location History activated. That gives me a glimpse at the infrastructure of surveillance Google is leveraging.

2/n
When I export my Google data through Takeout (and visualize it a bit more with tools developed during the #digipower investigation), I see that Google knows I was there, with 70% confidence and for 565 seconds.
Furthermore, Google knows how I arrived (where from, by which mode of transport) and how I left (where to), based mostly on activity sensors (gyroscopes, accelerometers, and a layer of machine learning).

It also knows I searched for the destination 20 minutes earlier.
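To make the "70% confidence, 565 seconds" reading concrete, here is a minimal sketch of pulling place visits out of a Semantic Location History file. The field names (`timelineObjects`, `placeVisit`, `locationConfidence`, `startTimestamp`, `endTimestamp`) are assumptions based on what Takeout exports have looked like and may differ in yours. The sample record, its place name and its timestamps are made up to match the numbers above.

```python
import json
from datetime import datetime

# Hypothetical sample record; names and timestamps are illustrative only.
SAMPLE = json.loads("""
{
  "timelineObjects": [
    {
      "placeVisit": {
        "location": {"name": "Example venue", "locationConfidence": 70.0},
        "duration": {
          "startTimestamp": "2022-05-20T14:00:00Z",
          "endTimestamp": "2022-05-20T14:09:25Z"
        }
      }
    }
  ]
}
""")


def parse_ts(value):
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def place_visits(history):
    """Return (name, confidence, seconds) for every placeVisit entry."""
    visits = []
    for obj in history.get("timelineObjects", []):
        visit = obj.get("placeVisit")
        if visit is None:
            continue  # e.g. an activitySegment (a trip between places)
        loc = visit.get("location", {})
        dur = visit.get("duration", {})
        start = parse_ts(dur["startTimestamp"])
        end = parse_ts(dur["endTimestamp"])
        seconds = (end - start).total_seconds()
        visits.append((loc.get("name"), loc.get("locationConfidence"), seconds))
    return visits


for name, confidence, seconds in place_visits(SAMPLE):
    print(f"{name}: {confidence}% confidence, {seconds:.0f} s")
```

On a real export you would load the monthly JSON file with `json.load` instead of the inline sample.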
How does Google deduce where I am? Most ppl think it will have occurred through GPS. However, that is not the case...
Instead, Google uses (among other signals) the MAC addresses of routers whose location it knows, heard by my phone, and to which I DID NOT connect. Here they are, centered at the average location where they were heard during those 10 minutes.
Note the MAC addresses in plain text, and the fact that no one has been asked for consent by Google for this type of tracking (what happens for MAC addresses associated with hotspotting phones?).
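"Centered at the average location where they were heard" can be sketched as a per-MAC mean of the positions at which each access point was observed. The observation list below is hypothetical (made-up addresses and coordinates), and a plain average of latitude and longitude is only a reasonable approximation over short distances like these.

```python
from collections import defaultdict

# Hypothetical observations: (MAC address, latitude, longitude).
OBSERVATIONS = [
    ("aa:bb:cc:00:00:01", 50.000, 4.000),
    ("aa:bb:cc:00:00:01", 50.002, 4.002),
    ("aa:bb:cc:00:00:02", 50.001, 4.001),
]


def average_positions(observations):
    """Map each MAC address to the mean (lat, lon) at which it was heard."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, lon sum, count
    for mac, lat, lon in observations:
        acc = sums[mac]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {mac: (lat_sum / n, lon_sum / n)
            for mac, (lat_sum, lon_sum, n) in sums.items()}


for mac, (lat, lon) in average_positions(OBSERVATIONS).items():
    print(mac, round(lat, 5), round(lon, 5))
```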

Also, is this part of the data collected by roaming cars?
Supposedly, you can opt out by appending the "_nomap" suffix to your router's network name (SSID), a mechanism unilaterally decided by Google.

I am told even then this data is uploaded to Google servers.
More interesting: G tells me how it reached its conclusion on where I was.

Highlighted in red is "Les Halles" (I stayed in front for 10 min), with probability 85%. But I could be at restaurant Ane Vert right in front, with 2.5% probability.
This is based on triangulation of wifi signals, but as indicated earlier other signals are also used.
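The list of candidate places can be read out of a single visit record and ranked. This is a sketch only: the `otherCandidateLocations` and `locationConfidence` field names are assumptions about the export format, and the record is a hand-built stand-in that uses the two probabilities quoted above.

```python
# Hand-built stand-in for one placeVisit record (field names are assumptions).
VISIT = {
    "location": {"name": "Les Halles", "locationConfidence": 85.0},
    "otherCandidateLocations": [
        {"name": "Ane Vert", "locationConfidence": 2.5},
    ],
}


def ranked_candidates(visit):
    """Return (name, confidence) pairs, most probable candidate first."""
    candidates = [visit["location"]] + visit.get("otherCandidateLocations", [])
    pairs = [(c.get("name"), c.get("locationConfidence", 0.0)) for c in candidates]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, confidence in ranked_candidates(VISIT):
    print(f"{confidence:5.1f}%  {name}")
```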

Here is an example from Lausanne.

We indicate with the green circle that the restaurant has been searched, which helps G figure it all out.
Here is an even more interesting example, taken from @markscott82 's data in Berlin.

We see that he has searched for a place, and that this affects the probability calculation of where he was (indicated by the thickness of the line to the red dot, which is still the most probable according to G).
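How a search could shift such probabilities without changing the top candidate can be illustrated with a toy reweighting. To be clear, this is NOT Google's method, which is not disclosed: the place names, the prior values and the `boost` factor are all invented for illustration.

```python
def reweight(priors, searched, boost=1.5):
    """Multiply the weight of searched places by `boost`, then renormalize.

    Toy model only: `boost` is an arbitrary, hypothetical factor.
    """
    weighted = {
        name: p * (boost if name in searched else 1.0)
        for name, p in priors.items()
    }
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# Hypothetical wifi-only estimate for three nearby places.
priors = {"Place A": 0.6, "Place B": 0.3, "Place C": 0.1}
posterior = reweight(priors, searched={"Place B"})

for name, p in posterior.items():
    print(f"{name}: {p:.3f}")
```

With these made-up numbers, the searched place gains probability mass while the unsearched top candidate stays the most probable, which is the qualitative pattern described above.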
We think this is very interesting from a regulatory perspective, to better understand how different signals are used by G, for different purposes (and associated gain).

We also think it shows the utility of transparency... to some extent.
We activated Semantic Location History in order to get the data back showing the inferences made by G. But we have close to 0 transparency on what G does with the raw signals when it doesn't have that additional layer of consent.

And G is ahead of the curve of others for transparency...
for some facets of the collection, certainly compared to other #CPDP2022 sponsors!
Are you interested in doing this for yourself?

Instructions here:
digipower.hestialabs.org/google#load-da…

Are you interested in digging into all this with us or with students?

Contact me!

Keep Current with Paul-Olivier Dehaye
