I want to share some more details about what we found in our investigation into gambling data that are highly relevant to GDPR enforcement and privacy regulation at large.
For example, this is how companies share personal data with each other during a bunch of 'cookie syncs'.
I guess rarely anyone has ever analyzed the data flows during only a few 'cookie syncs' at such a level of detail.
It's not about 'cookies' but about an ongoing exchange of personal identifiers that many data companies use to recognize, track and follow everyone across websites.
The chart shows data transmissions we observed during only a few visits to skycasino.com, which initiated requests to the adtech firm MediaMath, which shared the ID it uses to recognize a user with many other firms and initiated further personal data processing by them.
- MediaMath sent its personal ID for the user to 14 third-party firms, 3 third parties sent their own IDs to MediaMath
- In addition, MediaMath directly or indirectly initiated extensive personal data processing by 19 third parties
- All 19 third parties *received* their own company-specific ID they stored in a cookie in the user's browser before
- 16 third parties also *stored* their company-specific ID in the browser
- We observed 17 firms processing the *same* ID during a visit to another website, Unibet
- In addition, several third parties exchanged IDs directly with each other, e.g. we observed the identity surveillance firm ID5 receiving IDs from 4 companies and sending its own ID to 2 companies
- Some firms stored the MediaMath ID in a cookie associated with their own domain
- As a side effect, several companies learned that the user visited skycasino.com, because requests included that data in the HTTP referer.
Some of them may use this data for profiling.
Most of them gained the capability to better track+follow people across the web.
All this personal data processing by MediaMath and the other companies occured because the skycasino.com website embedded third-party technology into its website.
As such, SkyBet/SBG directly or indirectly facilitates this personal data processing by third parties.
As soon as these third parties exploit their increased capability to follow people on the web or their knowledge about the fact that a user visited a specific site for their own purposes, the site may even (co)facilitate personal data processing in a completely different context.
This happened during a few visits to a single website. It happens hundreds of times a day for everyone of us, and billions of times across the web.
One firm sharing a single ID with another may seem meaningless, but at scale it turns into a commercial mass surveillance system.
The technical report contains a detailed analysis of the HTTP requests that were part of the data processing activities described above (p79).
The main report contains a summary of our findings on MediaMath's data processing (p47).
Here's an example that shows how MediaMath sends its ID retrieved from the user's browser to Salesforce, who can then match it with its own ID retrieved from the browser.
Salesforce also *stores* its ID in the browser, which it can later retrieve during visits to other websites.
Read about the core findings of our investigation in my other threads:
The technical report also contains a methodology section that describes how we observed and analyzed personal data flows in the web browser (p. 123ff).
We examined network traffic recorded during a series of 37 visits to 3 websites operated by Sky Bet and 10 visits to websites operated by other companies.
- New Win10 laptop connected to the Internet at a person's home in the UK, Chrome browser
- The person made manual website visits, no automation
- Network traffic was recorded with Chrome's developer tools and manually stored in HAR format after each visit
- I analyzed the resulting HAR/json files manually and with the help of grep, some custom Python code and Fiddler
I think, examining network traffic resulting from a series of website visits *over time* is a promising approach to investigate personal data flows, for EU authorities, for audit purposes, academic research - in addition to testing single visits or large-scale automated testing.
For example, I discovered how Google periodically replaces its 'IDE' identifier with a new version of the identifier across visits to websites operated by different companies.
Each time, you see 'response cookie' in a row, Google read the old identifier and stored a new version.
Google processed eight versions of its 'IDE' identifier between Feb 25 and Mar 9, 2021. We missed one replacement because we didn't record every visit.
The user who made these website visits didn't have a Google account. Nevertheless, Google is able to track them across the web.
Websites that embed Google/DoubleClick (co)facilitate Google's capability to track people across the web, especially if they let Google *store* new versions of the IDE identifier in the browser so G can access them during visits to other sites.
Other companies use a simpler approach to track and follow people across the web.
For example, TransUnion's marketing data firm Signal always uses the same ID. Every visit to a site that embeds Signal made the company both receiving its ID from the browser and storing it again.
We also observed Xandr, an adtech company and data broker previously owned by AT&T and now acquired by Microsoft, receiving the same personal ID from the user's browser and storing it again across every visit to websites operated by Sky Bet and by other companies in the same way.
Same with Adobe, who also received the same 'demdex' ID across different websites and stored it again, and with Iovation, another TransUnion company that is not in adtech but in identity verification and fraud detection and claims to track 7 billion 'consumer devices' globally.
We observed Iovation receiving the same identifier from the user's browser across visits to websites operated by several gambling companies, and storing it in the browser again.
Sky Bet facilitated the initial request to Iovation and thus the initial creation of the Iovation ID.
As laid out in the technical report, companies like Iovation, Signal, Xandr, Adobe or Google who constantly receive personal IDs and store them back to the browser would not be able to track people across the web if websites like skycasino.com would not facilitate that.
I am not a lawyer, assessing legal issues is @RaviNa1k's job. In my personal opinion, our findings suggest the GDPR data controller Sky Bet is jointly responsible for personal data processing by those companies that relies on their capability to recognize people across the web.
While Iovation, Signal, Adobe, Xandr and Google both received and stored the same ID in the user's browser across visits to websites operated by Sky Bet and by others, FB, Microsoft and MediaMath 'only' always received the same ID. But, they could also exploit data across sites.
Of course, all that data processing actually shouldn't happen anymore, at least not in the EU/UK almost four years into the GDPR. Enforcement failed for many reasons, including massive pseudo compliance efforts by the industry and under-resourced, spineless or even captured DPAs.
Anyway, I think one reason for why enforcement is still failing is a lack of understanding and evidence of how very limited, ephemeral and distributed personal data processing across many actors can scale to a massive violation of rights and freedoms.
Correction, the third sentence in the Salesforce box at the top left should read "Received the same Salesforce ID during..." instead of "Received the same Neustar ID during". Thanks @Bleibpassiv!
Here's a corrected version of the infographic (v1.1)
Because some were asking about 'consent' et al:
This is how the 'consent' banner looked like when we did the tests. They didn't even use a TCF-based CMP for pseudo-compliance. No 'decline' button, instead "by scrolling, clicking or navigating our site, you consent"
Meaningless.
• • •
Missing some Tweet in this thread? You can try to
force a refresh
I took another look at Snowden docs that mention browser/cookie IDs.
It's breathtaking how the surveillance marketing industry has still managed to claim for many years that unique personal IDs processed in the web browser are somehow 'anonymous', and sometimes still does.
Another 2011 doc indicates that the GCHQ operated a kind of probabilistic ID graph that aims to link cookie/browser IDs, device IDs, email addresses and other 'target detection identifiers' (TDIs) based on communication, timing and geolocation behavior:
Btw. What inspired me to revisit these docs is @ByronTau's book Means of Control, which not only details how US agencies buy commercial data from digital marketing but also provides deep historical context, tracing back to early-2000s debates on Total Information Awareness (TIA).
Die digitale Werbeindustrie verkauft Smartphone-Standortdaten und Bewegungsprofile von Millionen Menschen in Deutschland, darunter Privatpersonen und sensibles Personal.
Große Recherche von und BR, die einen riesigen Datensatz als "Muster" erhalten haben. netzpolitik.org
Sie haben Menschen identifiziert, die Entzugskliniken, Swinger-Clubs oder Bordelle besucht haben, aber auch Personal von Ministerien, Bundeswehr, BND, Polizei.
Fast alle Smartphone-Apps sind heute mit zwielichtigen Datensammeltechnologien "verwanzt".
Völlig unkontrollierte Datenmarktplätze, u.a. die Firma Datarade mit Sitz in Berlin, bieten Standort- und andere Verhaltensdaten über ganze Bevölkerungen aus vielen Ländern zum Verkauf an.
So, Microsoft exploits activity data from Outlook, Teams, Word etc across customers for its own promotional purposes, including on meetings, file usage and the seconds until emails are read.
Microsoft states that the analysis on the seconds until emails were read excludes EU data. Activity data from Outlook, Teams, Word etc, however, seems to include EU data.
What's their legal basis? This is also personal data on employees. And, are business customers fine with it?
Should cloud-based software vendors exploit personal data on users of their services, including private persons and employees of business customers, how they see fit?
I don't think so.
Not even for public-interest research, at least not without academic process and IRB review.
Some more findings from our investigation of LiveRamp's ID graph system (), which maintains identity records about entire populations in many countries, including name, address, email and phone, and aims to link these records with all kinds of digital IDs:crackedlabs.org/en/identity-su…
Identity data might seem boring, but if a company knows all kinds of identifying info about everyone, from home address to email to device IDs, it is in a powerful position to recognize persons and link profile data scattered across many databases, and this is what LiveRamp does.
LiveRamp aims to provide clients with the ability to recognize a person who left some digital trace in one context as the same person who later left some trace elsewhere.
It has built a sophisticated system to do this, no matter how comprehensive it can recognize the person.
As part of our new report on RTB as a security threat and previously unreported, we reveal 'Patternz', a private mass surveillance system that harvests digital advertising data on behalf of 'national security agencies'.
5 billion user profiles, data from 87 adtech firms. Thread:
'Patternz' in the report by @johnnyryan and me published today:
Patternz is operated by a company based in Israel and/or Singapore. I came across it some time ago, received internal docs. Two docs are available online.
Here's how Patternz can be used to track and profile individuals, their location history, home address, interests, information about 'people nearby', 'co-workers' and even 'family members', according to information available online:
, a 'social risk intelligence platform' that provides digital profiles about named individuals regarding financial strain, food insecurity, housing instability etc for healthcare purposes.
"It calculates risk scores for each risk domain for each person", according to the promotional video, and offers "clarity and granularity for the entire US".
Not redlining, though. They color it green.
Making decisions based on these metrics about individuals and groups seems to be highly questionable and irresponsible bs.