Troy Hunt

@troyhunt

Apr 3, 2021 • 27 tweets • 7 min read • Read on X

I’ve had a heap of queries about this. I’m looking into it and yes, if it’s legit and suitable for @haveibeenpwned it’ll be searchable there shortly.

https://twitter.com/UnderTheBreach/status/1378314424239460352

On first review, it's an extensive data set with one file per country and a header row as follows:

phone,uid,email,first_name,last_name,gender,date_registered,birthday,location,hometown,relationship_status,education_last_year,work,groups,pages,last_update,creation_time

I actually couldn't find any of my own or my family's data in the Australia file which has 7.3M rows. Having said that, I'm hearing from other trustworthy sources that the data is legit and that seems a reasonable assumption to work on for now.

Email addresses are *very* scarce though; in that 7.3M record Aussie file, there are only 47k occurrences of "@". The Italian file is the largest with nearly 36M records and there are 440k "@" chars in there. On that basis, there will be millions of addresses in the data set.
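The figures above work out to roughly 0.6% for the Australian file (47k out of 7.3M) and roughly 1.2% for the Italian one (440k out of 36M). A minimal sketch of that kind of quick count follows. It assumes one record per line with a header row, and the function name and sample rows are made up for illustration; this is not necessarily how the numbers in the thread were produced.

```python
def count_rows_and_at_signs(lines, has_header=True):
    """Return (data_rows, at_sign_count) for an iterable of text lines."""
    rows = 0
    at_signs = 0
    for index, line in enumerate(lines):
        if has_header and index == 0:
            continue  # skip the header row
        if not line.strip():
            continue  # ignore blank lines
        rows += 1
        at_signs += line.count("@")
    return rows, at_signs


# Made-up sample rows using a shortened version of the header from the thread.
SAMPLE_LINES = [
    "phone,uid,email,first_name,last_name",
    "+61400000001,1001,alice@example.com,Alice,Example",
    "+61400000002,1002,,Bob,Example",
    "+61400000003,1003,,Carol,Example",
]

rows, at_signs = count_rows_and_at_signs(SAMPLE_LINES)
print(f"{rows} rows, {at_signs} '@' characters ({at_signs / rows:.1%})")
```

Counting "@" characters only gives a rough upper bound on addresses, since a stray "@" in another field would be counted as well.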

So, I'll extract those addresses, do some further verification then load the data. It won't be hundreds of millions of records, I suspect it'll be less than 10M, but obviously that's still a substantial number.

And no, I have no intention of adding phone number search in the foreseeable future. There's a User Voice suggestion for that and a comment from me which boils down to "much higher work and much lower value": haveibeenpwned.uservoice.com/forums/275398-…

https://twitter.com/tonyszko/status/1378422232628727817

I like the comment in this tweet. If we look at the data, email is rare and DoB is rare, so the greatest impact here is the phone numbers. Even though it’s “only” 20% of FB users, the number is obviously substantial and thus so is the impact.

Another interesting data point on this: there are only 108 files, each representing a country, so many countries appear to be missing, including Norway, Sweden, Denmark and Iceland, but Finland is in there. It's not clear why.

Here's the complete list of files in the corpus of data I was sent. If anyone has a different set, I'd be interested in hearing about it: gist.github.com/troyhunt/00b9a…

On closer inspection, all the file names are in Italian. So Norway ("norvegia") is there, as are Sweden ("svezia") and Denmark ("danimarca"). Sorry folks, tweeting as I go here.

Now that's clear, I'm finding a lot of friends from various places who've confirmed their exposed data. I haven't seen anything yet to suggest this breach isn't legit.

So what's the impact? For a targeted attack where you know someone's name and country, it's great for mobile phone lookup. Much harder to do en masse as there's no reliable key; I couldn't take a big list of emails and resolve them to phone numbers as email is rare in the data.

But for spam based on phone number alone, it's gold. Not just SMS, there are heaps of services that just require a phone number these days and now there are hundreds of millions of them conveniently categorised by country with nice mail merge fields like name and gender.

Should the FB phone numbers be searchable in @haveibeenpwned? I’m thinking through the pros and cons in terms of the value it adds to impacted people versus the risk presented if it’s used to help resolve numbers to identities (you’d still need the source data to do that).

Factors influencing my consideration of this: only about 1% of the records have email addresses, the phone numbers are easily parsed (they’re in a CSV) and they’re formatted complete with country code. It’s a very clean data set and is 100x more useful than email in this case.

Another general observation on this incident: I'm seeing *extensive* sharing of the data, both the entire corpus of countries and individual country files. Not just in hacking circles, but very broadly on social media too. This data is everywhere already.

Email parsing now done, found 2,529,621 unique addresses across the 108 files. Call it about 0.5% of all records having an email address.
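A minimal sketch of de-duplicating addresses across several per-country files is below. It assumes each file is a CSV with a header row containing an `email` column (as in the header quoted earlier in the thread), and it treats addresses as case-insensitive. Both the lowercasing rule and the function name are assumptions made for illustration, not a description of the processing actually used.

```python
import csv
import io


def unique_emails(files):
    """Return (set_of_unique_emails, total_rows) across open CSV file objects."""
    emails = set()
    total_rows = 0
    for handle in files:
        for row in csv.DictReader(handle):
            total_rows += 1
            # Missing or empty email fields are skipped.
            email = (row.get("email") or "").strip().lower()
            if "@" in email:
                emails.add(email)
    return emails, total_rows


# Made-up sample data standing in for two country files.
FILE_A = (
    "phone,uid,email\n"
    "+61400000001,1001,alice@example.com\n"
    "+61400000002,1002,\n"
)
FILE_B = (
    "phone,uid,email\n"
    "+39300000001,2001,Alice@Example.com\n"
    "+39300000002,2002,bob@example.com\n"
)

emails, total_rows = unique_emails([io.StringIO(FILE_A), io.StringIO(FILE_B)])
print(sorted(emails), total_rows)
```

Using a set keeps only one copy of an address that appears in more than one file, which is what "unique addresses across the 108 files" implies.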

https://twitter.com/haveibeenpwned/status/1378554902100635659

That’s the email addresses loaded, I’m still considering what to do with the phone numbers

I’m seeing a lot of anecdotal reports that people have received a marked uptick in spam calls and SMSs aligning with this incident. It’s very hard to attribute though; I get a heap of those too and my number isn’t in the data. That said, I expect the data will be abused.

I see a lot of questions like this one from @Blessf11 and it’s always the same answer: the service that suffered the breach should provide the data that is circulating publicly to the rightful owner of it. FB, of all companies, has the resources to do this

https://twitter.com/Blessf11/status/1378616926008569856

After doing rather a lot of processing and discovering 370M rows in the data set I was given some weeks ago, then wondering why the headlines read 533M... I've been sent a separate set of files. This set aligns with more recent reporting: gist.github.com/troyhunt/9a081…

Which means that now I need to figure out the gaps and if it impacts the email addresses already loaded into @haveibeenpwned. It'll *definitely* impact the phone numbers, if I decide to load them.

Much of the data is same same but different; Albania, for example, begins with the same phone numbers and FB IDs but the original data was CSV whilst this lot is a colon delimited text file with a different field order.

This is really kludgy; 2nd data set has nowhere near the consistency of the 1st with colon delimiters, comma delimiters, headers, no headers, quote encapsulation, no quote encapsulation, different field orders, + before num, no + before num. Hackers have no attention to detail!
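A rough sketch of coping with that sort of inconsistency is below: guess the delimiter per line, let the `csv` module strip optional quotes, spot header rows, and normalise the leading "+". All function names and sample lines are hypothetical. Because the field order differs between files, working out which column holds the phone number is left to the caller, and the delimiter guess would misfire on a colon-delimited line that also contains many commas (or the reverse).

```python
import csv


def split_record(line):
    """Split one raw line on ':' or ',' (whichever is more frequent), honouring quotes."""
    line = line.strip()
    delimiter = ":" if line.count(":") > line.count(",") else ","
    return next(csv.reader([line], delimiter=delimiter, quotechar='"'))


def looks_like_header(fields):
    """Treat a row containing no digits at all as a header row."""
    return bool(fields) and not any(ch.isdigit() for field in fields for ch in field)


def normalise_phone(value):
    """Keep digits only and add a single leading '+'."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return "+" + digits if digits else ""


# Made-up lines in two of the styles described above.
colon_line = '"355690000001":"1001":"Ana"'
comma_line = "+355690000001,1001,Ana"

for raw in (colon_line, comma_line):
    fields = split_record(raw)
    print(normalise_phone(fields[0]), fields[1:])
```

Normalising each line to a common shape before loading is one way to compare the two data sets, since the same phone number then matches regardless of which file style it came from.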

https://twitter.com/joetidy/status/1379142610946777094

The problem with this whole situation is that in a vacuum of information, people speculate. Facebook needs to make a clear statement on the data that’s in broad circulation: when it happened, where it came from and what’s in it. Without that, confusion and speculation reign.

The Facebook phone numbers are now being loaded into @haveibeenpwned and will be searchable later today. Stay tuned, I'll push out a short blog once it's good to go (will be queryable via the existing API too 😎).

Statement from Facebook on this incident: “Scraping data using features meant to help people violates our terms”. Well that fixes that! about.fb.com/news/2021/04/f…
