I am reviewing this alleged hack of The Heritage Foundation.
I have identified very embarrassing data within this dataset. Why so many Chinese IP addresses? 🤔
The zipped file contains one single file:
"daily-signal_dev_database_new.sql"
This appears to be a combined set of exports from a SQL database. Here are the first lines
Because this is a combined export of various tables (likely produced from the command line), a typical SQL editor cannot read the file as-is; it would first need to be split into pieces.
I'd rather just turn it into CSV chunks to start cleaning up the dataset for further analysis.
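For anyone following along, here is a minimal sketch of one way to break a combined dump into per-table chunks. It assumes the file follows the mysqldump convention of writing a "-- Table structure for table" comment before each table section, which this thread does not confirm; the function name and the table names in the usage note are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each table section starts with the comment line that
# mysqldump normally writes. Adjust the pattern if the dump differs.
TABLE_MARKER = re.compile(r"^-- Table structure for table `([^`]+)`")


def split_dump_by_table(lines):
    """Group the lines of a SQL dump by the table section they belong to."""
    chunks = defaultdict(list)
    current = "_preamble"  # holds anything before the first table marker
    for line in lines:
        match = TABLE_MARKER.match(line)
        if match:
            current = match.group(1)
        chunks[current].append(line)
    return dict(chunks)
```

Passing an open file handle as `lines` works, since the function only iterates. For a file this large, writing each line straight to a per-table output file would avoid holding everything in memory.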
There are 215,000 lines or so in the WordPress Comments table. As you can see, comment_author_IP is available, which is broadly useful to get a sense of where people posting replies to the Heritage blog are coming from in the world.
Earliest date: 2008-01-04. Newest: 2022-11-09
After creating a CSV chunk containing only the WP comments table, I can view the columns and extract their contents as needed. After extracting the IP addresses from the comment_author_IP column, I can eliminate duplicates and analyze their presumed geographic origin, which is of interest to me.
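A rough sketch of the extract-and-deduplicate step, assuming the IP column has already been read into a list of strings. It handles IPv4 only, drops anything that does not parse, and sorts numerically rather than as text. The function name is hypothetical, and this is not necessarily how the list in this thread was produced.

```python
import ipaddress
import re

# Loose pattern for dotted-quad candidates; validation happens below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def unique_sorted_ips(values):
    """Return unique, valid IPv4 addresses sorted from low to high."""
    seen = set()
    for value in values:
        for candidate in IPV4_CANDIDATE.findall(value):
            try:
                seen.add(ipaddress.IPv4Address(candidate))
            except ValueError:
                # Skip dirty entries such as octets above 255.
                continue
    return [str(ip) for ip in sorted(seen)]
```

Sorting `IPv4Address` objects compares the numeric value, so 2.0.0.9 comes before 10.0.0.1, which a plain string sort would get wrong.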
Dataset was a little dirty and a hassle to clean up.
Here are the 60K extracted IPs from the WP Comments table:
#HeritageFoundation defuse.ca/b/PTrmvlbs
Sample geolocations from the first 100 IPs (these are sorted 'low to high', and many Asia-based netblocks start with the number 1)
Spies in the dataset
Here are the 69.5K email addresses present within the complete dataset:
🤔 235 .mil and .gov email addresses
🤔 95 .ru and .cn email addresses
#HeritageFoundation defuse.ca/b/mLXCi0iXsGFj…
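Counts like the .mil/.gov and .ru/.cn figures can be reproduced with a simple suffix match on the domain part of each address. This is a sketch under the assumption that the addresses are already extracted into a list; the function name and the sample addresses in any usage are made up.

```python
def count_by_suffix(emails, suffixes):
    """Count email addresses whose domain ends with each given suffix."""
    counts = {suffix: 0 for suffix in suffixes}
    for email in emails:
        # Take everything after the last "@" and normalize case.
        domain = email.rsplit("@", 1)[-1].strip().lower()
        for suffix in suffixes:
            if domain.endswith(suffix):
                counts[suffix] += 1
                break  # count each address once
    return counts
```

Calling `count_by_suffix(emails, (".mil", ".gov", ".ru", ".cn"))` returns one count per suffix. Note that a domain like agency.gov.cn would be counted under ".cn", not ".gov".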
Linked below is a statistical breakdown of the domain names associated with all email addresses in the dataset.
Stacking and counting are basic analytical tools which can help analysts identify outliers.
defuse.ca/b/GMCj2uAfvELn…
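Stacking and counting can be sketched in a few lines with `collections.Counter`: stack the addresses by domain, then count each stack. This assumes a plain list of email strings and is only an illustration, not the script behind the linked breakdown.

```python
from collections import Counter


def domain_counts(emails):
    """Tally how many addresses belong to each email domain."""
    domains = (
        email.rsplit("@", 1)[1].strip().lower()
        for email in emails
        if "@" in email  # ignore entries that are not addresses
    )
    return Counter(domains)
```

`domain_counts(emails).most_common(20)` lists the largest stacks first; outliers tend to sit at either end of that list.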
I have a script running to grab geolocation information and will tweet when it finishes.
Those working at big companies with access to certain commercial tools can do this more quickly than I can.
Because the original host took the file down, you can now find it here:
This is a 368 MB .zip file which uncompresses to a single 1.94 GB flat file.
SHA256: 3dcc258331d9139a654402d20b756b57ca17228aa9e2f80a4b6451b96c8eac70
tan-medieval-hornet-252.mypinata.cloud/ipfs/QmVwiYsr4…
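To check a downloaded copy against the SHA256 above, something like the following sketch works. It reads the file in chunks so a large file never has to fit in memory. The thread does not say whether the hash covers the .zip or the uncompressed file, so try whichever you have; the function name and the path in the usage note are placeholders.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the result of `sha256_of_file("download.zip")` (a placeholder path) with the published value; any difference means the file is not the same one.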
The hacker group claiming responsibility for this action has released new information on their Telegram channel.
Here is the list of Administrators.
defuse.ca/b/ely6s7iwqpLF…
BREAKING: SiegedSec claims to have officially disbanded.
#HeritageFoundation
Updated download link
@CloudsEdgeArt1 I am the first person covering this.
@loudmog cloudflare-ipfs.com/ipfs/QmVwiYsr4…
