A major AI training data set contains millions of examples of personal data
Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models.
Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found.
Thousands of images—including identifiable faces—were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from the web. Because the researchers audited just 0.1% of CommonPool’s data, they estimate that the real number of images containing personally identifiable information, including faces and identity documents, is in the hundreds of millions. The study that details the breach was published on arXiv earlier this month.
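The step from a 0.1% audit to a data-set-wide figure is simple proportional scaling, and a rough sketch makes the arithmetic concrete. The snippet below assumes the audited subset is a uniform random sample, which may not match the researchers' actual method. The hit count is hypothetical, since the article says only "thousands," and the function name is invented for illustration.

```python
from fractions import Fraction


def extrapolate_total(sample_hits: int, sample_fraction: Fraction) -> int:
    """Scale a count seen in a uniform random sample up to the full data set."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    # Exact rational arithmetic avoids floating-point rounding surprises.
    return round(sample_hits / sample_fraction)


# The audit covered 0.1% of CommonPool (figure taken from the article).
AUDITED_FRACTION = Fraction(1, 1000)

# HYPOTHETICAL count, for illustration only; not a number from the study.
hypothetical_hits = 5_000

estimate = extrapolate_total(hypothetical_hits, AUDITED_FRACTION)
print(f"{estimate:,}")  # prints 5,000,000
```

Under these assumptions, every item found in the audited slice stands in for roughly a thousand in the full set, which is how counts in the thousands become estimates in the millions.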
The bottom line, says William Agnew, a postdoctoral fellow in AI ethics at Carnegie Mellon University and one of the coauthors, is that “anything you put online can [be] and probably has been scraped.”
The researchers found thousands of instances of validated identity documents—including images of credit cards, driver’s licenses, passports, and birth certificates—as well as over 800 validated job application documents (including résumés and cover letters), which were confirmed through LinkedIn and other web searches as being associated with real people. (In many more cases, the researchers did not have time to validate the documents or were unable to because of issues like image clarity.)
Having LOCAL control of your own data on your own network is superior to using “The Cloud” for one simple reason: making your data accessible to (and thus controllable by) unknown admins in unknown data centers injects a permanent vulnerability directly into the heart of your system.
What is “The Cloud”?
Someone else’s computers somewhere else, with someone else having unlimited access and control of them. ReconComputing.com OurWeb.io
“Clunky”? Retaining control over your own data while enabling ease of maintenance IS cybersecurity. With automatic filtering of known spamware & malware sites, ARKEN both boosts your network speeds AND reduces risks for users while making regulatory compliance & audit preparation easy!
ARKEN is literally plug & play.
Installation of our cybersecurity system automatically initiates a full inventory of all devices & processes within your network, identifying & alerting you to potential vulnerabilities & threats. Customizable I.R.P. allows users to easily manage & even prevent any malware install attempts in real-time. Do you know what’s on YOUR network?
The FBI and Department of Justice (DOJ) on June 30 said that almost $15 billion was reported in losses in the “largest health care fraud” investigation in U.S. history, with officials charging more than 300 people in connection with the alleged scheme.
In a post on social media platform X, FBI Director Kash Patel wrote that $14.6 billion in losses were incurred and $245 million was seized, while FBI Deputy Director Dan Bongino said in a separate post on X that hundreds of people were charged in the case.
“Public corruption will not be tolerated as the Director and I vigorously pursue bad actors who violated their oaths to all of us,” Bongino said, describing the case as the “largest healthcare fraud investigation” in the country’s history.
SCOTUS: Justice Kagan’s Own Words Come Back to Haunt Her on Nationwide Injunctions. The Supreme Court’s 6-3 decision in Trump v. CASA, Inc., released Friday, finally put the brakes on the reckless abuse of nationwide injunctions by lower courts—and has Democrats in full meltdown mode. The left’s favorite judicial weapon just got neutered, and the hypocrisy is impossible to ignore.
The liberal wing of the court didn’t do itself any favors, either. Justice Ketanji Brown Jackson’s dissent was so horrible that Justice Amy Coney Barrett felt compelled to call it out in the majority opinion.
But Justice Elena Kagan’s credibility also took a direct hit. In a stunning display of judicial flip-flopping, Kagan’s own words from 2022 have come back to haunt her, exposing the left’s all-too-familiar habit of changing the rules when it suits their political objectives.
Nationwide injunctions have been the left’s go-to tactic for derailing conservative policy at the stroke of a single judge’s pen. Under Trump, district judges from deep-blue enclaves repeatedly issued sweeping orders to block administration policies nationwide at an unprecedented pace, no matter how tenuous the legal grounds.
HUGE WIN!
Supreme Court Sides With Energy Production in Latest NEPA Ruling: NEPA has become infamous for its role as a blockade to development in the United States. Signed into law on January 1, 1970, at fewer than six pages, NEPA’s original intent was to inform the public on the environmental impact of large projects through the publication of environmental reviews, called environmental impact statements. Over time, however, what began as a procedural safeguard has morphed into a burdensome process that serves to stop projects through endless public commentary, hearings, bureaucratic delays, and legal challenges.
While there has been endless talk in Washington over the years about the need to reform NEPA, the Trump administration is taking decisive action. In response to the D.C. Circuit Court case Marin Audubon Society v. Federal Aviation Administration, which found that the Council on Environmental Quality (CEQ) lacks the statutory authority to issue binding rules on federal agencies, President Trump withdrew all CEQ NEPA guidance issued since 1977. These guidelines forced federal agencies to incorporate analyses — such as the consideration of climate impacts and environmental justice (added under the Biden administration) — into environmental reviews. These guidelines were not derived from the text of NEPA but had been added and modified over the decades to respond to judicial activism as well as the changing policy preferences of presidential administrations.
Congress has also recently made small efforts to address NEPA delays, passing a 2023 law limiting the page count and setting deadlines for environmental reviews. However, these were simply added on top of existing CEQ regulations and court precedents. While well intended, these changes fail to address the core issues with NEPA.
FDA halts new clinical trials seeking to send American cells to hostile labs for genetic engineering: The Food and Drug Administration (FDA) on Wednesday halted and ordered a review of all new clinical trials that include sending American cells to hostile nations for genetic engineering purposes.
The order comes after the Biden administration finalized a data security rule last year that included allowing U.S. companies to send cells and other biological samples of Americans to other countries for processing as part of the FDA's clinical trials.
(Now you know why 23andMe sold!)