ok, I've now read the full NYT complaint filed this morning vs OpenAI and Microsoft. I'm impressed - it's future-focused around fair value for work vital to democracy. It also contains 220k pages of exhibits although the pages of Ex J stood out to me. more on that in a minute. /1
The complaint is a must-read imho, it's the only way to understand the alleged violations and the extent to which the systems have been designed and tuned in order to generate certain output. It's filed in SDNY and it may well be a landmark case. /2
It's rooted in copyright law and the US Constitution and that's very much where it begins. /3
And as it notes, there is a lot of money at stake. But it does well to look towards the future, showing how violations used to create a substitute undermine existing (and future) biz models (including AI licensing) which fund critical and costly journalism around the globe. /4
The complaint makes it clear early on that the goal in negotiations to license its content is to receive fair value and to help shepherd a future world with responsible AI and a healthy news ecosystem. /5
It cites a number of examples of the human and financial cost that goes into journalism which can span multiple continents and require working through very challenging access limitations. /6
That cost in 2023 includes mass shootings, wars, terrorist attacks, elections, financial infrastructure and natural disasters around the world. There can be no debate as to what is at stake here. /7
And with a clearly tied role for Microsoft, the complaint highlights abuses even in the most recent months. It shows this example of content lifted verbatim from a NYT report and then compares it to the approach taken by a search engine. /8
Here is the search comparison using Microsoft's own search engine. The difference in handling of copy from the content is immediately obvious and impossible to debate. /9
The complaint also steps through the preference and weighting used for sources with claims NYT-sourced content is more valued for training. And that undermining that real investment will undermine the entire market for journalism - including licensing it for future AI. /10
There are a number of examples in the complaint around weighting showing not all brands and content are equal but I found the overweighting of WebText2 to be a pretty good example of how "high-quality" content is given preference. /11
And Google PageRank as one of the oldest approaches on the web to sorting authority across websites... here it's noted that nearly all of the few entries above NYT are social media so significantly less helpful to training a model. /12
So back to Exhibit J. Unlike the other 220k+ pages of exhibits documenting registered works, this exhibit contains 100 examples of alleged copyright violations with nearly identical content being outputted by ChatGPT. Again, it's impossible to argue with this. /13
Here are four examples. Again, the lawsuit includes one hundred of them. You get the point. I find this exhibit to be an incredibly powerful illustration for a lawsuit that will go before a jury of Americans. Again, it's impossible to argue with this. /14
And finally the lawsuit rips a gigantic hole into the presentation of OpenAI as a benevolent nonprofit. /15
And on top of this, it also systematically walks through Microsoft's role in facilitating and contributing to the alleged copyright infringement. Side note from me: Microsoft has gained one trillion dollars in market cap in 2023. /16
Finally a quote from me on all of this that I supplied to news outlets earlier today. /17
Here is a link to the full complaint and all of the exhibits. If you're interested, I would start with the 69-page complaint and then skip to Exhibit J. Cheers. /18 courtlistener.com/docket/6811704…
Day 2. A few comments after 2nd day of testimony from Mark Zuckerberg. FTC began with impeachment as Zuckerberg had said yesterday friends & family were only about 25% of Stories shared when instead it appears to be more in the 63-73% range. I would hammer him on these, it's a pattern. /1
Remember, we've learned from MZ's deposition to SEC and many trips to Congress, he may say too much and seems to talk his way through problems. Speaking of... USvGoogle on the weight of contemporaneous statements is already a massive shadow over MZ. /2
I think MZ has a tell. He often says, "Well that is an interesting question" when asked about his prior contemporaneous statements on fairly obvious questions such as "Is it true that Facebook users like less ads in their feeds?" /3
With FTC's opening statement slides (109 of them over 86 minutes IYKYK) now posting, I want to flag just a few of them worth amplifying. /1
These two statements from Judge Boasberg in his denial of Meta's motion to dismiss last November will weigh heavily on Facebook imho. The evidence from both the Instagram deal and WhatsApp deal is damning considering just these two bullets. /2
This slide (and the next one) were interesting in getting internal reflections of Meta/Facebook forcing more ads into the Instagram experience. /3
FTC v Meta Day 1. Opening arguments for FTC laid out its case. As predicted, Meta tried to blow a hole in market definition. This actually comes later in trial so not dwelling but will add some context at end. But first, witness 1 was CEO Zuckerberg. Dead to rights on conduct. /1
Internal Facebook employee messages (some we've previously seen plus plenty more) make the Instagram deal clearly anticompetitive conduct imho. Exhibits may not post until Wed so my quotes are my best snapshots from messages in exhibits on screens. Relay with care. I tried. /2
Zuckerberg has testified for only 3 hrs of FTC's estimated 7hrs so he's back on stand tomorrow at 9:30am ET (remember, Careless People book said he hates mornings). FTC has been systematically laying out timeline of Facebook shift to mobile and acquisition of Instagram. /3
As Meta’s Andy Stone works overnight criticizing whistleblower testimony today on their role in China, let’s not forget Meta worked furiously thru billions in settlements to keep sealed that it provided data access to 86,961 developers in China, which was unsealed after court sanctions in 2023.
That slide is from their own internal audit. The one they promised the public and Congress in testimony and then buried, including fighting to keep the forensic clean up artists aka auditors under seal, too, until an attorney said it in open courtroom. storage.courtlistener.com/recap/gov.usco…
Here is Stone’s statement this morning. He has a track record of burying things for his bosses so just think it’s important context when he tries to brush aside China. Thank you @HawleyMO for accountability here. nbcnews.com/tech/social-me…
Pretrial orders starting to give a taste as to why WSJ reports Mark Zuckerberg is meeting Pres. Trump desperately trying to settle Meta's FTC lawsuit 11 days from trial. Court just ordered Meta to release all internal discussions of "integrity" issues up until 2020. That's toxic. /1
Also included is evidence as to what appears to be Apple warning Facebook/Meta to address CSAM on WhatsApp chat groups. Remember, advertisers built this company investing hundreds of billions of dollars to support it. /2
On that note, we will also likely see the financials for WhatsApp, which was acquired by Facebook for nearly $19B despite almost no revenues. Why this happened will be a key argument in the courtroom. /3
The American values of IP protection have been a cornerstone in the country’s innovative spirit and competitive edge over foreign adversaries. DCN focused on strong copyright protections in our comments filed for the AI Action Plan. Will share some thoughts here. /1
Weakening copyright protections, whether at home or abroad, threatens US economic growth and global competitiveness. Importantly, this point is inclusive of content creators across all platforms. The invented "right to learn" by machines is BS spin from OpenAI and Google. /2
Simply put, AI firms must not use copyrighted content without consent or compensation, as this undermines fair competition and creator rights. And they should be required to disclose when they've used it without consent. /3