Jason Kint
Dec 27, 2023 19 tweets 7 min read
ok, I've now read the full NYT complaint filed this morning vs OpenAI and Microsoft. I'm impressed - it's future-focused around fair value for work vital to democracy. It also contains 220k pages of exhibits although the pages of Ex J stood out to me. more on that in a minute. /1 Image
The complaint is a must-read imho, it's the only way to understand the alleged violations and the extent to which the systems have been designed and tuned in order to generate certain output. It's filed in SDNY and it may well be a landmark case. /2 Image
It's rooted in copyright law and the US Constitution and that's very much where it begins. /3 Image
And as it notes, there is a lot of money at stake. But it does well to look towards the future, showing how violations used to create a substitute undermine existing (and future) biz models (including AI licensing) which fund the critical and costly journalism around the globe. /4 Image
The complaint makes it clear early on that the goal in negotiations to license its content is to receive fair value and to help shepherd a future world with responsible AI and a healthy news ecosystem. /5 Image
It cites a number of examples as to the human and financial cost that goes into journalism which can span multiple continents and require working through very challenging access limitations. /6 Image
That cost in 2023 includes mass shootings, wars, terrorist attacks, elections, financial infrastructure and natural disasters around the world. There can be no debate as to what is at stake here. /7 Image
And with a clearly tied role for Microsoft, the complaint highlights abuses even in the most recent months. It shows this example of content lifted verbatim from a NYT report and then compares it to the approach taken by a search engine. /8 Image
Here is the search comparison using Microsoft's own search engine. The difference in handling of copy from the content is immediately obvious and impossible to debate. /9 Image
The complaint also steps through the preference and weighting used for sources with claims NYT-sourced content is more valued for training. And that undermining that real investment will undermine the entire market for journalism - including licensing it for future AI. /10 Image
There are a number of examples in the complaint around weighting showing not all brands and content are equal, but I found the overweighting of WebText2 to be a pretty good example of how "high-quality" content is given preference. /11 Image
And Google PageRank as one of the oldest approaches on the web to sorting authority across websites... here it's noted that nearly all of the few entries above NYT are social media so significantly less helpful to training a model. /12 Image
So back to Exhibit J. Unlike the other 220k+ pages of exhibits documenting registered works, this exhibit contains 100 examples of alleged copyright violations with nearly identical content being outputted by ChatGPT. Again, it's impossible to argue with this. /13 Image
Here are four examples. Again, the lawsuit includes one hundred of them. You get the point. I find this exhibit to be an incredibly powerful illustration for a lawsuit that will go before a jury of Americans. Again, it's impossible to argue with this. /14


Image
Image
Image
Image
And finally the lawsuit rips a gigantic hole into the presentation of OpenAI as a benevolent nonprofit. /15 Image
and on top of this, it also systematically walks through Microsoft's role in facilitating and contributing to the alleged copyright infringement. Side note from me: Microsoft has gained one trillion dollars in market cap in 2023. /16 Image
Finally a quote from me on all of this that I supplied to news outlets earlier today. /17 Image
Here is a link to the full complaint and all of the exhibits. If you're interested, I would start with the 69 page complaint and then skip to Exhibit J if I were you. Cheers. /18 courtlistener.com/docket/6811704…
Here is exhibit J. It’s an incredibly powerful exhibit for a jury trial once you explain copyright. /19 storage.courtlistener.com/recap/gov.usco…

