Alex J. Champandard 🌱's Threads

Nov 17, 2023 • 4 tweets • 1 min read

The OpenAI board's only legal responsibility (for which individuals are accountable) is ensuring AGI is built safely and is broadly beneficial.

The only actions they could take without fear of legal consequences stem from AGI *not* being built safely... openai.com/our-structure

It got bad enough that they were forced to fire Altman, and they did it in such a way that much of the blame is directed towards him — in case there's public outcry and they risk being called out (or sued) for performing their fiduciary duties.

May 5, 2023 • 10 tweets • 3 min read

Hey @MosaicML. If you fine-tune on books3, which contains 197,000 copyrighted works, are you sure you have the legal rights to relicense the model as commercial?

Meta has been unable to license LLaMa because they know © law, not because they want to keep it private...

FYI @AuthorsGuild

https://twitter.com/vitaliychiley/status/1654495490426142726

@MosaicML Mis-licensing works when you don't have the rights is Copyright Fraud; you can be liable whether you did it unknowingly or intentionally.

If you checked with a lawyer & think it's OK, please share their argument. (Everyone else stays quiet because they know it's trouble.)

Apr 11, 2023 • 35 tweets • 6 min read

With #generative systems in the spotlight, it's important to understand ©. Announcing v1.0 of my:

🚦COPYRIGHT TRAFFIC-LIGHT SYSTEM🚦

🟢 Green: Full Ownership / Assignment
🌕 Yellow: Exclusive Contracts
🟠 Orange: Non-Exclusive Licenses
🔴 Red: Fair Use & Exceptions

[1/5] 🧶

DISCLAIMER: This is designed as a high-level overview, not legal advice. Copyright is tricky, especially internationally, so you need your own lawyer to work out the details!

Even popular licenses have legal risks and caveats due to being untested in courts worldwide...

Mar 7, 2023 • 14 tweets • 4 min read

UPDATE: About the Cease & Desist sent to @StabilityAI with a March 1st deadline: I got confirmation of receipt but no formal reply.

The C&D included many points detailing how training and distribution must be conducted to be EU-compliant, and they had no answer to any of them.

I have proceeded in good faith and assumed the best, but I now believe:
- Their involvement in SD 1.x, training of 2.x and likely 3.x is non-compliant.
- They know this is the case and are doing their best to cover up.
- Everything you hear from them is carefully crafted PR only.

Mar 7, 2023 • 5 tweets • 2 min read

When they say #AiArt has no soul, this is what they mean. Only one of those kids (real or not) actually had success, and you can *see* it!

If you are able to reproduce the look of success with #StableDiffusion please post a reply below...

https://twitter.com/ramsri_goutham/status/1633057464039129095

Looking more closely, I think Success Kid also *ate* sand and still had that defiant look — like he's staring down the universe. "I'd do it again."

Mar 5, 2023 • 4 tweets • 1 min read

⚖️ Rules For Ethical Open-Sourcing ⚖️

If there's a BigTech #AI API available via payment for the feature or model you want to open-source, then it's ethical to open-source yours! 💯

Rationale: Bad actors are already using the API and BigTech is already profiting from that.

⚖️ Rule #2 ⚖️

If your model is less than 10% better on average for a variety of commonly used benchmarks compared to existing open-source models, then it's ethical to open-source yours! 💯
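The two rules above can be written as a small check. This is a minimal sketch under stated assumptions: scores are higher-is-better, "better on average" is read as the mean relative improvement over the best existing open-source score on each shared benchmark, and the function names and benchmark keys are hypothetical.

```python
from statistics import mean


def average_relative_improvement(yours: dict[str, float],
                                 best_open: dict[str, float]) -> float:
    """Mean relative improvement over the best open-source score on each
    benchmark both dicts share (assumes higher scores are better)."""
    shared = yours.keys() & best_open.keys()
    if not shared:
        raise ValueError("no benchmarks in common")
    return mean((yours[b] - best_open[b]) / best_open[b] for b in shared)


def ethical_to_open_source(paid_bigtech_api_exists: bool,
                           yours: dict[str, float],
                           best_open: dict[str, float],
                           threshold: float = 0.10) -> bool:
    # Rule #1: a paid BigTech API for the same feature/model already exists.
    if paid_bigtech_api_exists:
        return True
    # Rule #2: less than 10% better on average than existing open-source models.
    return average_relative_improvement(yours, best_open) < threshold


# Made-up scores on hypothetical benchmarks.
print(ethical_to_open_source(False,
                             {"bench_a": 0.50, "bench_b": 0.80},
                             {"bench_a": 0.48, "bench_b": 0.78}))
```

With those made-up scores the mean improvement is about 3.4%, so Rule #2 would apply; a model scoring 20% higher would not pass it.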

Mar 5, 2023 • 5 tweets • 3 min read

What's interesting about this proposal in the context of #generative and copyright: the place with the most creator-friendly legislation can have jurisdiction. Under the Berne Convention, it's where infringement occurs, so that could be the most creator-friendly place you choose.

https://twitter.com/bobehayes/status/1632146399692292097

That is to say, "allowing #generative AI research under ethical standards [...]" in areas that involve copyright is already regulated by the most human-friendly, pro-creator jurisdiction, and that jurisdiction applies automatically as per Berne (e.g. wherever the data is hosted or the service is used).

Jan 31, 2023 • 20 tweets • 5 min read

That's a wrap, folks, show's over! Diffusion models as *lossy* databases that can regenerate their training data:

"Diffusion models are less private than prior generative models [...] mitigating these vulnerabilities may require new advances in privacy-preserving training."

https://twitter.com/Eric_Wallace_/status/1620449934863642624

Jan 22, 2023 • 33 tweets • 12 min read

I have successfully compiled and run GLM-130B on a local machine! It's now running in `int4` quantization mode and answering my queries.

I'll explain the installation below; if you have any questions, feel free to ask!
github.com/THUDM/GLM-130B

Running 130B parameters on 4x 3090s is impressive. For reference, GPT-3 has 175B parameters, but it's possible that it's over capacity for the data & compute it was trained on...
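A back-of-the-envelope check shows why `int4` is what makes 4x 3090s workable. This sketch counts weight memory only (no activations or framework overhead) and assumes about 24 GB per RTX 3090; the quantizer is a generic symmetric per-row absmax scheme for illustration, not necessarily the one GLM-130B uses.

```python
import numpy as np

GPU_GB = 24        # assumed usable memory per RTX 3090
NUM_GPUS = 4
N_PARAMS = 130e9   # GLM-130B parameter count


def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    need = weight_memory_gb(N_PARAMS, bits)
    fits = need <= GPU_GB * NUM_GPUS
    print(f"{bits:>2}-bit weights: {need:.0f} GB, fits on {NUM_GPUS}x{GPU_GB} GB: {fits}")


def quantize_int4(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-row absmax quantization to the int4 range [-8, 7]."""
    # One scale per row, mapping the row's largest |w| to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # all-zero rows: avoid dividing by zero
    # 4-bit values held in int8 for clarity (no nibble packing).
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale


def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale
```

At 16, 8 and 4 bits the weights need about 260, 130 and 65 GB respectively, and only the last fits in the ~96 GB of four cards. A production kernel would pack two 4-bit values per byte rather than spending a full `int8` on each.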

I feel like a #mlops hacker having got this to work! (Though it should be much easier than it was.)

Jan 22, 2023 • 6 tweets • 2 min read

Anyone in Europe want to try this?

Write to GitHub and ask for the contact details of their Data Protection Officer (DPO). If they refuse, explain that GDPR requires them to provide those details. Ask the DPO what their policy is on personally identifying information under GDPR. Post the response!

https://twitter.com/L_macchiato/status/1616458042505310208

Oh, it's hosted on Google. Do the same there!

If you go via Support, you'll probably have to ask three times for the DPO's contact details, because frontline Google Support is not GDPR-aware and will refuse a few times to see if you're serious.

Not sure if it's intentional or incompetence!

Jan 21, 2023 • 11 tweets • 4 min read

I never realized just how fragile tokenization can be when you're crafting LLM prompts!

Say your model was trained to summarize after "\n\nTLDR:", and you decide to end the prompt with an extra space ("TLDR: ") so that the space is excluded from the generated output: those are different tokens. The next sentence could be "This research ...", but the statistics get messed up by the extra space, because the tokenizer would normally have tokenized " This" as a single token that includes the space before the capital.
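A toy tokenizer makes the effect concrete. This is a sketch assuming a hypothetical vocabulary in the style of GPT-2-like BPE, where a leading space is folded into the next word's token; greedy longest-match stands in for real BPE merges.

```python
# Hypothetical toy vocabulary: note " This" and " research" carry their
# leading space, as in many BPE vocabularies.
VOCAB = {"\n", "TLDR", ":", " ", " This", "This", " research", "research"}


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match tokenization (a simplification of real BPE)."""
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one is in the vocab.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token matches at position {i}: {text[i]!r}")
    return tokens


# What the model saw in training vs. a prompt ending in "TLDR: ".
training = tokenize("\n\nTLDR: This research")
prompted = tokenize("\n\nTLDR: ") + tokenize("This research")
print(training)
print(prompted)
```

With the trailing space in the prompt, the model sees `":"` followed by a lone `" "` token and must continue with `"This"` (no space) instead of the familiar `" This"`, a sequence it rarely if ever saw after "TLDR:" in training.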

I'm not sure this is really "engineering"; it's more like prompt hacking...

Dec 5, 2022 • 6 tweets • 1 min read

Alex's Rules of Automation

#1
If you automate something, it disappears until something breaks!

#2
When automation eventually breaks, it's always something new (and annoying) and may even take longer to fix than doing it manually.

Dec 4, 2022 • 4 tweets • 1 min read

An automated system took this photo. Cue the debate over whether it's copyrighted or not! ;-)

https://twitter.com/GatelyMark/status/1598995660124012544

"Someone carried the camera to the top of the hill and set up the trigger. Why don't they own the copyright?"

Nov 5, 2022 • 20 tweets • 4 min read

It's been 36h since this thread, and there have been many constructive discussions!

One front I failed on goes something like this:
"I thought you were an AI coder. How come you want CoPilot to be withdrawn? Do you want to cancel large models?"

Let's unpack! 👇

https://twitter.com/alexjc/status/1588295664915861505

First, there's no risk of the CoPilot service being terminated and the technology abandoned. I don't want to see that, and it's not the objective of the lawsuit either.

Second, I think medium- to large-models are absolutely worth pursuing technologically!

Nov 5, 2022 • 9 tweets • 3 min read

The Chrome team is cutting support for the superior JPEG-XL codec in its browser — even before they enabled it!

The decision was made in secret under the direction of a single person who has conflicts of interest and promotes the inferior AVIF alternative.

https://twitter.com/nathan_wasson/status/1588703909060698113

AVIF is based on the AV1 codec, which grew out of Google's VP10, much like WebP is based on the VP8 codec. Google developed VP10 and is a major backer of AV1, so it has an interest in promoting it instead of superior alternatives.

This means ~50% more energy (and thus carbon) used for internet bandwidth. 🙄

Nov 4, 2022 • 4 tweets • 2 min read

"The Right To Read Is The Right To Mine" was a campaign from ~2012-2015 to convince the public & legislators that machines should bypass copyright for data-mining.

IMHO we're at the next stage of this campaign, now for generative systems — should they act outside copyright? Articles like this one are at the tail end of the first pro-mining campaign and precursors to this new generative campaign?

It tries to establish that "reading by robots doesn’t count" and "infringement is for humans only".

ilr.law.uiowa.edu/print/volume-1… (via @GradySimon)

Nov 4, 2022 • 4 tweets • 1 min read

If you're working at a generative company and are worried about the lawsuit against GitHub over their generative model, please take some comfort in the fact that I think they made *many* missteps, showing either a serious lack of due care or the intent to break the law.

By contrast, Google announced they had a similar code model and didn't release it. They used it internally and measured a 6% improvement in productivity while working to understand the legal and ethical implications.

(Could also be that Google wanted to see others get sued first?)

Nov 3, 2022 • 17 tweets • 5 min read

Reading through the GitHub CoPilot litigation filing: although it was put together quickly, it's a solid piece of work!

My assessment is that the defendants, GitHub, Microsoft and OpenAI are in a very bad position...
githubcopilotlitigation.com

The documents show how Codex and CoPilot act like databases; they include three different examples of JS code that is recited verbatim, with mistakes, from licensed sources.

Including this debug code below isPrime(n):

Nov 3, 2022 • 4 tweets • 2 min read

You know how hands & fingers are particularly difficult to generate?

Wouldn't it be funny if people having important conversations online (in the near future) used hand gestures in front of their faces, so both sides know it's not a #DeepFake.

https://twitter.com/GalaxyKate/status/1588210859196776449

Anchor: I'm sorry to ask Mr. President, but before this TV interview can proceed please make a creative gesture with your hands.

Pres: What?

Anchor: Well, in the last election multiple candidates were caught using DeepFakes to make them look & sound smarter than they are.

Nov 3, 2022 • 5 tweets • 2 min read

In NVIDIA's new paper on #Diffusion Models, they show how more denoisers (for each stage) and more embeddings (text, image) help with quality!

TL;DR: If you buy more GPUs, you get correct spelling too.
deepimagination.cc/eDiffi/ #AI #ML
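eDiff-I describes an ensemble of expert denoisers, each specialized for part of the sampling process. A minimal sketch of that routing idea follows, assuming timesteps normalized to [0, 1] and hypothetical interval boundaries; the toy experts are placeholders for separately trained networks, and the class and function names are illustrative only.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

# A denoiser maps (noisy sample, timestep) -> denoised estimate.
Denoiser = Callable[[List[float], float], List[float]]


class ExpertEnsemble:
    """Routes each timestep t in [0, 1] to the expert for its noise interval.

    Here t near 1 means high noise (early sampling steps) and t near 0 means
    low noise (late steps); the routing itself doesn't depend on that choice.
    """

    def __init__(self, boundaries: Sequence[float], experts: Sequence[Denoiser]):
        # k sorted boundaries split [0, 1] into k + 1 intervals.
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        if len(experts) != len(boundaries) + 1:
            raise ValueError("need exactly one expert per interval")
        self.boundaries = list(boundaries)
        self.experts = list(experts)

    def expert_index(self, t: float) -> int:
        # Interval containing t; a t equal to a boundary goes to the higher interval.
        return bisect_right(self.boundaries, t)

    def __call__(self, x: List[float], t: float) -> List[float]:
        return self.experts[self.expert_index(t)](x, t)


def make_toy_expert(gain: float) -> Denoiser:
    # Placeholder for a trained network: just rescales the sample.
    return lambda x, t: [gain * v for v in x]


ensemble = ExpertEnsemble([0.3, 0.7], [make_toy_expert(g) for g in (1.0, 2.0, 3.0)])
print(ensemble.expert_index(0.9), ensemble([1.0, 2.0], 0.9))
```

The cost scales with the number of experts trained and stored, which is one reading of "buy more GPUs", while each sampling step still runs only one denoiser.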

With so many different labs rushing to research and deploy this kind of technology, this will quickly turn into a race for more efficiency as different providers compete on costs too.

Oct 17, 2022 • 4 tweets • 2 min read

When large language models are explicitly trained to use Python and look up Wikipedia, we'll be entering scary territory for #InfoSec ∩ #AI!

https://twitter.com/goodside/status/1581805503897735168

OpenAI engineers probably did this a few months ago and are now frantically trying to make sure their sandboxed Python environments are sufficiently safe...
