Alex J. Champandard 🌻
Building tools and teams where humans ≫ machines. AI, ML, research & development. co-Founded #CreativeAI #⚘
Nov 17, 2023 4 tweets 1 min read
The OpenAI board's only legal responsibility (for which individuals are accountable) is ensuring AGI is built safely and is broadly beneficial.

The only actions they could take without fear of legal consequences stem from AGI *not* being built safely... openai.com/our-structure

It got bad enough that they were forced to fire Altman, and they did it in such a way that much of the blame is directed towards him, in case there's public outcry and they risk being called out (or sued) for performing their fiduciary duties.
Apr 11, 2023 35 tweets 6 min read
With #generative systems in the spotlight, it's important to understand ©. Announcing v1.0 of my:

🚦COPYRIGHT TRAFFIC-LIGHT SYSTEM🚦

🟢 Green: Full Ownership / Assignment
🌕 Yellow: Exclusive Contracts
🟠 Orange: Non-Exclusive Licenses
🔴 Red: Fair Use & Exceptions

[1/5] 🧶

DISCLAIMER: This is designed as a high-level overview, not legal advice. Copyright is tricky, especially internationally, so you need your own lawyer to work out the details!

Even popular licenses have legal risks and caveats due to being untested in courts worldwide...
Mar 7, 2023 14 tweets 4 min read
UPDATE: About the Cease & Desist sent to @StabilityAI with a deadline of March 1st: I got confirmation of receipt but no formal reply.

The C&D included many points detailing how training and distribution must be conducted to be EU compliant, and they did not have any answer.

I have proceeded in good faith & assuming the best, but I now believe:
- Their involvement in SD 1.x, training of 2.x and likely 3.x is non-compliant.
- They know this is the case and are doing their best to cover it up.
- Everything you hear from them is carefully crafted PR only.
Mar 7, 2023 5 tweets 2 min read
When they say #AiArt has no soul, this is what they mean. Only one of those kids (real or not) actually had success, and you can *see* it!

If you are able to reproduce the look of success with #StableDiffusion please post a reply below...

Looking more closely, I think Success Kid also *ate* sand and still had that defiant look, like he's staring down the universe. "I'd do it again."
Mar 5, 2023 4 tweets 1 min read
⚖️ Rules For Ethical Open-Sourcing ⚖️

If there's a BigTech #AI API available via payment for the feature or model you want to open-source, then it's ethical to open-source yours! 💯

Rationale: Bad actors are already using the API and BigTech is already profiting from that.

⚖️ Rule #2 ⚖️

If your model is less than 10% better on average for a variety of commonly used benchmarks compared to existing open-source models, then it's ethical to open-source yours! 💯
Mar 5, 2023 5 tweets 3 min read
What's interesting about this proposal in the context of #generative and Copyright: the place with the most creator-friendly legislation can have jurisdiction. Under the Berne Convention, jurisdiction lies where the infringement occurs, so that could be the most creator-friendly place you choose.

That is to say, "allowing #generative AI research under ethical standards [...]" that involve Copyright is already regulated by the most human-friendly, pro-creator legislation, and jurisdiction is determined automatically (e.g. by where the data is hosted or the service is used) as per Berne.
Jan 31, 2023 20 tweets 5 min read
That's a wrap folks, show's over! Diffusion models as *lossy* databases that can regenerate their training data:

"Diffusion models are less private than prior generative models [...] mitigating these vulnerabilities may require new advances in privacy-preserving training."
Jan 22, 2023 33 tweets 12 min read
I have successfully compiled and run GLM-130B on a local machine! It's now running in `int4` quantization mode and answering my queries.

I'll explain the installation below; if you have any questions, feel free to ask!
github.com/THUDM/GLM-130B

130B parameters on 4x 3090s is impressive. GPT-3, for reference, is 175B parameters, but it's possible that it's over capacity for the data & compute it was trained on...

I feel like a #mlops hacker having got this to work! (Though it should be much easier than it was.)
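
A rough back-of-envelope check of why `int4` is what makes this fit, offered as a sketch only: it counts weight storage alone and assumes 24 GB of memory per 3090, ignoring activations, caches and framework overhead. The function name is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


GLM_130B_PARAMS = 130e9
GPU_COUNT = 4
GPU_MEMORY_GB = 24  # assumption: 24 GB per RTX 3090

total_gpu_memory_gb = GPU_COUNT * GPU_MEMORY_GB

# Compare the weight footprint at each precision against the total GPU memory.
for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    needed = weight_memory_gb(GLM_130B_PARAMS, bits)
    fits = needed <= total_gpu_memory_gb
    print(f"{name}: ~{needed:.0f} GB of weights, fits in {total_gpu_memory_gb} GB: {fits}")
```

Under these assumptions only the `int4` weights fit across the four cards, and the remaining headroom still has to cover everything the estimate leaves out.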
Jan 22, 2023 6 tweets 2 min read
Anyone in Europe want to try this?

Write to GitHub, ask for contact details of their Data Protection Officer. If they refuse, explain it's mandated by GDPR to provide contact details. Ask the DPO what their policy is on personally-identifying information under GDPR. Post the response!

Oh, it's hosted on Google. Do the same there!

If you go via Support, you'll probably have to ask three times for the DPO contact because frontline Google Support is not GDPR-aware, and will refuse a few times to see if you're serious.

Not sure if that's intentional or incompetence!
Jan 21, 2023 11 tweets 4 min read
I never realized just how fragile tokenization can be when you're crafting LLM prompts!

Say your model was trained to summarize with "\n\nTLDR:" and you decide to add an extra space after the colon ("TLDR: ") so that the space is excluded from the generated output: you get different tokens.

So the next sentence could be "This research ..." but the statistics would get messed up because of the extra space, as the tokenizer would have tokenized " This" to include the space before the capital.

I'm not sure this really is "engineering"; it's more like prompt hacking...
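
To make the effect concrete, here is a toy sketch. It assumes a tiny made-up vocabulary and a greedy longest-match rule, which is not how a real BPE tokenizer works, but it reproduces the leading-space behavior described above. `VOCAB` and `tokenize` are hypothetical names for illustration only.

```python
# Hypothetical toy vocabulary: like many real LLM vocabularies, it stores
# common words with their leading space attached (" This", " research").
VOCAB = ["\n\n", "TLDR", ":", " ", " This", "This", " research", "research"]


def tokenize(text: str) -> list:
    """Greedy longest-match tokenization; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        matches = [t for t in VOCAB if text.startswith(t, i)]
        token = max(matches, key=len) if matches else text[i]
        tokens.append(token)
        i += len(token)
    return tokens


completion = "This research"

# As trained: the prompt ends at the colon and the model produces " This".
as_trained = tokenize("\n\nTLDR:") + tokenize(" " + completion)

# With the extra space: the prompt already ends in " ", so the model has to
# continue with "This" (no leading space), which is a different token.
with_extra_space = tokenize("\n\nTLDR: ") + tokenize(completion)

print(as_trained)
print(with_extra_space)
```

In the first sequence the word arrives as the single token " This". In the second it is split into a standalone " " followed by "This", a combination the model would rarely have seen after "TLDR:" during training.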
Dec 5, 2022 6 tweets 1 min read
Alex's Rules of Automation

#1
If you automate something, it disappears until something breaks!

Alex's Rules of Automation

#2
When automation eventually breaks, it's always something new (and annoying) and may even take longer to fix than doing it manually.
Dec 4, 2022 4 tweets 1 min read
An automated system took this photo. Cue the debate over whether it's copyrighted or not! ;-)

"Someone carried the camera to the top of the hill and set up the trigger. Why don't they own the copyright?"
Nov 5, 2022 20 tweets 4 min read
It's been 36h since this thread, with many constructive discussions since!

One front I failed on goes something like this:
"I thought you were an AI coder. How come you want CoPilot to be withdrawn? Do you want to cancel large models?"

Let's unpack! 👇

First, there's no risk of the CoPilot service being terminated and the technology abandoned. I don't want to see that, and that's not the objective of the lawsuit either.

Second, I think medium to large models are absolutely worth pursuing technologically!
Nov 5, 2022 9 tweets 3 min read
The Chrome team is cutting support for the superior JPEG-XL codec in its browser — even before they enabled it!

The decision was made in secret under the direction of a single person who has conflicts of interest, and promotes the inferior AVIF alternative.

AVIF is based on the AV1 codec (which builds on VP10), like a successor to WebP, which is based on the VP8 codec. Google owns & controls VP10, so has an interest in promoting AVIF instead of superior alternatives.

This means there'll be ~50% more energy used, and thus carbon, for internet bandwidth. 🙄
Nov 4, 2022 4 tweets 2 min read
"The Right To Read Is The Right To Mine" was a campaign from ~2012-2015 to convince the public & legislators that machines should bypass copyright for data-mining.

IMHO we're at the next stage of this campaign, now for generative systems: should they act outside copyright?

Articles like this one are at the tail end of the first pro-mining campaign and precursors to this new generative campaign?

It tries to establish that "reading by robots doesn’t count" and "infringement is for humans only".

ilr.law.uiowa.edu/print/volume-1… (via @GradySimon)
Nov 4, 2022 4 tweets 1 min read
If you're working at a generative company, and worried about the lawsuit against GitHub for their generative model, please take some comfort in the fact that I think they made *many* missteps, with either a serious lack of due care or the intent to break the law.

For instance, Google announced they had a similar code model and they didn't release it. They used it internally & measured a 6% improvement in productivity while they work out the legal and ethical implications.

(Could also be that Google wanted to see others get sued first?)
Nov 3, 2022 17 tweets 5 min read
Reading through the submitted GitHub CoPilot litigation: although it was pulled off quickly, it's a solid piece of work!

My assessment is that the defendants (GitHub, Microsoft and OpenAI) are in a very bad position...
githubcopilotlitigation.com

The documents show how Codex and CoPilot act like databases; they have three different examples of JS code that are recited verbatim, mistakes included, from licensed sources.

Including this debug code below isPrime(n):
Nov 3, 2022 4 tweets 2 min read
You know how hands & fingers are particularly difficult to generate?

Wouldn't it be funny if people having important conversations online (in the near future) used hand gestures in front of their faces, so both sides know it's not a #DeepFake?

Anchor: I'm sorry to ask, Mr. President, but before this TV interview can proceed please make a creative gesture with your hands.

Pres: What?

Anchor: Well, in the last election multiple candidates were caught using DeepFakes to make them look & sound smarter than they are.
Nov 3, 2022 5 tweets 2 min read
In NVIDIA's new paper on #Diffusion Models, they show how more denoisers (for each stage) and more embeddings (text, image) help with quality!

TL;DR: If you buy more GPUs, you get correct spelling too.
deepimagination.cc/eDiffi/ #AI #ML

With so many different labs rushing to research and deploy this kind of technology, this will quickly turn into a race for more efficiency as different providers compete on costs too.
Oct 17, 2022 4 tweets 2 min read
When large language models are explicitly trained to use Python and look up Wikipedia, we'll be entering scary territory for #InfoSec #AI!

OpenAI engineers probably did this a few months ago, and are now frantically trying to make sure their sandboxed Python environments are sufficiently safe...
Oct 17, 2022 5 tweets 2 min read
It's amazing how this great paper about prompt engineering from August (arxiv.org/abs/2208.01626) is only really getting widespread attention now that there are good open-source implementations:
- github.com/google/prompt-…
- github.com/bloc97/CrossAt…

Academic Impact: OSS or GTFO?

Prompt-To-Prompt editing allows you to easily change your input text without needing to completely regenerate the image. This makes it much easier to control the diffusion!

Example from bloc97's GitHub, four seasons of the same scene: