Okay, time to live tweet my thoughts on @stanfordnlp @StanfordAILab's "Workshop on Foundation Models." A long thread.
First and foremost: please never use the phrase "foundational models" ever again. It's a garbage name that people like @mmitchell_ai @emilymbender @mer__edith have criticized at length. I'll go find some of their comments and link to them later, but the short version is:
1. There is very little intellectually "foundational" about these models
2. It's not at all clear that GPT-3 and CLIP-DALL-E are the same kind of thing
3. The motivation for this relabeling appears to be entirely about political control over language
I missed @percyliang's intro talk, so I'll start with @jackclarkSF's

1. Jack says that the only groups training or trying to train 100B+ language models are companies. This omits #EleutherAI and @BigscienceW from the narrative (note that both groups were also excluded from the workshop)
2. Jack says that these models will obviously do good, but his examples are highly suspect. He specifically raises AI therapy chatbots as a positive example, something mental health experts are virtually universally against. @MentalHealthAm
3. Jack is dead right about recommendation algorithms and their insidious impact on our behaviors.

4. I'm surprised by how critical he is willing to be about capital. Distinguishing between corporate and capital interests is an important and meaningful thing to do.
We cannot forget that there are many people who are personally wealthy enough to fund the training of GPT-3. The CEO of any Fortune 500 company can do it without any meaningful impact on their lives.
If anyone reading this has $5M to spare, my DMs are open! I can train and release a completely open and free 200B language model with $5M in funding.

It's really funny to hear people talk about how GPT-3 is absurdly expensive. @mmitchell_ai compared it to @CERN a couple days ago.
I fear that people without experience with large-scale science are missing important context about scale. I am employed by a US Government contracting firm. We have a saying: a billion here, a billion there, and pretty soon you're talking about real money.
My company has received over 100x the amount of money it would cost to train GPT-3 for AI research. The USG doesn't want a freely available GPT-3. If they did, it would exist.

Reminder that the US military spends more money on AI research than the entire private sector combined.
It's also important to keep in mind that "freely available" is a complicated concept at scale. GPT-J (coming soon on @huggingface) is approximately the largest model that current tech can deploy on consumer GPUs.

github.com/huggingface/tr…
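A quick back-of-envelope on why GPT-J sits near that line (my own numbers, not from the workshop): in half precision each parameter takes 2 bytes, so just holding the weights of a GPT-J-scale model takes roughly 12 GB, while a GPT-3-scale model needs hundreds of gigabytes.

```python
# Rough sketch: memory needed just to hold model weights in fp16 (2 bytes/param).
# Activations, KV caches, and any optimizer state add more on top of this.
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(6e9))    # ~12 GB: GPT-J-class, fits on a high-end consumer GPU
print(weight_memory_gb(175e9))  # ~350 GB: GPT-3-class, needs a multi-GPU server
```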
When I talk about making a 100B model or a 200B model "freely available" what I really mean is that the weights are published online. For you to actually use it, you'd need to take it to a cloud provider and pay them to use it. But at least @Microsoft wouldn't have a monopoly.
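To make that concrete, here's a minimal sketch of what using published weights looks like, assuming GPT-J eventually lands in the transformers library under a checkpoint name like "EleutherAI/gpt-j-6B" (that name is my assumption, not a confirmed release):

```python
# Hypothetical example: load openly published weights with Hugging Face transformers.
# "Freely available" means anyone can run this; it does not mean the hardware is free.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("EleutherAI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For a 100B+ model, those same few lines only work on a machine with enough accelerator memory, which in practice means renting it from a cloud provider.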
Jumping ahead to the current panel because I have strong emotions about this:

Making these models publicly available is a prerequisite for auditing them. The current paradigm of private models is fundamentally at odds with auditing. And this is deliberate!
These models are very expensive and very profitable. The companies that train these models explicitly and deliberately censor research that critically examines large language models. This includes work like @mmitchell_ai and @timnitGebru's Stochastic Parrots paper
technologyreview.com/2020/12/04/101…
But I'm actually far more concerned about the censorship of security and interpretability research. A leaked internal email written by Carlini reveals that his recent paper was censored extensively by Google

reuters.com/article/us-alp…
Carlini's paper was about measuring memorization of training data. Forget about ethics and policy work for a second: if this kind of foundational research on how models work is censored, we have no hope of building a real understanding of how these models work, let alone meaningfully evaluating whether their use is a good idea. You can't talk about the ethics of technology if you don't know how the technology functions. And @GoogleAI doesn't want you to know how it functions.
Empowering this kind of work is explicitly a reason why #EleutherAI is working on building and releasing large language models: blog.eleuther.ai/why-release-a-…
And I'm happy to say that we've already begun to see it
arxiv.org/abs/2107.06499
arxiv.org/abs/2105.05241
arxiv.org/abs/2107.07691
I missed the name of the woman who is currently speaking, but she is 100% spot on about these models already being deployed in ways that harm people and that people don't want.
These models are already being used to monitor political dissidents.

These models are already being used to spy on people.

These models are already being used to send people to jail.
If your understanding of the harms that these models do doesn't start with "these models are currently being used to violate human rights" and go downhill from there, you're quite poorly calibrated.
Police use ShotSpotter AI to put people in jail. But these algorithms don't work, or worse, are openly fraudulent. In one example, an employee manually reclassified and then changed the recorded location of a “gunshot” when contacted by the police and asked to look for it.
When the defense filed a Frye motion, the prosecution withdrew all ShotSpotter-based evidence rather than defend its scientific validity: vice.com/en/article/qj8…
Someone just said "who would have thought about these things 12 years ago."

I hate to break it to you, but computer security, social impacts of technology, and ethics were not invented in 2010. LOTS of people were thinking about this 12 years ago, and even many years prior
Have you never watched a sci-fi movie? Seriously?

Speaking of which, here's a great exposé on algorithms pitched as “Moneyball meets Minority Report” by people who somehow didn’t realize that the cops are the bad guys in that movie.

projects.tampabay.com/projects/2020/…
This is a point I made on @mtlaiethics's panel recently:

@percyliang brings up "ethical red teaming." There's actually a whole community of "AI hackers," and at @defcon the AI security group @aivillage_dc recently ran the first "AI Ethics Bug Bounty" with @Twitter. @ErickGalinkin and @ruchowdh led this effort.

The panelists are correct that evaluation suites are insufficient to understand real-world deployment contexts. In other fields we do studies of deployed systems and publish them. Not so in ML, because companies won't let you. I've tried at my company, and I know people who have tried at @GoogleAI and @MSFTResearch.

If we want to make progress on this we need to figure out how to get the data out of companies and into the public eye. Because I promise you it exists. It's just not something we are allowed to talk about.
Which reminds me I should probably make a disclaimer: My day job is doing work for @BoozAllen, a US military contractor commonly referred to as the largest private intelligence agency in the world. Booz and the US government in no way condone or support anything I say
and would probably specifically disown most of it. Booz does a lot of classified work, and nothing I say should be considered a comment on any non-public programs at Booz or the US government, classified or otherwise.
Basically everyone who works in cutting-edge AI research is muzzled in one way or another, and I am no different. This is one of the major reasons to democratize public discussion IMO... I can say things about Stanford and Google that employees there can't, and other people can
say things about my employer that I can't. We must wear these muzzles to get access to technology and ideas that enable us to function as researchers. That's how the ML world works right now.
Anyways, back to the panel:

You cannot rely on the US government to regulate AI technology. For over a decade, the reality of the world has been that people who are rich are above the law. You set aside funds to pay token judgments, and then you go profit off violating US law.
If the US government is going to regulate AI technology in more than name, the very first thing that has to happen is that the *minimum* judgment against a company for violating US regulatory law is the total profits accrued by that company. And then you need to pile punitive fines on top of that. If that doesn't happen, regulation is a joke. This isn't a uniquely ML problem; for example, the Sackler family made $13 BILLION creating the opioid crisis.

wsj.com/articles/sackl…
It currently looks like they'll be paying approximately $4 billion, and in exchange they'll get complete immunity for their blatantly and deliberately illegal behavior that has killed tens of thousands of people

npr.org/2021/06/02/100…
Participatory approaches in AI are awesome, but they're not the solution to harm. Read this phenomenal paper about it: arxiv.org/abs/2007.02423

I also made remarks about this in my paper with @wjscheirer at a NeurIPS workshop last year: proceedings.mlr.press/v137/biderman2…
It's interesting to hear alignment finally come up. I was wondering about this. I'm actually currently writing a position paper on behalf of #EleutherAI about our attitudes toward the release of technology.
I am hoping to have that out by the end of the month, though my migraines have already caused significant delays.

Anyways, I really want to see more communication and collaboration between what I'll call the "AI ethics" and the "AI alignment" communities. Neither community
is that positively disposed towards the other in my experience, and I think that's a major shame. This is something I am trying to work on within #EleutherAI, which is, broadly speaking, aligned with the alignment community rather than the ethics one.
It's cute to listen to people talk about "countries that don't have the same attitudes towards human rights" given the widespread abuse of human rights done by the US government.
LAPD Assistant Chief Horace Frank said “it is no secret” that the LAPD uses facial recognition, that he personally testified to that fact before the Police Commission a couple years ago, and that the more recent denials were "just mistakes."
latimes.com/california/sto…
SF banned the use of facial recognition technology for policing, so SF cops pretend it’s okay to have other people use banned AI tech for them: sfchronicle.com/bayarea/articl…
Going back to @shotspotter, no independent audit or study has ever supported its claims. A MacArthur Justice Center study found that 86% of gunshot reports resulted in no evidence of a crime: endpolicesurveillance.com
And yet it's widely deployed across the US. My mom is a public defender in DC, and she says the majority of cases she sees have "evidence" provided by @shotspotter
Yes, China is a human rights abuser. But so is the US, both internally and externally. It's extremely hard to get meaningful data on how often the US military murders non-combatants abroad, but it looks like it's at least two per day on average: watson.brown.edu/costsofwar/fil…
This is Merc. He’s really cute, but he doesn’t seem to want to let me tweet
I tried and failed to find a clip of this actually

I'm getting a lot of messages and replies. I'm going to try to get to everyone, but it will take a while. I'm not ignoring you, and don't hesitate to nudge me again.
I really need to write a blog post soon about my Rule 1 of technology: if it's the primary activity of the villain in a sci-fi book or movie, don't do it.

I like the question about how Yejin Choi got YouTube data in violation of YouTube's ToS. I'm really impressed that she was willing to straight up just answer that they violated Google's ToS. It's a lot more common to try to hide this fact behind weasel words.
In case anyone's curious, she's spot on about it being legal. Something widely ignored in AI research (including in the AI ethics literature) is that ToS are basically pieces of paper. They are not legally binding; researchers *probably* have a pass to ignore them in the US due to fair use even when they do express actual legal rights; and website owners typically just don't care.
Researchers are also very quick to ignore inconvenient truths about their data. People knew for years that ImageNet contained child pornography. I knew this in 2016, and I wasn't even an AI researcher at the time. But until it was a liability to the authors, it was fine.
A current example is the "OpenSubtitles" dataset, which claims to be a public dataset of movie scripts. Despite what the paper and the original website say, it is in no way in compliance with US copyright law.
As far as I know, the Pile is the only paper that uses the dataset and admits that the website is a massive copyright violation. This dataset was actually the motivation for section 6.5 in the Pile paper, where we distinguish three senses in which a dataset is (ethically) available for public use. As I said, it's probably fair use to use the data, so legally it's not at issue (in the US at least).

arxiv.org/abs/2101.00027
I would love to learn if I'm wrong about this. @mmitchell_ai @haydenfield @mer__edith @emilymbender do any of y'all know a paper or blog post that openly discusses the fact that some dataset authors lie about copyright compliance? With OpenSubtitles or anything else as examples?
I was in a rush and omitted @rharang from the list in the bug bounty tweet above. He was also a core person behind making this competition happen and is an awesome ML red teamer in his own right.
