The founder of @gridai_ (@_willfalcon) has been sharing some falsehoods about fastai as he promotes the PyTorch Lightning library. I want to address these & to share some of our fast.ai history. 1/
Because our MOOC is so well-known, some assume fast.ai is just for beginners, yet we have always worked to take people to the state-of-the-art (including through our software library). 2/
I co-founded fast.ai w/ @jeremyphoward in 2016 & worked on it full-time until 2019. I've been focused on the USF Center for Applied Data Ethics for the past 2 yrs, so I can’t speak to the current state of things as much, but I can share our history. 3/
In the very first fast.ai post, when we publicly launched in 2016, Jeremy highlighted the frustration that the deep learning software tools of the time were not where they needed to be for deep learning to meet its potential. 4/
He laid out our goals, and goals 2 & 3 were all about tools and software. We always saw the course as a way to experience firsthand what the pain points were (for us and for others), in order to motivate the software development (again, this is from Oct 2016) 5/
I was frustrated by how cliquish deep learning was at the time (I first became interested in it in 2013), and even as a math PhD & data scientist, it was hard to get into the field since many researchers were not sharing the info needed 6/
In spring 2017, we implemented & taught neural style transfer, Wasserstein GANs in PyTorch, bidirectional LSTMs, attentional models, the 100 Layers Tiramisu, generative models, super-resolution, processing ImageNet in parallel, & more 7/
We were often implementing new papers as they came out during the course. One of the goals was for students to learn how to implement new papers on their own. We updated the course & library continually to keep up with state-of-the-art. The course was/is different every year 8/
The below statement is false (again, we taught & implemented GANs in spring 2017), but could just be a misunderstanding. However, when combined with apparent copying from the fastai library without attribution, it begins to look suspect. 9/
In 2018, a team of fast.ai students won Stanford's DAWNBench competition against better-funded teams from Google & Intel. Winning this involved new research & implementations for mixed precision & distributed training 10/
In 2018, fastai was winning competitions with new research & implementations of mixed precision & distributed training, yet in 2019, Falcon claimed these were innovations that Lightning had just introduced towardsdatascience.com/pytorch-lightn… 13/
The fastai library has been used in many research papers. Here are two partial lists 14/
There is also a common bias at work here: the false belief that you can’t be doing something to increase diversity/include outsiders AND be doing state-of-the-art work, as though these two are in conflict (they are not) 15/
I want to credit @GuggerSylvain for his crucial role in developing fastai. He helped make fastai what it is.
Also, thank you to the fastai community, and everyone who invested time & energy into building fastai & helping others, trying to create a welcoming & collaborative environment
In aggregate, I don’t see how this could just be a “misunderstanding.” Falcon repeatedly made false statements about fastai, did not correct them when pointed out, still has not corrected them, & is trying to pass off aspects borrowed from fastai as new “innovations” he created
It is also important to note that Falcon raised $18 million in VC funding, likely using the “innovations” of Lightning as a key selling point. This isn't just about open source libraries 19/
Because the falsehoods about fastai were & are public (and have not been corrected), I feel that it is important to publicly address them. 20/
One more example: Falcon published a chart full of inaccuracies, said to comment if something was missing, and then ignored comments 21/
There were many features that fastai was incorrectly listed as not having (as well as many features fastai had & lightning didn’t that were not listed). 22/
In computational systems, we are often interested in unobservable theoretical constructs (e.g. "creditworthiness", "teacher quality", "risk to society"). Many harms are the result of a mismatch between these constructs & their operationalization -- @az_jacobs & @hannawallach
A measurement model is "valid" if the theoretical understanding matches the operationalization. There are many ways validity can fail.
Some types of validity (a toy example follows the list):
- content: does the measurement capture everything we want?
- convergent: does it match other accepted measurements?
- predictive: is it related to other external properties?
- hypothesis: is it theoretically useful?
- consequential: what are the downstream societal impacts?
- reliability: noise, precision, stability
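To make a few of these concrete, here is a minimal toy sketch (the construct, scores, & data are all invented for illustration; none of this comes from the talk) of how convergent, predictive, & reliability checks might be computed for a proxy score:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical latent construct (e.g. "creditworthiness") -- by definition
# unobservable; here we simulate it so we can generate toy proxy scores.
latent = rng.normal(size=n)

# Two independent operationalizations (proxy scores) of the same construct.
score_a = latent + rng.normal(scale=0.5, size=n)
score_b = latent + rng.normal(scale=0.5, size=n)

# An external property the construct should relate to (e.g. loan repayment).
outcome = (latent + rng.normal(scale=1.0, size=n)) > 0

# Convergent validity: do the two operationalizations agree with each other?
print("convergent:", np.corrcoef(score_a, score_b)[0, 1])

# Predictive validity: does the score relate to the external property?
print("predictive:", np.corrcoef(score_a, outcome.astype(float))[0, 1])

# Reliability: does re-measuring give a stable answer?
score_a_retest = latent + rng.normal(scale=0.5, size=n)
print("reliability:", np.corrcoef(score_a, score_a_retest)[0, 1])
```

Note that content, hypothesis, & consequential validity can't be reduced to a correlation like this: they require reasoning about what the construct means, whether it is theoretically useful, & who is affected downstream.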
I made a playlist of 11 short videos (most are 7-13 mins long) on Ethics in Machine Learning
This is from my 2-hour ethics lecture in Practical Deep Learning for Coders v4. I thought these short videos would be easier to watch, share, or skip around in
What are Ethics & Why do they Matter? Machine Learning Edition
- 3 Case Studies to know about
- Is this really our responsibility?
- What is ethics? @scuethics
- What do we teach when we teach tech ethics? @cfiesler
Software systems have bugs, algorithms can have errors, data is often incorrect.
People impacted by automated systems need timely, meaningful ways to appeal decisions & find recourse, and we need to plan for this in advance
For example, if you start a “women & allies” email list and then fire a Black woman for being honest on it, it probably would have been better not to have the email list in the first place 3/
Some folks have asked about data vs. algorithms. Treating these as separate silos doesn't really make sense, and it contributes to a common perception that the data is someone else's problem, an unglamorous & lesser task:
We in machine learning often fail to critique the origin, motivation, platform, or potential impact of the data we use, and this is a problem that we need to address.
Q: Is AI development trapped in a paradigm that pursues efficiency above all else? @ResistanceAI
@Abebab cites ongoing work that finds efficiency, accuracy, & performance are the key values mentioned in most ML papers
@red_abebe: Efficient for whom? Taking the criminal justice system as an example: is it efficient to have 2 million people in prison in the USA?
Noopur Raval: The efficiency paradigm can show up in unexpected forms, including in many projects claiming to be for social good. Technology can appear as part of a mystical, deceptive promise to make things better.
This idea that you can't highlight problems without offering a solution is pervasive, harmful, and false.
Efforts to accurately identify, analyze, & understand risks & harms are valuable. And most difficult problems are not going to be solved in a single paper.
I strongly believe that in order to solve a problem, you have to diagnose it, and that we’re still in the diagnosis phase of this... Trying to make clear what the downsides are, and diagnosing them accurately so that they can be solvable is hard work -- @JuliaAngwin
With industrialization, we had 30 yrs of child labor & terrible working conditions. It took a lot of journalistic muckraking & advocacy to diagnose the problem & gain some understanding of what it was, and then activism to get laws changed