1/ @AOC is completely on point. I used to work on face reco commercially (and have been on the wrong side of working on a biased model), so I thought people might find it helpful to see concrete examples of *some* of the ways face AI systems can come to be biased.
2/ Experts on Twitter have explained that the documented biases in face reco systems happen because of biased data. So I’m going to talk about: (1) where that (biased) data comes from, and (2) how choices in bias measurement matter too.
3/ (I'll be speaking generally about different things that have happened in industry, and not anything specific to my employer. These are also my own opinions and do not represent the views of my employer.)
4/ Modern face recognition systems are trained on *very large* datasets. To train face identification, you want sets of multiple photos per person. For example, one of the largest published datasets, MF2 from UW, has 672K identities and 4.7M photos. (homes.cs.washington.edu/~kemelmi/ms.pdf).
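To make the "multiple photos per person" point concrete, here is a rough Python sketch (the structure and names are illustrative, not MF2's or any production pipeline's): verification training samples pairs of photos labeled "same person" or "different people," which only works if identities have at least two photos.

```python
# Illustrative sketch only -- not MF2's or any production pipeline's code.
# Verification training samples pairs of photos labeled "same person" (1)
# or "different people" (0), which requires >= 2 photos per identity.
import random

# Hypothetical layout: identity id -> list of photo paths
dataset = {
    "person_0001": ["a.jpg", "b.jpg", "c.jpg"],
    "person_0002": ["d.jpg", "e.jpg"],
}

def sample_pair(dataset, positive=True):
    """Return (photo1, photo2, label) for a positive or negative training pair."""
    if positive:
        ident = random.choice([i for i, ps in dataset.items() if len(ps) >= 2])
        p1, p2 = random.sample(dataset[ident], 2)
        return p1, p2, 1
    id_a, id_b = random.sample(list(dataset), 2)
    return random.choice(dataset[id_a]), random.choice(dataset[id_b]), 0
```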
5/ The data mostly comes from the web. For example, MF2 data is from Flickr. Another published dataset, MS-Celeb-1M (msceleb.org), is web-crawled: 10 million images. But "people whose photos are on the public internet" is not super representative of "people in the world."
6/ Internet access isn't evenly distributed across the world. Neither are researchers. You might crawl mostly the English-speaking web/videos (remember, you need to associate multiple photos of a person). Who are the people using Flickr?
7/ The MS-Celeb paper points out that over 3/4 of their 1 million celebs are female. So who is famous on the web? Could that involve what AOC calls "automated assumptions"? People constructing datasets usually do a bunch of things to adjust.
8/ One very expensive $$$$ option is to collect data by going out into the *real world* and paying people for it. It also takes a lot of work to do it correctly (i.e., where do you go) and to do it ethically (i.e., consent, fair compensation). But it can make a *huge* difference.
9/ There are other sources of data. Some people use mugshots (eek!- another example of how systematic societal bias might influence your dataset+annotations). Domestic firms in one non-US surveillance state get access to verrrry large government datasets.
10/ (And obviously if you are FB, you have access to data with a better distribution.)
11/ It isn't just where you sourced your data + annotations. Many datasets are further manually annotated by humans, often crowdworkers (i.e., people paid piecemeal via online platforms, usually very little).
12/ More human influence. What is the cultural context of our labelers? Are they being asked a question that is actually subjective (e.g., age, emotion)?
13/ Now, we have been taking it as given that biased input data means a biased model. In reality, there are a lot of very cool technical approaches that try to account for issues with the distribution of your training data.
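One common family of techniques (sketched roughly below; this is a generic illustration, not how any system mentioned here actually works) is to reweight training examples so that under-represented groups count for more in the loss.

```python
# Generic illustration of inverse-frequency reweighting; not a claim about
# how any specific face recognition system is built.
from collections import Counter

def inverse_frequency_weights(group_labels):
    """Weight each example inversely to its group's frequency, so
    under-represented groups contribute more to a weighted training loss."""
    counts = Counter(group_labels)
    n, k = len(group_labels), len(counts)
    return [n / (k * counts[g]) for g in group_labels]

# Example: a skewed training set where group "B" is rare.
groups = ["A"] * 90 + ["B"] * 10
weights = inverse_frequency_weights(groups)
# Each "B" example now carries ~9x the weight of an "A" example.
```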
14/ But generally, to evaluate the biases and performance of your model, you will need some way to measure them. Otherwise you won’t know how (or *how much*) your model is biased.
15/ For example, you may need to decide to *create* a dataset that includes labels about people with various characteristics, so you can test how your models perform on groups and subgroups.
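As a rough sketch (in the spirit of disaggregated evaluation à la Gender Shades; field names and data here are made up), the key step is to report error per labeled subgroup rather than a single aggregate number:

```python
# Rough sketch of disaggregated evaluation; fields and data are illustrative.
from collections import defaultdict

def per_group_error(records):
    """records: iterable of dicts with 'group', 'prediction', and 'label' keys."""
    errors, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        errors[r["group"]] += int(r["prediction"] != r["label"])
    return {g: errors[g] / totals[g] for g in totals}

# A single aggregate accuracy can hide large gaps between subgroups;
# per-group reporting makes those gaps visible.
```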
16/ Bias measurement requires the perspective of people in communities that could be affected by your model (on which problems are important), and also input from the social sciences.
17/ For example, there are societal reasons why "my model is less accurate if you are wearing glasses" is less harmful to people than "my model performs worse if you are a person of color."
18/ Deciding what to measure, constructing measurement datasets, and setting criteria for release are explicit *choices* you make when building a system. So when AOC says "racial inequities that get translated, because algorithms are still made by human beings," she is totally correct.
19/ And researchers have documented the outcomes of doing this badly again and again. See: gendershades.org.
20/ It provides strong arguments for having a diverse group of people work on these problems. (A side note- this also means global diversity. For example, much of the top work in computer vision is built and consumed in China. All these topics have layers of global perspectives.)
21/ Back to AOC. One thing implied is that AI ethics isn't one conversation about "technical methods to reduce bias" and another about "ethics of applications." The questions intertwine, as biases seep into models, as models are used in ways they are not suited for, or as feedback loops amplify bias.
22/ I think often of this quote in The Atlantic from German law enforcement investigating asylum claims. What happens when end users take probabilistic systems and ascribe to them the capabilities of not just humans, but of gods?
theatlantic.com/magazine/archi…
23/ I've tried to give a few practical examples of how things can go wrong with face, to help people contextualize this video. However, these kinds of ideas are really coming out of academia and public advocacy.
24/ There is *so much* important scholarship happening in this area *right now*, which is where these ideas come from. We are all lucky to be able to learn from experts like @timnitGebru, @jovialjoy, @hannawallach, @mathbabedotorg, @jennwvaughan
25/ and so many more leaders in academia and activism who are developing ideas about how to document, understand, and reduce the impact of bias in AI, and the impact of AI on society.
26/ Also, start here: gendershades.org (hugely impactful on industry practice and public awareness), and then here: fatconference.org/index.html
27/ To close, a personal note. As someone who used to work on face, I feel very grateful for the experts in this area, including for the demonstration of serious issues in the kinds of systems I worked on (as well as creative approaches to fixing them).
28/ p.s. Absolutely none of this thread is meant to represent the opinions of my employer.
p.p.s. I hope to see more legislators, beyond @AOC and @RonWyden, providing such sophisticated treatments of critically overlooked tech policy issues.
29/ p.s. There is a new paper out this week from Inioluwa Deborah Raji and Joy Buolamwini on the impact of the Gender Shades disclosure. Good reading.