Many face recognition datasets have been taken down due to ethical concerns. In ongoing research, we found that this doesn't achieve much. For example, the DukeMTMC dataset of videos was used in 135 papers published *after* it was taken down in June 2019. freedom-to-tinker.com/2020/10/21/fac…
A major challenge comes from derived datasets. In particular, DukeMTMC-ReID is a popular dataset for person re-identification and continues to be free for anyone to download. 116 of the 135 papers that used DukeMTMC after its takedown actually used a derived dataset.
This is a widespread problem. MS-Celeb was removed due to criticism but lives on through MS1M-IBUG, MS1M-ArcFace, MS1M-RetinaFace… all still public. The original dataset is also available via Academic Torrents. One popular dataset, LFW, has spawned at least 14 derivatives.
Regulating research ethics is hard, and the machine learning community has only recently started thinking seriously about it. So it's important to empirically test whether current ways of regulating ethics are working. Our work suggests there's a long way to go.
The key challenge in our research was figuring out which papers actually use a dataset, as opposed to merely mentioning or citing it. We are currently writing up a paper in which we describe how we do this and present many more findings.
Finally, an important caveat: we are able to study the use of dubious facial recognition datasets *for research* by looking at the papers that ultimately result. There is a whole world of questionable uses in industry and by governments, but we have no window into it.

Thread by Arvind Narayanan (@random_walker)
