Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

github.com/twitter-resear…
In fall 2020, Twitter users raised concerns that the automated image cropping system on Twitter favored light-skinned over dark-skinned individuals, as well as concerns that the system favored cropping women's bodies instead of their heads.

arxiv.org/abs/2105.08667
To address these concerns, the paper's authors conduct an extensive analysis using formalized group fairness metrics.

blog.twitter.com/engineering/en…
The authors find systematic disparities in cropping and identify contributing factors, including the fact that cropping based on the single most salient point can amplify the disparities.
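
To make the amplification point concrete, here is a minimal sketch. It is not the paper's code or Twitter's saliency model: the scores are synthetic, and the helper names (`argmax_pick`, `proportional_pick`, `selection_rate`) are hypothetical. It assumes each image shows two people whose peak saliency scores differ only slightly on average, then compares cropping around the single most salient point with sampling in proportion to the scores. Under demographic parity, each person would be kept about half the time.

```python
import random


def argmax_pick(scores):
    """Return the index of the single most salient point."""
    return max(range(len(scores)), key=lambda i: scores[i])


def proportional_pick(scores, rng):
    """Sample an index with probability proportional to its score."""
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]


def selection_rate(pick, pairs):
    """Fraction of image pairs in which the person at index 0 is kept."""
    return sum(pick(p) == 0 for p in pairs) / len(pairs)


rng = random.Random(0)

# Synthetic assumption: person 0 has a slightly higher mean saliency
# (0.52 vs 0.48), with the same spread for both people.
pairs = [(rng.gauss(0.52, 0.05), rng.gauss(0.48, 0.05)) for _ in range(10_000)]

argmax_rate = selection_rate(argmax_pick, pairs)
prop_rate = selection_rate(lambda p: proportional_pick(p, rng), pairs)

# Gap from parity (0.5) under each selection rule.
print(f"argmax:       rate={argmax_rate:.3f}  gap={argmax_rate - 0.5:+.3f}")
print(f"proportional: rate={prop_rate:.3f}  gap={prop_rate - 0.5:+.3f}")
```

In this toy setup the proportional rule stays close to the small underlying score difference, while the argmax rule turns that same difference into a much larger gap in who gets kept in the crop.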
