Talk by Chris Sweeney at #FAT2020 on "Reducing sentiment polarity for demographic attributes in word embeddings using adversarial learning," with @Maryam_Najafian.
There are several types of bias encoded in language models, and this paper focuses on sentiment bias, where certain identity terms encode a more positive sentiment than others. #FAT2020
Various papers have studied the different possible sources of this bias, and this paper focuses on the word vectors themselves. #FAT2020
In particular, they define sentiment polarity via the word vectors: they build a positive/negative sentiment axis from sets of positive and negative words, then project identity-term word vectors onto that axis to see where each term falls. #FAT2020
A given identity term's sentiment score is where its embedding's projection lies on this axis. The goal is to reduce the polarization of a set of identity terms while preserving semantic meaning. #FAT2020
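A minimal sketch of that scoring scheme in numpy. All embeddings, word lists, and identity terms below are toy values invented for illustration; the paper builds its axis from real lexicons and pretrained vectors.

```python
import numpy as np

# Toy 3-d embeddings (hypothetical values, for illustration only).
emb = {
    "wonderful": np.array([ 1.0, 0.2, 0.1]),
    "great":     np.array([ 0.9, 0.1, 0.3]),
    "terrible":  np.array([-1.0, 0.1, 0.2]),
    "awful":     np.array([-0.8, 0.3, 0.1]),
    "name_a":    np.array([ 0.4, 0.5, 0.5]),   # stand-in identity terms
    "name_b":    np.array([-0.3, 0.6, 0.4]),
}

pos_words = ["wonderful", "great"]
neg_words = ["terrible", "awful"]

# Sentiment axis: difference of the positive and negative centroids.
axis = np.mean([emb[w] for w in pos_words], axis=0) \
     - np.mean([emb[w] for w in neg_words], axis=0)
axis /= np.linalg.norm(axis)

def sentiment_score(word):
    """Scalar projection of a word's vector onto the sentiment axis."""
    return float(emb[word] @ axis)

print(sentiment_score("name_a"))   # positive -> leans toward positive pole
print(sentiment_score("name_b"))   # negative -> leans toward negative pole
```

A polarized set of identity terms is one whose scores spread far from zero along this axis; depolarization pulls those projections toward zero.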
They use an adversarial technique: the model learns to minimize the distance between the polarized and depolarized word vectors, while the adversary maximizes the error between predicted sentiment polarity and the ground truth. #FAT2020
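The alternating objective can be sketched with linear models and hand-written gradients. This is a simplified stand-in for the paper's setup, not their implementation: the data, dimensions, learning rates, and the linear probe are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 8
X = rng.normal(size=(n, d))       # toy "polarized" embeddings
s = X[:, 0].copy()                # pretend sentiment leaks through coord 0

W = np.eye(d)                     # debiasing transform (the "generator")
a = np.zeros(d)                   # adversary: linear sentiment probe
lr, lam = 0.05, 0.5

for _ in range(300):
    Xp = X @ W.T                  # candidate depolarized vectors
    # Adversary step: fit sentiment from the depolarized vectors
    # (gradient descent on squared prediction error).
    err = Xp @ a - s
    a -= lr * 2 * Xp.T @ err / n
    # Generator step: stay close to X (reconstruction term) while
    # *increasing* the adversary's error (gradient reversal).
    err = Xp @ a - s
    g_recon = 2 * (Xp - X).T @ X / n
    g_adv = 2 * np.outer(err, a).T @ X / n
    W -= lr * (g_recon - lam * g_adv)
```

At equilibrium the transform trades off faithfulness to the original vectors against how well any such probe can recover sentiment from them, which is the intuition behind the minimax objective described above.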
They evaluate whether the resulting embeddings are depolarized and the effect on fairness/accuracy in downstream tasks. They show that they reduce the polarity of names typically associated with different demographic groups. #FAT2020
Case study uses the Equity Evaluation Corpus, where they show improvement on a sentiment valence regression metric across different demographic categories. #FAT2020