1/7 Do word embeddings really say that man is to doctor as woman is to nurse? Apparently not. Check out this thread for a description of a short paper I co-wrote with Malvina Nissim and Rob van der Goot, available here: arxiv.org/abs/1905.09866 #NLProc #bias
2/7 The original analogy code, in both word2vec and gensim, implements a constraint whereby no input vector can be returned. In any query A:B :: C:D, you can never get an answer D such that D==A, D==B, or D==C, simply because the code does not allow it.
3/7 Therefore, for the query "man is to doctor as woman is to X", doctor cannot be returned as answer!
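A minimal sketch of what that constraint does, using toy 2-D vectors (the embedding values here are made up purely for illustration; this is not the actual word2vec/gensim code, whose real answers come from pretrained embeddings):

```python
import numpy as np

# Toy embedding space -- vectors are invented for illustration only.
emb = {
    "man":    np.array([1.0, 0.5]),
    "woman":  np.array([1.0, 0.6]),
    "doctor": np.array([0.0, 2.0]),
    "nurse":  np.array([0.2, 1.5]),
}

def analogy(a, b, c, restricted=True):
    """Answer A:B :: C:? by finding the word most cosine-similar to B - A + C."""
    query = emb[b] - emb[a] + emb[c]
    # The restriction at issue: the three input words are never candidates.
    exclude = {a, b, c} if restricted else set()
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in exclude:
            continue
        sim = query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Restricted (as word2vec/gensim do it): doctor can never come back.
print(analogy("man", "doctor", "woman", restricted=True))   # -> nurse
# Unrestricted: the query vector stays close to B, so B itself wins.
print(analogy("man", "doctor", "woman", restricted=False))  # -> doctor
```

With the exclusion in place the query is forced onto "nurse"; drop it and "doctor" is simply returned, which is the behaviour the thread describes.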
4/7 We modified the code to make it unrestricted and tested several classic (biased) examples as well as the original analogy set. Specific outcomes are shown and discussed in the paper. We saw that: (i) if you let it, B is almost always returned.
5/7 (ii) then, the analogy task doesn't work that well (no "queen" for "man is to king as woman is to X", but "king") and (iii) analogy-based biases have often been overemphasized in the literature.
6/7 This does nothing to help with the real problem of actual biases in word embeddings. Moreover, such biases are usually not captured by the analogy task anyway (we agree with @yoavgo and @hila_gonen that analogies are "party tricks", as stated in this fine work: arxiv.org/abs/1903.03862)
7/7 If you're curious, you can try analogies using the unrestricted code through this simple demo: let.rug.nl/rob/embs/. We'd love to get comments on the paper, so if you have any, let us know! Paper link again: arxiv.org/abs/1905.09866 #NLProc #bias