Word Embedding Bias

Metrics and debiasing for bias (such as gender and race) in word embedding.

Important

The following paper suggests that the current methods have only a superficial effect on the bias in word embeddings:

Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862.

Important

The following paper criticizes using the most_similar() function from gensim in the context of word embedding bias and the process of generating analogies:

Nissim, M., van Noord, R., van der Goot, R. (2019). Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor.

Therefore, responsibly provides an implementation of most_similar() with an unrestricted argument that, when set, does not filter the results. A similar argument exists for generate_analogies().

Currently, three methods are supported:

  1. Bolukbasi et al. (2016) bias measure and debiasing - responsibly.we.bias

  2. WEAT measure - responsibly.we.weat

  3. Gonen et al. (2019) clustering as classification of biased neutral words - responsibly.we.bias.BiasWordEmbedding.plot_most_biased_clustering()

In addition, some of the standard benchmarks for word embeddings are available, primarily to check the impact of debiasing on performance.

Refer to the Word Embedding demo for a complete usage example.

For a technical discussion about the various bias metrics, refer to the page Analysis of Word Embedding Bias Metrics.

Bolukbasi Bias Measure and Debiasing

Measuring and adjusting bias in word embeddings, following Bolukbasi et al. (2016).

References:

Usage

>>> from responsibly.we import GenderBiasWE
>>> from gensim import downloader
>>> w2v_model = downloader.load('word2vec-google-news-300')
>>> w2v_gender_bias_we = GenderBiasWE(w2v_model)
>>> w2v_gender_bias_we.calc_direct_bias()
0.07307904249481942
>>> w2v_gender_bias_we.debias()
>>> w2v_gender_bias_we.calc_direct_bias()
1.7964246601064155e-09

Types of Bias

Direct Bias

  1. Associations

    Words that are closer to one end (e.g., he) than to the other end (she). For example, occupational stereotypes (page 7). Calculated by calc_direct_bias().

  2. Analogies

    Analogies of he:x::she:y. For example analogies exhibiting stereotypes (page 7). Generated by generate_analogies().

Indirect Bias

The projection of a neutral word onto the direction defined by two other neutral words is explained, to a large extent, by their shared projection onto the bias direction.

Calculated by calc_indirect_bias() and generate_closest_words_indirect_bias().

class responsibly.we.bias.BiasWordEmbedding(model, only_lower=False, verbose=False, identify_direction=False, to_normalize=True)[source]

Bases: object

Measure and adjust a bias in English word embedding.

Parameters
  • model – Word embedding model of gensim.model.KeyedVectors

  • only_lower (bool) – Whether the word embedding contains only lower case words

  • verbose (bool) – Set verbosity

  • to_normalize (bool) – Whether to normalize all the vectors (recommended!)

project_on_direction(word)[source]

Project the normalized vector of the word on the direction.

Parameters

word (str) – The word to project

Return float

The projection scalar

calc_projection_data(words)[source]

Calculate the projection, projected and rejected vectors for a list of words.

Parameters

words (list) – List of words

Returns

pandas.DataFrame of the projection, projected and rejected vectors of the words in the list
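As a rough illustration of the three quantities mentioned above, the following sketch computes the projection scalar, the projected vector and the rejected vector of a single vector with NumPy. It is not the library's implementation; the function name projection_parts and the toy vectors are assumptions made for this example.

```python
import numpy as np


def projection_parts(v, direction):
    """Return (scalar, projected, rejected) of v relative to a direction."""
    u = direction / np.linalg.norm(direction)  # unit-length direction
    scalar = float(v @ u)                      # signed length of v along u
    projected = scalar * u                     # component of v parallel to u
    rejected = v - projected                   # component of v orthogonal to u
    return scalar, projected, rejected


# Toy 2-D example (hypothetical values).
v = np.array([3.0, 4.0])
direction = np.array([2.0, 0.0])
scalar, projected, rejected = projection_parts(v, direction)
```

The projected and rejected vectors always sum back to the original vector and are orthogonal to each other.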

plot_projection_scores(words, n_extreme=10, ax=None, axis_projection_step=None)[source]

Plot the projection scalar of words on the direction.

Parameters
  • words (list) – The words to project

  • n_extreme (int or None) – The number of extreme words to show

Returns

The ax object of the plot

plot_dist_projections_on_direction(word_groups, ax=None)[source]

Plot the projection scalars distribution on the direction.

Parameters

word_groups (dict) – The groups to project

Returns

The ax object of the plot

classmethod plot_bias_across_word_embeddings(word_embedding_bias_dict, words, ax=None, scatter_kwargs=None)[source]

Plot the projections of the same words in two word embeddings.

Parameters
  • word_embedding_bias_dict (dict) – BiasWordEmbedding objects as values, and their names as keys.

  • words (list) – Words to be projected.

  • scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter.

Returns

The ax object of the plot

generate_analogies(n_analogies=100, seed='ends', multiple=False, delta=1.0, restrict_vocab=30000, unrestricted=False)[source]

Generate analogies based on a seed vector.

x - y ~ seed vector, or a:x::b:y when a - b ~ seed vector.

The seed vector can be defined by two word ends, or by the bias direction.

delta is the threshold for semantic coherence. The default value of 1 corresponds to an angle <= pi/3.

There is criticism regarding generating analogies when this method is used with unrestricted=False and analogies whose match column equals False are not ignored. The technique of Bolukbasi et al. for generating analogies, as implemented in this method, is inherently limited to analogies with x != y, which may force “fake” bias analogies.

See:

Parameters
  • seed – The definition of the seed vector. Either by a tuple of two word ends, or by ‘ends’ for the pre-defined ends or by ‘direction’ for the pre-defined direction vector.

  • n_analogies (int) – Number of analogies to generate.

  • multiple (bool) – Whether to allow multiple appearances of a word in the analogies.

  • delta (float) – Threshold for semantic similarity. The maximal distance between x and y.

  • restrict_vocab (int) – The vocabulary size to use.

  • unrestricted (bool) – Whether to validate the generated analogies with unrestricted most_similar.

Returns

pandas.DataFrame of analogies (x, y), their distances, and their cosine similarity scores
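The scoring idea described above can be sketched on a toy vocabulary: keep ordered pairs (x, y) with x != y whose distance is at most delta, and rank them by the cosine similarity between x - y and the seed vector. This is only a sketch under stated assumptions, not the library's code; score_analogy_pairs, unit and the 2-D vectors are hypothetical. For unit vectors, ||x - y||^2 = 2 - 2 cos(theta), so ||x - y|| <= 1 is the same as cos(theta) >= 1/2, i.e. theta <= pi/3.

```python
import itertools

import numpy as np


def score_analogy_pairs(vectors, seed, delta=1.0):
    """Score ordered pairs (x, y) by cos(seed, x - y), keeping ||x - y|| <= delta."""
    seed = seed / np.linalg.norm(seed)
    rows = []
    for x, y in itertools.permutations(vectors, 2):
        diff = vectors[x] - vectors[y]
        distance = np.linalg.norm(diff)
        if 0 < distance <= delta:
            rows.append((x, y, float(distance), float(diff @ seed / distance)))
    # Best-aligned pairs first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


def unit(values):
    """Return the values as a unit-length NumPy vector."""
    v = np.asarray(values, dtype=float)
    return v / np.linalg.norm(v)


# Hypothetical 2-D unit vectors, chosen only to keep the example small.
toy = {
    "he": unit([1.0, 0.2]),
    "she": unit([1.0, -0.2]),
    "king": unit([0.9, 0.5]),
    "queen": unit([0.9, 0.1]),
    "banana": unit([-1.0, 0.0]),
}
seed = toy["he"] - toy["she"]
pairs = score_analogy_pairs(toy, seed, delta=1.0)
```

Because pairs with x == y are never considered, this kind of procedure cannot return an analogy such as he:doctor::she:doctor, which is the limitation raised by the criticism mentioned above.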

calc_direct_bias(neutral_words, c=None)[source]

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters
  • neutral_words (list) – List of neutral words

  • c (float or None) – Strictness of bias measuring

Returns

The direct bias
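The direct bias of Bolukbasi et al. (2016) is the mean of |cos(w, g)|^c over the neutral words w, where g is the bias direction. The sketch below is a minimal NumPy illustration of that formula, assuming that the c argument plays the role of the strictness exponent; direct_bias and the toy vectors are hypothetical and not the library's implementation.

```python
import numpy as np


def direct_bias(neutral_vectors, direction, c=1.0):
    """Mean of |cos(w, g)|**c over the rows of neutral_vectors."""
    g = direction / np.linalg.norm(direction)
    w = np.asarray(neutral_vectors, dtype=float)
    cosines = (w @ g) / np.linalg.norm(w, axis=1)
    return float(np.mean(np.abs(cosines) ** c))


# Toy 2-D vectors (hypothetical values).
direction = np.array([1.0, 0.0])
neutral = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 4.0]])
bias_c1 = direct_bias(neutral, direction, c=1.0)
bias_c2 = direct_bias(neutral, direction, c=2.0)
```

A larger c shrinks the contribution of words that are only weakly aligned with the direction, so the measure becomes stricter.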

calc_indirect_bias(word1, word2)[source]

Calculate the indirect bias between two words.

Based on the amount of shared projection of the words on the direction.

Also called PairBias.

Parameters
  • word1 (str) – First word

  • word2 (str) – Second word

Returns

The indirect bias between the two words
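For reference, Bolukbasi et al. (2016) define the indirect bias of two unit vectors w and v as the share of their inner product that disappears once the bias direction g is removed from both. The following NumPy sketch illustrates that definition; it is an illustration under that assumption rather than the library's code, and indirect_bias and the toy vectors are hypothetical.

```python
import numpy as np


def indirect_bias(w, v, direction):
    """Share of the similarity w.v that is due to the bias direction."""
    g = direction / np.linalg.norm(direction)
    w = w / np.linalg.norm(w)
    v = v / np.linalg.norm(v)
    w_rej = w - (w @ g) * g  # part of w orthogonal to g
    v_rej = v - (v @ g) * g  # part of v orthogonal to g
    similarity = w @ v
    similarity_without_g = (w_rej @ v_rej) / (
        np.linalg.norm(w_rej) * np.linalg.norm(v_rej))
    return float((similarity - similarity_without_g) / similarity)


g = np.array([1.0, 0.0, 0.0])
# The two vectors share only their component along g.
only_shared_bias = indirect_bias(
    np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), g)
# Identical vectors: removing g does not change their similarity.
no_indirect_bias = indirect_bias(
    np.array([1.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0]), g)
```

A value close to 1 means that almost all of the similarity comes from the shared bias component, while a value close to 0 means that the bias direction contributes very little.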

generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words=None, n_extreme=5)[source]

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters
  • neutral_positive_end (str) – A word that defines the positive side of the neutral direction.

  • neutral_negative_end (str) – A word that defines the negative side of the neutral direction.

  • words (list) – List of words to project on the neutral direction.

  • n_extreme (int) – The number for the most extreme words (positive and negative) to show.

Returns

pandas.DataFrame of the most extreme words with their projection scores and indirect biases.

debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]

Debias the word embedding.

Parameters
  • method (str) – The method of debiasing.

  • neutral_words (list) – List of neutral words for the neutralize step

  • equality_sets (list) – List of equality sets, for the equalize step. The sets represent the direction.

  • inplace (bool) – Whether to debias the object inplace or return a new one

Warning

After calling debias, all the vectors of the word embedding will be normalized to unit length.
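To make the neutralize and equalize steps concrete, here is a minimal NumPy sketch of hard debiasing as described by Bolukbasi et al. (2016), for a one-dimensional bias direction and unit-length vectors. It is a sketch under those assumptions and not the library's implementation; unit, neutralize, equalize and the toy vectors are hypothetical names and values.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def neutralize(w, g):
    """Remove the component of w along the unit direction g, then renormalize."""
    return unit(w - (w @ g) * g)


def equalize(equality_set, g):
    """Make the words of an equality set differ only along g."""
    mu = np.mean(equality_set, axis=0)
    nu = mu - (mu @ g) * g           # shared part, orthogonal to g
    scale = np.sqrt(1.0 - nu @ nu)   # keeps the results at unit length
    return [nu + scale * unit((w @ g) * g - (mu @ g) * g)
            for w in equality_set]


# Toy 3-D unit vectors (hypothetical values).
g = np.array([1.0, 0.0, 0.0])
he = unit(np.array([1.0, 0.5, 0.2]))
she = unit(np.array([-1.0, 0.5, 0.4]))
nurse = unit(np.array([-0.3, 0.8, 0.1]))

nurse_n = neutralize(nurse, g)
he_eq, she_eq = equalize([he, she], g)
```

After both steps, the neutral word is orthogonal to the direction and equally similar to both members of the equality set, and all the resulting vectors have unit length, which is consistent with the warning above.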

evaluate_word_embedding(kwargs_word_pairs=None, kwargs_word_analogies=None)[source]

Evaluate word pairs tasks and word analogies tasks.

Parameters
  • model – Word embedding.

  • kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.

  • kwargs_word_analogies – Kwargs for evaluate_word_analogies method.

Returns

Tuple of pandas.DataFrame for the evaluation results.

learn_full_specific_words(seed_specific_words, max_non_specific_examples=None, debug=None)[source]

Learn specific words given a list of seed specific words.

Using Linear SVM.

Parameters
  • seed_specific_words (list) – List of seed specific words

  • max_non_specific_examples (int) – The number of non-specific words to sample for training

Returns

List of learned specific words and the classifier object

compute_factual_association(factual_properity)[source]

Compute association of a factual property to the projection.

Inspired by WEFAT (Word-Embedding Factual Association Test), but it is not the same:

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

In a future version, the WEFAT will also be implemented.

If a word doesn’t exist in the word embedding, then it will be filtered out.

For example, in responsibly.we.bias.GenderBiasWE, the default factual property is the percentage of women in various occupations, from the Labor Force Statistics of the 2017 Population Survey, taken from: https://arxiv.org/abs/1804.06876

Parameters

factual_properity (dict) – Dictionary of words and their factual values.

Returns

Pearson r, pvalue and the words with their associated factual values and their projection on the bias direction.
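The computation described above can be sketched as a Pearson correlation between the projections of the words and their factual values, restricted to the words available in the embedding. The sketch below uses only NumPy and therefore omits the p-value; factual_association and the projection values are hypothetical, while the occupation percentages are a few entries from the default dictionary shown in the GenderBiasWE signature.

```python
import numpy as np


def factual_association(projections, factual_property):
    """Pearson r between projections and factual values, on shared words."""
    # Words missing from the embedding are filtered out.
    words = sorted(w for w in factual_property if w in projections)
    x = np.array([projections[w] for w in words])
    y = np.array([factual_property[w] for w in words], dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    return r, words


# Hypothetical projections on a bias direction; 'sheriff' is missing on purpose.
projections = {"nurse": -0.25, "secretary": -0.28, "carpenter": 0.20,
               "driver": 0.15, "teacher": -0.10}
factual = {"nurse": 90, "secretary": 95, "carpenter": 2,
           "driver": 6, "teacher": 78, "sheriff": 14}
r, used_words = factual_association(projections, factual)
```

The sign of r depends on which end of the direction is treated as positive, so only its magnitude should be compared across embeddings.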

plot_factual_association(factual_properity, ax=None)[source]

Plot association of a factual property to the projection.

See: BiasWordEmbedding.compute_factual_association()

Parameters

factual_properity (dict) – Dictionary of words and their factual values.

static plot_most_biased_clustering(biased, debiased, seed='ends', n_extreme=500, random_state=1)[source]

Plot clustering as classification of biased neutral words.

Parameters
  • biased – Biased word embedding of BiasWordEmbedding.

  • debiased – Debiased word embedding of BiasWordEmbedding.

  • seed – The definition of the seed vector. Either by a tuple of two word ends, or by ‘ends’ for the pre-defined ends or by ‘direction’ for the pre-defined direction vector.

  • n_extreme – The number of extreme biased neutral words to use.

Returns

Tuple of list of ax objects of the plot, and a dictionary with the most positive and negative words.

Based on:

class responsibly.we.bias.GenderBiasWE(model, only_lower=False, verbose=False, identify_direction='pca', to_normalize=True)[source]

Bases: responsibly.we.bias.BiasWordEmbedding

Measure and adjust the Gender Bias in English Word Embedding.

Parameters
  • model – Word embedding model of gensim.model.KeyedVectors

  • only_lower (bool) – Whether the word embedding contains only lower case words

  • verbose (bool) – Set verbosity

  • identify_direction (str) – Set the method of identifying the gender direction: ‘single’, ‘sum’ or ‘pca’.

  • to_normalize (bool) – Whether to normalize all the vectors (recommended!)
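The ‘pca’ option presumably follows the approach of Bolukbasi et al. (2016): center each definitional pair (such as she/he) around its midpoint, stack the centered vectors, and take the first principal component as the direction. The following NumPy sketch illustrates that idea on toy data; pca_direction and the vectors are hypothetical, and the library's own procedure may differ in its details.

```python
import numpy as np


def pca_direction(pairs):
    """First principal component of pair-centered vectors."""
    centered = []
    for a, b in pairs:
        center = (a + b) / 2.0
        centered.extend([a - center, b - center])
    matrix = np.array(centered)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[0]


# Toy definitional pairs that differ mostly along the first axis.
pairs = [
    (np.array([1.0, 0.3, 0.0]), np.array([-1.0, 0.3, 0.1])),
    (np.array([0.8, -0.2, 0.5]), np.array([-0.9, -0.2, 0.4])),
]
direction = pca_direction(pairs)
```

The sign of a principal component is arbitrary, so which end of the direction counts as positive has to be fixed separately.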

plot_projection_scores(words='professions', n_extreme=10, ax=None, axis_projection_step=None)[source]

Plot the projection scalar of words on the direction.

Parameters
  • words (list) – The words to project

  • n_extreme (int or None) – The number of extreme words to show

Returns

The ax object of the plot

plot_dist_projections_on_direction(word_groups='bolukbasi', ax=None)[source]

Plot the projection scalars distribution on the direction.

Parameters

word_groups (dict) – The groups to project

Returns

The ax object of the plot

classmethod plot_bias_across_word_embeddings(word_embedding_bias_dict, ax=None, scatter_kwargs=None)[source]

Plot the projections of the same words in two word embeddings.

Parameters
  • word_embedding_bias_dict (dict) – BiasWordEmbedding objects as values, and their names as keys.

  • words (list) – Words to be projected.

  • scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter.

Returns

The ax object of the plot

calc_direct_bias(neutral_words='professions', c=None)[source]

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters
  • neutral_words (list) – List of neutral words

  • c (float or None) – Strictness of bias measuring

Returns

The direct bias

generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words='professions', n_extreme=5)[source]

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters
  • neutral_positive_end (str) – A word that defines the positive side of the neutral direction.

  • neutral_negative_end (str) – A word that defines the negative side of the neutral direction.

  • words (list) – List of words to project on the neutral direction.

  • n_extreme (int) – The number for the most extreme words (positive and negative) to show.

Returns

pandas.DataFrame of the most extreme words with their projection scores and indirect biases.

debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]

Debias the word embedding.

Parameters
  • method (str) – The method of debiasing.

  • neutral_words (list) – List of neutral words for the neutralize step

  • equality_sets (list) – List of equality sets, for the equalize step. The sets represent the direction.

  • inplace (bool) – Whether to debias the object inplace or return a new one

Warning

After calling debias, all the vectors of the word embedding will be normalized to unit length.

learn_full_specific_words(seed_specific_words='bolukbasi', max_non_specific_examples=None, debug=None)[source]

Learn specific words given a list of seed specific words.

Using Linear SVM.

Parameters
  • seed_specific_words (list) – List of seed specific words

  • max_non_specific_examples (int) – The number of non-specific words to sample for training

Returns

List of learned specific words and the classifier object

compute_factual_association(factual_properity={'accountant': 61, 'analyst': 41, 'assistant': 85, 'attendant': 76, 'auditor': 61, 'baker': 65, 'carpenter': 2, 'cashier': 73, 'ceo': 39, 'chief': 27, 'cleaner': 89, 'clerk': 72, 'construction_worker': 4, 'cook': 38, 'counselors': 73, 'designers': 54, 'developer': 20, 'driver': 6, 'editor': 52, 'farmer': 22, 'guard': 22, 'hairdressers': 92, 'housekeeper': 89, 'janitor': 34, 'laborer': 4, 'lawyer': 35, 'librarian': 84, 'manager': 43, 'mechanician': 4, 'mover': 18, 'nurse': 90, 'physician': 38, 'receptionist': 90, 'salesperson': 48, 'secretary': 95, 'sewer': 80, 'sheriff': 14, 'supervisor': 44, 'teacher': 78, 'writer': 63})[source]

Compute association of a factual property to the projection.

Inspired by WEFAT (Word-Embedding Factual Association Test), but it is not the same:

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

In a future version, the WEFAT will also be implemented.

If a word doesn’t exist in the word embedding, then it will be filtered out.

For example, in responsibly.we.bias.GenderBiasWE, the default factual property is the percentage of women in various occupations, from the Labor Force Statistics of the 2017 Population Survey, taken from: https://arxiv.org/abs/1804.06876

Parameters

factual_properity (dict) – Dictionary of words and their factual values.

Returns

Pearson r, pvalue and the words with their associated factual values and their projection on the bias direction.

plot_factual_association(factual_properity={'accountant': 61, 'analyst': 41, 'assistant': 85, 'attendant': 76, 'auditor': 61, 'baker': 65, 'carpenter': 2, 'cashier': 73, 'ceo': 39, 'chief': 27, 'cleaner': 89, 'clerk': 72, 'construction_worker': 4, 'cook': 38, 'counselors': 73, 'designers': 54, 'developer': 20, 'driver': 6, 'editor': 52, 'farmer': 22, 'guard': 22, 'hairdressers': 92, 'housekeeper': 89, 'janitor': 34, 'laborer': 4, 'lawyer': 35, 'librarian': 84, 'manager': 43, 'mechanician': 4, 'mover': 18, 'nurse': 90, 'physician': 38, 'receptionist': 90, 'salesperson': 48, 'secretary': 95, 'sewer': 80, 'sheriff': 14, 'supervisor': 44, 'teacher': 78, 'writer': 63}, ax=None)[source]

Plot association of a factual property to the projection.

See: BiasWordEmbedding.compute_factual_association()

Parameters

factual_properity (dict) – Dictionary of words and their factual values.

WEAT

Compute WEAT score of a Word Embedding.

WEAT is a bias measurement method for word embeddings, inspired by the IAT (Implicit Association Test) for humans. It measures the similarity between two sets of target words (e.g., programmer, engineer, scientist, … and nurse, teacher, librarian, …) and two sets of attribute words (e.g., man, male, … and woman, female, …). A p-value is calculated using a permutation test.

Reference:

Important

The effect size and p-value in the WEAT have entirely different meanings from those reported in IATs (original finding). Refer to the paper for more details.

Stimulus and original finding from:

  • [0, 1, 2]: A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in implicit cognition: the implicit association test, Journal of Personality and Social Psychology 74, 1464 (1998).

  • [3, 4]: M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination, The American Economic Review 94, 991 (2004).

  • [5, 6, 9]: B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site, Group Dynamics: Theory, Research, and Practice 6, 101 (2002).

  • [7]: B. A. Nosek, M. R. Banaji, A. G. Greenwald, Math=male, me=female, therefore math≠me, Journal of Personality and Social Psychology 83, 44 (2002).

  • [8]: P. D. Turney, P. Pantel, From frequency to meaning: Vector space models of semantics, Journal of Artificial Intelligence Research 37, 141 (2010).

responsibly.we.weat.calc_single_weat(model, first_target, second_target, first_attribute, second_attribute, with_pvalue=True, pvalue_kwargs=None)[source]

Calc the WEAT result of a word embedding.

Parameters
  • model – Word embedding model of gensim.model.KeyedVectors

  • first_target (dict) – First target words list and its name

  • second_target (dict) – Second target words list and its name

  • first_attribute (dict) – First attribute words list and its name

  • second_attribute (dict) – Second attribute words list and its name

  • with_pvalue (bool) – Whether to calculate the p-value of the WEAT score (might be computationally expensive)

Returns

WEAT result (score, effect size, Nt, Na and p-value)
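As a rough guide to what these quantities mean, the sketch below follows the definitions in Caliskan et al. (2017): the association of a word is its mean cosine similarity to the first attribute set minus its mean cosine similarity to the second, the score is the difference between the summed associations of the two target sets, and the effect size is the difference of the mean associations divided by the standard deviation over all target words. It is a NumPy illustration rather than the library's code; weat, association, the sample standard deviation (ddof=1), the random sampling of permutations and the toy vectors are all assumptions of this example.

```python
import numpy as np


def _cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, attr_a, attr_b):
    """s(w, A, B): mean cosine to A minus mean cosine to B."""
    return (np.mean([_cos(w, a) for a in attr_a])
            - np.mean([_cos(w, b) for b in attr_b]))


def weat(target_x, target_y, attr_a, attr_b, n_permutations=1000, seed=0):
    """Return (score, effect_size, p_value) for lists of vectors."""
    s_x = np.array([association(x, attr_a, attr_b) for x in target_x])
    s_y = np.array([association(y, attr_a, attr_b) for y in target_y])
    s_all = np.concatenate([s_x, s_y])
    score = s_x.sum() - s_y.sum()
    effect_size = (s_x.mean() - s_y.mean()) / s_all.std(ddof=1)

    # One-sided permutation test over random re-splits of the target words.
    rng = np.random.default_rng(seed)
    n_x = len(s_x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(s_all)
        if perm[:n_x].sum() - perm[n_x:].sum() > score:
            count += 1
    return float(score), float(effect_size), count / n_permutations


# Toy 2-D vectors (hypothetical values).
attr_a = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
attr_b = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
target_x = [np.array([1.0, 0.1]), np.array([0.8, 0.2])]
target_y = [np.array([0.1, 1.0]), np.array([0.2, 0.8])]
score, effect_size, p_value = weat(target_x, target_y, attr_a, attr_b)
```

Sampling permutations keeps the cost bounded for larger word sets, at the price of an approximate p-value.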

responsibly.we.weat.calc_weat_pleasant_unpleasant_attribute(model, first_target, second_target, with_pvalue=True, pvalue_kwargs=None)[source]

Calc the WEAT result with pleasant vs. unpleasant attributes.

Parameters
  • model – Word embedding model of gensim.model.KeyedVectors

  • first_target (dict) – First target words list and its name

  • second_target (dict) – Second target words list and its name

  • with_pvalue (bool) – Whether to calculate the p-value of the WEAT score (might be computationally expensive)

Returns

WEAT result (score, effect size, Nt, Na and p-value)

responsibly.we.weat.calc_all_weat(model, weat_data='caliskan', filter_by='model', with_original_finding=False, with_pvalue=True, pvalue_kwargs=None)[source]

Calc the WEAT results of a word embedding on multiple cases.

Note that the effect size and p-value in the WEAT have entirely different meanings from those reported in IATs (original finding). Refer to the paper for more details.

Parameters
  • model – Word embedding model of gensim.model.KeyedVectors

  • weat_data (dict) – WEAT cases data.

    • If ‘caliskan’ (default), then all the experiments from the original paper will be used.

    • If an integer, then the specific experiment by index from the original paper will be used.

    • If a tuple of integers, then the specific experiments by indices from the original paper will be used.

  • filter_by (str) – Whether to filter the word lists by the model (‘model’) or by the remove key in weat_data (‘data’).

  • with_original_finding (bool) – Whether to show the original finding

  • with_pvalue (bool) – Whether to calculate the p-value of the WEAT results (might be computationally expensive)

Returns

pandas.DataFrame of WEAT results (score, effect size, Nt, Na and p-value)

Utilities

responsibly.we.utils.normalize(v)[source]

Normalize a 1-D vector.

responsibly.we.utils.cosine_similarity(v, u)[source]

Calculate the cosine similarity between two vectors.

responsibly.we.utils.project_vector(v, u)[source]

Project the vector v onto direction u.

responsibly.we.utils.reject_vector(v, u)[source]

Reject the vector v from direction u (the component of v orthogonal to u).

responsibly.we.utils.project_reject_vector(v, u)[source]

Project the vector v onto direction u and reject it from u.

responsibly.we.utils.project_params(u, v)[source]

Project the vector v onto direction u and reject it from u, also returning the projection scalar.

responsibly.we.utils.cosine_similarities_by_words(model, word, words)[source]

Compute cosine similarities between a word and a set of other words.

responsibly.we.utils.most_similar(model, positive=None, negative=None, topn=10, restrict_vocab=None, indexer=None, unrestricted=True)[source]

Find the top-N most similar words.

Positive words contribute positively towards the similarity, negative words negatively.

This function computes cosine similarity between a simple mean of the projection weight vectors of the given words and the vectors for each word in the model. The function corresponds to the word-analogy and distance scripts in the original word2vec implementation.

Based on Gensim implementation.

Parameters
  • model – Word embedding model of gensim.model.KeyedVectors.

  • positive (list) – List of words that contribute positively.

  • negative (list) – List of words that contribute negatively.

  • topn (int) – Number of top-N similar words to return.

  • restrict_vocab (int) – Optional integer which limits the range of vectors which are searched for most-similar values. For example, restrict_vocab=10000 would only check the first 10000 word vectors in the vocabulary order. (This may be meaningful if you’ve sorted the vocabulary by descending frequency.)

  • unrestricted (bool) – Whether to leave the results unfiltered. If False, words from the positive and negative word lists are excluded from the results.

Returns

Sequence of (word, similarity).
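The effect of the unrestricted argument can be illustrated with a small self-contained sketch of the ranking described above. It does not call gensim or responsibly; most_similar_sketch and the toy 3-D vectors are hypothetical and were chosen so that the query “man is to doctor as woman is to ?” has a query word as its nearest neighbour.

```python
import numpy as np


def most_similar_sketch(vectors, positive, negative, topn=3, unrestricted=True):
    """Rank words by cosine similarity to mean(+positive, -negative)."""
    unit = {w: v / np.linalg.norm(v) for w, v in vectors.items()}
    query = np.mean([unit[w] for w in positive]
                    + [-unit[w] for w in negative], axis=0)
    query = query / np.linalg.norm(query)
    scores = {w: float(v @ query) for w, v in unit.items()}
    if not unrestricted:
        # Restricted behaviour: drop the query words from the candidates.
        for w in list(positive) + list(negative):
            scores.pop(w)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:topn]


# Hypothetical toy embedding.
toy = {
    "man": np.array([1.0, 0.2, 0.0]),
    "woman": np.array([1.0, -0.2, 0.0]),
    "doctor": np.array([0.2, 0.05, 1.0]),
    "nurse": np.array([0.6, -0.2, 0.8]),
    "apple": np.array([-0.5, 0.0, -0.2]),
}
unrestricted_top = most_similar_sketch(
    toy, positive=["doctor", "woman"], negative=["man"], unrestricted=True)
restricted_top = most_similar_sketch(
    toy, positive=["doctor", "woman"], negative=["man"], unrestricted=False)
```

On this toy data the unrestricted ranking starts with doctor, while the restricted ranking cannot return doctor and starts with nurse instead, which is the kind of artifact the criticism of the restricted search points at.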

Word Embedding Benchmarks

Evaluate word embedding by standard benchmarks.

Reference:

Word Pairs Tasks

  1. The WordSimilarity-353 Test Collection http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/

  2. Rubenstein, H., and Goodenough, J. 1965. Contextual correlates of synonymy https://www.seas.upenn.edu/~hansens/conceptSim/

  3. Stanford Rare Word (RW) Similarity Dataset https://nlp.stanford.edu/~lmthang/morphoNLM/

  4. The Word Relatedness Mturk-771 Test Collection http://www2.mta.ac.il/~gideon/datasets/mturk_771.html

  5. The MEN Test Collection http://clic.cimec.unitn.it/~elia.bruni/MEN.html

  6. SimLex-999 https://fh295.github.io/simlex.html

  7. TR9856 https://www.research.ibm.com/haifa/dept/vst/files/IBM_Debater_(R)_TR9856.v2.zip

Analogies Tasks

  1. Google Analogies (subset of WordRep) https://code.google.com/archive/p/word2vec/source

  2. MSR - Syntactic Analogies http://research.microsoft.com/en-us/projects/rnn/

responsibly.we.benchmark.evaluate_word_pairs(model, kwargs_word_pairs=None)[source]

Evaluate word pairs tasks.

Parameters
  • model – Word embedding.

  • kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.

Returns

pandas.DataFrame of evaluation results.
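Word pairs benchmarks are commonly scored by correlating the cosine similarities given by the model with human similarity ratings. The sketch below computes a Spearman rank correlation on toy data with NumPy only; it is an illustration of the general idea, not the function documented here, and word_pairs_spearman, the vectors and the human scores are all hypothetical.

```python
import numpy as np


def _ranks(values):
    """Ranks 0..n-1 (assumes no ties, which holds for this toy data)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def word_pairs_spearman(vectors, scored_pairs):
    """Spearman correlation between cosine similarities and human scores."""
    model_scores, human_scores = [], []
    for first, second, human in scored_pairs:
        if first not in vectors or second not in vectors:
            continue  # skip out-of-vocabulary pairs
        u, v = vectors[first], vectors[second]
        model_scores.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
        human_scores.append(human)
    return float(np.corrcoef(_ranks(model_scores), _ranks(human_scores))[0, 1])


# Hypothetical vectors and human ratings.
toy = {
    "cat": np.array([1.0, 0.1]),
    "dog": np.array([0.9, 0.3]),
    "car": np.array([0.1, 1.0]),
    "truck": np.array([0.05, 1.0]),
}
scored_pairs = [
    ("cat", "dog", 8.0),
    ("car", "truck", 9.0),
    ("cat", "car", 1.5),
    ("dog", "truck", 2.0),
    ("cat", "unicorn", 5.0),  # 'unicorn' is out of vocabulary
]
rho = word_pairs_spearman(toy, scored_pairs)
```

Comparing such a correlation before and after debiasing gives a simple check on whether debiasing changed the general quality of the embedding.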

responsibly.we.benchmark.evaluate_word_analogies(model, kwargs_word_analogies=None)[source]

Evaluate word analogies tasks.

Parameters
  • model – Word embedding.

  • kwargs_word_analogies – Kwargs for evaluate_word_analogies method.

Returns

pandas.DataFrame of evaluation results.

responsibly.we.benchmark.evaluate_word_embedding(model, kwargs_word_pairs=None, kwargs_word_analogies=None)[source]

Evaluate word pairs tasks and word analogies tasks.

Parameters
  • model – Word embedding.

  • kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.

  • kwargs_word_analogies – Kwargs for evaluate_word_analogies method.

Returns

Tuple of DataFrame for the evaluation results.