Word Embedding Bias¶

Metrics and debiasing methods for bias (such as gender and race) in word embeddings.

Important

The following paper suggests that the current methods have only a superficial effect on the bias in word embeddings:

Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862.

Important

The following paper criticizes the use of the most_similar() function from gensim in the context of word embedding bias, as well as the process of generating analogies:

Nissim, M., van Noord, R., van der Goot, R. (2019). Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor.

Therefore, responsibly provides its own implementation of most_similar() with an unrestricted argument that does not filter the input words out of the results. A similar argument exists for generate_analogies().
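To make the effect of filtering concrete, here is a minimal NumPy sketch on a made-up four-word embedding. It does not call responsibly or gensim; the helper toy_most_similar, the vocabulary and the vectors are all assumptions chosen for illustration.

```python
import numpy as np

# Toy embedding: the words and numbers are made up for illustration.
vocab = ["man", "woman", "doctor", "nurse"]
vectors = np.array([
    [1.0, 0.1, 0.0],
    [1.0, -0.1, 0.0],
    [0.3, 0.0, 1.0],
    [0.3, -0.4, 0.7],
])
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
index = {word: i for i, word in enumerate(vocab)}


def toy_most_similar(positive, negative, topn=1, unrestricted=True):
    """Rank the vocabulary by cosine similarity to the analogy query."""
    signed = [vectors[index[w]] for w in positive]
    signed += [-vectors[index[w]] for w in negative]
    query = np.mean(signed, axis=0)
    query /= np.linalg.norm(query)
    scores = vectors @ query
    inputs = set(positive) | set(negative)
    ranked = [
        (vocab[i], float(scores[i]))
        for i in np.argsort(-scores)
        if unrestricted or vocab[i] not in inputs
    ]
    return ranked[:topn]


# man : doctor :: woman : ?
query = dict(positive=["doctor", "woman"], negative=["man"])
restricted = toy_most_similar(**query, unrestricted=False)
unrestricted = toy_most_similar(**query, unrestricted=True)
print(restricted[0][0], unrestricted[0][0])
```

With these toy vectors the restricted call cannot return doctor because it is one of the input words, so it falls back to nurse, while the unrestricted call is free to answer doctor. This is the kind of artifact the paper above warns about.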

Currently, three methods are supported:

Bolukbasi et al. (2016) bias measure and debiasing - responsibly.we.bias
WEAT measure - responsibly.we.weat
Gonen et al. (2019) clustering as classification of biased neutral words - responsibly.we.bias.BiasWordEmbedding.plot_most_biased_clustering()

In addition, some of the standard benchmarks for word embeddings are available, primarily to check the impact of debiasing on performance.

Refer to the Word Embedding demo for a complete usage example.

For a technical discussion about the various bias metrics, refer to the page Analysis of Word Embedding Bias Metrics.

Check the WEFE (The Word Embeddings Fairness Evaluation Framework) package for additional word embeddings bias measures.

Bolukbasi Bias Measure and Debiasing¶

Measuring and adjusting bias in word embeddings, following Bolukbasi et al. (2016).

References:

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in neural information processing systems (pp. 4349-4357).
The code and data are based on the GitHub repository: https://github.com/tolga-b/debiaswe (MIT License).
Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862.
Nissim, M., van Noord, R., van der Goot, R. (2019). Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor.

Usage¶

>>> from responsibly.we import GenderBiasWE
>>> from gensim import downloader
>>> w2v_model = downloader.load('word2vec-google-news-300')
>>> w2v_gender_bias_we = GenderBiasWE(w2v_model)
>>> w2v_gender_bias_we.calc_direct_bias()
0.07307904249481942
>>> w2v_gender_bias_we.debias()
>>> w2v_gender_bias_we.calc_direct_bias()
1.7964246601064155e-09

Types of Bias¶

Direct Bias¶

Associations
Words that are closer to one end (e.g., he) than to the other end (she). For example, occupational stereotypes (page 7). Calculated by calc_direct_bias().
Analogies
Analogies of the form he:x::she:y. For example, analogies exhibiting stereotypes (page 7). Generated by generate_analogies().
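As a rough illustration of the association measure, the sketch below follows the direct bias definition in Bolukbasi et al. (2016): the mean of |cos(w, g)|^c over neutral words w, where g is the bias direction and c controls strictness. The function name direct_bias and the toy vectors are assumptions; the library's own calc_direct_bias() may differ in details such as the default value of c.

```python
import numpy as np


def direct_bias(neutral_vectors, direction, c=1.0):
    """Mean of |cos(w, g)| ** c over the rows w of neutral_vectors."""
    g = direction / np.linalg.norm(direction)
    w = neutral_vectors / np.linalg.norm(neutral_vectors, axis=1, keepdims=True)
    return float(np.mean(np.abs(w @ g) ** c))


# Toy data: cosines with g are 0.6, 0.0 and -1.0 respectively.
g = np.array([1.0, 0.0])
neutral = np.array([[3.0, 4.0], [0.0, 2.0], [-1.0, 0.0]])
bias_c1 = direct_bias(neutral, g, c=1.0)
bias_c2 = direct_bias(neutral, g, c=2.0)
print(bias_c1, bias_c2)
```

A larger c shrinks the contribution of weakly aligned words, so the measure concentrates on words that sit close to the bias direction.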

Indirect Bias¶

The projection of a neutral word onto the direction defined by two other neutral words is explained, to a large extent, by their shared projection onto the bias direction.

Calculated by calc_indirect_bias() and generate_closest_words_indirect_bias().
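The indirect bias of Bolukbasi et al. (2016) can be sketched as follows: split each word vector into its component along the bias direction g and the remainder, then ask how much of the similarity w·v disappears once the g component is removed. The function name indirect_bias and the toy vectors are assumptions, and this is not necessarily how calc_indirect_bias() is implemented.

```python
import numpy as np


def indirect_bias(w, v, g):
    """Share of the similarity between w and v explained by direction g."""
    w, v, g = (x / np.linalg.norm(x) for x in (w, v, g))
    w_perp = w - (w @ g) * g          # part of w orthogonal to g
    v_perp = v - (v @ g) * g          # part of v orthogonal to g
    cos_perp = (w_perp @ v_perp) / (np.linalg.norm(w_perp) * np.linalg.norm(v_perp))
    return float((w @ v - cos_perp) / (w @ v))


g = np.array([1.0, 0.0, 0.0])
# These two words are similar only through their shared g component.
only_via_g = indirect_bias(np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), g)
# These two words have no g component at all.
not_via_g = indirect_bias(np.array([0.0, 1.0, 1.0]), np.array([0.0, 1.0, 0.0]), g)
print(only_via_g, not_via_g)
```

A value near 1 means the similarity is almost entirely due to the bias direction; a value near 0 means the direction plays no role.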

class responsibly.we.bias.GenderBiasWE(model, only_lower=False, verbose=False, identify_direction='pca', to_normalize=True)[source]¶

Measure and adjust the Gender Bias in English Word Embedding.

Parameters

model – Word embedding model of gensim.models.KeyedVectors
only_lower (bool) – Whether the word embedding contains only lower case words
verbose (bool) – Set verbosity
identify_direction (str) – Set the method of identifying the gender direction: ‘single’, ‘sum’ or ‘pca’.
to_normalize (bool) – Whether to normalize all the vectors (recommended!)
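The three identify_direction options are not spelled out on this page. The sketch below shows one plausible reading of each, based on how a bias direction is usually derived from definitional pairs such as (she, he); the function names and toy vectors are assumptions, and the library's actual behaviour may differ.

```python
import numpy as np


def unit(v):
    return v / np.linalg.norm(v)


def direction_single(positive, negative):
    """Difference of one pair of word vectors."""
    return unit(positive - negative)


def direction_sum(positives, negatives):
    """Difference between the summed vectors of two word groups."""
    return unit(np.sum(positives, axis=0) - np.sum(negatives, axis=0))


def direction_pca(positives, negatives):
    """First principal axis of the pair vectors, centred pair by pair."""
    centers = (positives + negatives) / 2.0
    centred = np.vstack([positives - centers, negatives - centers])
    # Rows of vt are the principal axes, sorted by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]


# Toy pairs that differ only along the first axis.
positives = np.array([[1.0, 0.2, 0.0], [0.9, 0.0, 0.3]])
negatives = np.array([[-1.0, 0.2, 0.0], [-1.1, 0.0, 0.3]])
d_single = direction_single(positives[0], negatives[0])
d_sum = direction_sum(positives, negatives)
d_pca = direction_pca(positives, negatives)
print(d_single, d_sum, d_pca)
```

Note that the sign of a principal axis is arbitrary, so a PCA-based direction may point toward either end.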

plot_projection_scores(words='professions', n_extreme=10, ax=None, axis_projection_step=None)[source]¶

Plot the projection scalar of words on the direction.

Parameters

words (list) – The words to project
n_extreme (int or None) – The number of extreme words to show

Returns

The ax object of the plot

plot_dist_projections_on_direction(word_groups='bolukbasi', ax=None)[source]¶

Plot the projection scalars distribution on the direction.

Parameters: word_groups (dict) – The groups to project
Returns: The ax object of the plot

classmethod plot_bias_across_word_embeddings(word_embedding_bias_dict, ax=None, scatter_kwargs=None)[source]¶

Plot the projections of the same words in two word embeddings.

Parameters

word_embedding_bias_dict (dict) – BiasWordEmbedding objects as values, and their names as keys.
words (list) – Words to be projected.
scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter.

Returns

The ax object of the plot

calc_direct_bias(neutral_words='professions', c=None)[source]¶

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters

neutral_words (list) – List of neutral words
c (float or None) – Strictness of bias measuring

Returns

The direct bias

generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words='professions', n_extreme=5)[source]¶

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters

neutral_positive_end (str) – A word that defines the positive side of the neutral direction.
neutral_negative_end (str) – A word that defines the negative side of the neutral direction.
words (list) – List of words to project on the neutral direction.
n_extreme (int) – The number of most extreme words (positive and negative) to show.

Returns

Data Frame of the most extreme words with their projection scores and indirect biases.

debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]¶

Debias the word embedding.

Parameters

method (str) – The method of debiasing.
neutral_words (list) – List of neutral words for the neutralize step
equality_sets (list) – List of equality sets, for the equalize step. The sets represent the direction.
inplace (bool) – Whether to debias the object inplace or return a new one

Warning

After calling debias, all the vectors of the word embedding will be normalized to unit length.
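For orientation, the neutralize and equalize steps of hard debiasing in Bolukbasi et al. (2016) can be sketched as below. The sketch assumes unit-length input vectors and a single unit bias direction g; the function names and toy vectors are assumptions, and the library's debias() may handle more cases.

```python
import numpy as np


def unit(v):
    return v / np.linalg.norm(v)


def neutralize(w, g):
    """Remove the component of w along g, then re-normalize."""
    return unit(w - (w @ g) * g)


def equalize(equality_set, g):
    """Make the words of an equality set differ only along g."""
    mu = np.mean(equality_set, axis=0)
    mu_b = (mu @ g) * g
    nu = mu - mu_b                        # shared part, orthogonal to g
    scale = np.sqrt(1.0 - nu @ nu)        # keeps the result at unit length
    out = []
    for w in equality_set:
        w_b = (w @ g) * g
        out.append(nu + scale * (w_b - mu_b) / np.linalg.norm(w_b - mu_b))
    return np.array(out)


g = np.array([1.0, 0.0, 0.0])
nurse = neutralize(unit(np.array([-0.5, 1.0, 0.0])), g)
pair = np.array([unit(np.array([1.0, 0.5, 0.0])), unit(np.array([-0.8, 0.5, 0.2]))])
equalized = equalize(pair, g)
print(nurse @ equalized[0], nurse @ equalized[1])
```

After both steps the neutralized word is equally similar to each member of the equalized pair, and every vector has unit length, which is consistent with the warning above.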

learn_full_specific_words(seed_specific_words='bolukbasi', max_non_specific_examples=None, debug=None)[source]¶

Learn specific words given a list of seed specific words.

Using Linear SVM.

Parameters

seed_specific_words (list) – List of seed specific words
max_non_specific_examples (int) – The number of non-specific words to sample for training

Returns

List of learned specific words and the classifier object

compute_factual_association(factual_properity={'accountant': 61, 'analyst': 41, 'assistant': 85, 'attendant': 76, 'auditor': 61, 'baker': 65, 'carpenter': 2, 'cashier': 73, 'ceo': 39, 'chief': 27, 'cleaner': 89, 'clerk': 72, 'construction_worker': 4, 'cook': 38, 'counselors': 73, 'designers': 54, 'developer': 20, 'driver': 6, 'editor': 52, 'farmer': 22, 'guard': 22, 'hairdressers': 92, 'housekeeper': 89, 'janitor': 34, 'laborer': 4, 'lawyer': 35, 'librarian': 84, 'manager': 43, 'mechanician': 4, 'mover': 18, 'nurse': 90, 'physician': 38, 'receptionist': 90, 'salesperson': 48, 'secretary': 95, 'sewer': 80, 'sheriff': 14, 'supervisor': 44, 'teacher': 78, 'writer': 63})[source]¶

Compute association of a factual property to the projection.

Inspired by WEFAT (Word-Embedding Factual Association Test), but not identical to it. See: Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

In a future version, the WEFAT will also be implemented.

If a word doesn’t exist in the word embedding, then it will be filtered out.

For example, in responsibly.we.bias.GenderBiasWE, the default factual property is the percentage of women in various occupations, from the Labor Force Statistics of the 2017 Population Survey, taken from: https://arxiv.org/abs/1804.06876

Parameters: factual_properity (dict) – Dictionary of words and their factual values.
Returns: Pearson r, pvalue and the words with their associated factual values and their projection on the bias direction.
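The core of this computation is a Pearson correlation between each word's factual value and its projection on the bias direction, restricted to words that exist in the embedding. The sketch below illustrates that idea with NumPy only; the function name factual_association and the projection values are made up, the factual values are a few entries from the default dictionary above, and the p-value that the real method also returns is omitted.

```python
import numpy as np


def factual_association(projections, factual_property):
    """Pearson r between projections and factual values of shared words."""
    # Words missing from the embedding (here: from `projections`) are dropped.
    words = sorted(w for w in factual_property if w in projections)
    x = np.array([projections[w] for w in words])
    y = np.array([factual_property[w] for w in words], dtype=float)
    return float(np.corrcoef(x, y)[0, 1]), words


# Hypothetical projections on a gender direction (positive = female end).
projections = {"nurse": 0.30, "secretary": 0.25, "carpenter": -0.20, "driver": -0.10}
factual = {"nurse": 90, "secretary": 95, "carpenter": 2, "driver": 6, "sheriff": 14}
r, used_words = factual_association(projections, factual)
print(round(r, 3), used_words)
```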

plot_factual_association(factual_properity={'accountant': 61, 'analyst': 41, 'assistant': 85, 'attendant': 76, 'auditor': 61, 'baker': 65, 'carpenter': 2, 'cashier': 73, 'ceo': 39, 'chief': 27, 'cleaner': 89, 'clerk': 72, 'construction_worker': 4, 'cook': 38, 'counselors': 73, 'designers': 54, 'developer': 20, 'driver': 6, 'editor': 52, 'farmer': 22, 'guard': 22, 'hairdressers': 92, 'housekeeper': 89, 'janitor': 34, 'laborer': 4, 'lawyer': 35, 'librarian': 84, 'manager': 43, 'mechanician': 4, 'mover': 18, 'nurse': 90, 'physician': 38, 'receptionist': 90, 'salesperson': 48, 'secretary': 95, 'sewer': 80, 'sheriff': 14, 'supervisor': 44, 'teacher': 78, 'writer': 63}, ax=None)[source]¶

Plot association of a factual property to the projection.

See: BiasWordEmbedding.compute_factual_association()

Parameters: factual_properity (dict) – Dictionary of words and their factual values.

class responsibly.we.bias.BiasWordEmbedding(model, only_lower=False, verbose=False, identify_direction=False, to_normalize=True)[source]¶

Measure and adjust a bias in English word embedding.

Parameters

model – Word embedding model of gensim.models.KeyedVectors
only_lower (bool) – Whether the word embedding contains only lower case words
verbose (bool) – Set verbosity
to_normalize (bool) – Whether to normalize all the vectors (recommended!)

project_on_direction(word)[source]¶

Project the normalized vector of the word on the direction.

Parameters: word (str) – The word to project
Returns: The projection scalar (float)

calc_projection_data(words)[source]¶

Calculate the projection, projected and rejected vectors of a list of words.

Parameters: words (list) – List of words
Returns: pandas.DataFrame of the projection, projected and rejected vectors of the words list

plot_projection_scores(words, n_extreme=10, ax=None, axis_projection_step=None)[source]¶

Plot the projection scalar of words on the direction.

Parameters

words (list) – The words to project
n_extreme (int or None) – The number of extreme words to show

Returns

The ax object of the plot

plot_dist_projections_on_direction(word_groups, ax=None)[source]¶

Plot the projection scalars distribution on the direction.

Parameters: word_groups (dict) – The groups to project
Returns: The ax object of the plot

classmethod plot_bias_across_word_embeddings(word_embedding_bias_dict, words, ax=None, scatter_kwargs=None)[source]¶

Plot the projections of the same words in two word embeddings.

Parameters

word_embedding_bias_dict (dict) – BiasWordEmbedding objects as values, and their names as keys.
words (list) – Words to be projected.
scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter.

Returns

The ax object of the plot

generate_analogies(n_analogies=100, seed='ends', multiple=False, delta=1.0, restrict_vocab=30000, unrestricted=False)[source]¶

Generate analogies based on a seed vector.

x - y ~ seed vector, or a:x::b:y when a - b ~ seed vector.

The seed vector can be defined by two word ends, or by the bias direction.

delta is used to keep the analogies semantically coherent. The default value of 1 corresponds to an angle <= pi/3.
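The link between delta and the angle follows from unit-length vectors: ||x - y||^2 = 2 - 2 cos(angle), so a distance of at most 1 means cos(angle) >= 1/2, that is, an angle of at most pi/3. The sketch below checks this numerically and shows the analogy score from Bolukbasi et al. (2016), cos(seed, x - y) when ||x - y|| <= delta and 0 otherwise. The function name analogy_score and the toy vectors are assumptions.

```python
import numpy as np


def analogy_score(seed, x, y, delta=1.0):
    """cos(seed, x - y) if ||x - y|| <= delta, else 0 (x, y unit vectors)."""
    diff = x - y
    dist = np.linalg.norm(diff)
    if dist == 0.0 or dist > delta:
        return 0.0
    return float(seed @ diff / (np.linalg.norm(seed) * dist))


def on_circle(angle):
    return np.array([np.cos(angle), np.sin(angle)])


x = on_circle(0.0)
boundary_distance = float(np.linalg.norm(x - on_circle(np.pi / 3)))

near, far = on_circle(0.3), on_circle(np.pi)
score_near = analogy_score(x - near, x, near)   # seed parallel to x - near
score_far = analogy_score(x - far, x, far)      # pair too far apart
print(boundary_distance, score_near, score_far)
```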

There is criticism regarding generating analogies with unrestricted=False without ignoring analogies whose match column equals False. Tolga’s technique for generating analogies, as implemented in this method, is inherently limited to analogies with x != y, which may force “fake” bias analogies.

See:

Nissim, M., van Noord, R., van der Goot, R. (2019). Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor.

Parameters

seed – The definition of the seed vector. Either a tuple of two word ends, ‘ends’ for the pre-defined ends, or ‘direction’ for the pre-defined direction vector.
n_analogies (int) – Number of analogies to generate.
multiple (bool) – Whether to allow multiple appearances of a word in the analogies.
delta (float) – Threshold for semantic similarity. The maximal distance between x and y.
restrict_vocab (int) – The vocabulary size to use.
unrestricted (bool) – Whether to validate the generated analogies with unrestricted most_similar.

Returns

Data Frame of analogies (x, y), their distances, and their cosine similarity scores

calc_direct_bias(neutral_words, c=None)[source]¶

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters

neutral_words (list) – List of neutral words
c (float or None) – Strictness of bias measuring

Returns

The direct bias

calc_indirect_bias(word1, word2)[source]¶

Calculate the indirect bias between two words.

Based on the amount of shared projection of the words on the direction.

Also called PairBias.

Parameters

word1 (str) – First word
word2 (str) – Second word

Returns

The indirect bias between the two words

generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words=None, n_extreme=5)[source]¶

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters

neutral_positive_end (str) – A word that defines the positive side of the neutral direction.
neutral_negative_end (str) – A word that defines the negative side of the neutral direction.
words (list) – List of words to project on the neutral direction.
n_extreme (int) – The number of most extreme words (positive and negative) to show.

Returns

Data Frame of the most extreme words with their projection scores and indirect biases.

debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]¶

Debias the word embedding.

Parameters

method (str) – The method of debiasing.
neutral_words (list) – List of neutral words for the neutralize step
equality_sets (list) – List of equality sets, for the equalize step. The sets represent the direction.
inplace (bool) – Whether to debias the object inplace or return a new one

Warning

After calling debias, all the vectors of the word embedding will be normalized to unit length.

evaluate_word_embedding(kwargs_word_pairs=None, kwargs_word_analogies=None)[source]¶

Evaluate word pairs tasks and word analogies tasks.

Parameters

model – Word embedding.
kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.
kwargs_word_analogies – Kwargs for evaluate_word_analogies method.

Returns

Tuple of pandas.DataFrame for the evaluation results.

learn_full_specific_words(seed_specific_words, max_non_specific_examples=None, debug=None)[source]¶

Learn specific words given a list of seed specific words.

Using Linear SVM.

Parameters

seed_specific_words (list) – List of seed specific words
max_non_specific_examples (int) – The number of non-specific words to sample for training

Returns

List of learned specific words and the classifier object

compute_factual_association(factual_properity)[source]¶

Compute association of a factual property to the projection.

Inspired by WEFAT (Word-Embedding Factual Association Test), but not identical to it. See: Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

In a future version, the WEFAT will also be implemented.

If a word doesn’t exist in the word embedding, then it will be filtered out.

For example, in responsibly.we.bias.GenderBiasWE, the default factual property is the percentage of women in various occupations, from the Labor Force Statistics of the 2017 Population Survey, taken from: https://arxiv.org/abs/1804.06876

Parameters: factual_properity (dict) – Dictionary of words and their factual values.
Returns: Pearson r, pvalue and the words with their associated factual values and their projection on the bias direction.

plot_factual_association(factual_properity, ax=None)[source]¶

Plot association of a factual property to the projection.

See: BiasWordEmbedding.compute_factual_association()

Parameters: factual_properity (dict) – Dictionary of words and their factual values.

static plot_most_biased_clustering(biased, debiased, seed='ends', n_extreme=500, random_state=1)[source]¶

Plot clustering as classification of biased neutral words.

Parameters

biased – Biased word embedding of BiasWordEmbedding.
debiased – Debiased word embedding of BiasWordEmbedding.
seed – The definition of the seed vector. Either a tuple of two word ends, ‘ends’ for the pre-defined ends, or ‘direction’ for the pre-defined direction vector.
n_extreme – The number of extreme biased neutral words to use.

Returns

Tuple of list of ax objects of the plot, and a dictionary with the most positive and negative words.

Based on:

Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862.
https://github.com/gonenhila/gender_bias_lipstick
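The idea behind this test, as described by Gonen and Goldberg (2019), is to cluster the vectors of the most biased words into two groups and check how well the clusters recover the words' original bias labels; high accuracy after debiasing suggests the bias is still encoded. The sketch below uses a plain NumPy 2-means as a stand-in for whatever clustering the library uses, on synthetic data; all names and numbers are assumptions.

```python
import numpy as np


def two_means(points, n_iter=20):
    """Minimal Lloyd's algorithm for k=2, seeded with the farthest pair."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(dists), dists.shape)
    centers = points[[i, j]].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        to_centers = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(to_centers, axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def cluster_accuracy(labels, truth):
    """Agreement with the original bias labels, up to swapping cluster ids."""
    agreement = float(np.mean(labels == truth))
    return max(agreement, 1.0 - agreement)


# Synthetic stand-in for debiased vectors of the most biased words: the two
# groups still differ along a residual axis (axis 0 here, by assumption).
rng = np.random.default_rng(0)
n_per_group = 50
truth = np.repeat([0, 1], n_per_group)
debiased_points = rng.normal(scale=0.05, size=(2 * n_per_group, 5))
debiased_points[:, 0] += np.where(truth == 0, -0.5, 0.5)
accuracy = cluster_accuracy(two_means(debiased_points), truth)
print(accuracy)
```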

WEAT¶

Compute WEAT score of a Word Embedding.

WEAT is a bias measurement method for word embeddings, inspired by the IAT (Implicit Association Test) for humans. It measures the similarity between two sets of target words (e.g., programmer, engineer, scientist, … and nurse, teacher, librarian, …) and two sets of attribute words (e.g., man, male, … and woman, female, …). A p-value is calculated using a permutation test.
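As a sketch of the quantities involved, following Caliskan et al. (2017): each target word w gets an association s(w, A, B), the mean cosine similarity to attribute set A minus the mean cosine similarity to attribute set B. The WEAT score is the sum of s over the first target set minus the sum over the second, and the effect size divides the difference of means by the standard deviation of s over all target words. The function names and toy vectors are assumptions, and the choice of sample standard deviation (ddof=1) is one convention among several.

```python
import numpy as np


def cosine_matrix(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def association(targets, attr_a, attr_b):
    """s(w, A, B) for every row w of targets."""
    return (cosine_matrix(targets, attr_a).mean(axis=1)
            - cosine_matrix(targets, attr_b).mean(axis=1))


def weat(x, y, attr_a, attr_b):
    s_x = association(x, attr_a, attr_b)
    s_y = association(y, attr_a, attr_b)
    score = float(s_x.sum() - s_y.sum())
    pooled_std = np.std(np.concatenate([s_x, s_y]), ddof=1)
    effect_size = float((s_x.mean() - s_y.mean()) / pooled_std)
    return score, effect_size


# Toy 2-D vectors: X leans toward A, Y leans toward B.
attr_a = np.array([[1.0, 0.0], [1.0, 0.1]])
attr_b = np.array([[0.0, 1.0], [0.1, 1.0]])
x = np.array([[1.0, 0.2], [0.9, 0.1]])
y = np.array([[0.2, 1.0], [0.1, 0.8]])
score, effect_size = weat(x, y, attr_a, attr_b)
print(score, effect_size)
```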

Reference:

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

Important

The effect size and p-value in the WEAT have entirely different meanings from those reported in IATs (original finding). Refer to the paper for more details.

Stimulus and original finding from:

[0, 1, 2]: A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in implicit cognition: the implicit association test., Journal of personality and social psychology 74, 1464 (1998).
[3, 4]: M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? a field experiment on labor market discrimination, The American Economic Review 94, 991 (2004).
[5, 6, 9]: B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).
[7]: B. A. Nosek, M. R. Banaji, A. G. Greenwald, Math=male, me=female, therefore math≠me., Journal of Personality and Social Psychology 83, 44 (2002).
[8]: P. D. Turney, P. Pantel, From frequency to meaning: Vector space models of semantics, Journal of Artificial Intelligence Research 37, 141 (2010).

responsibly.we.weat.calc_single_weat(model, first_target, second_target, first_attribute, second_attribute, with_pvalue=True, pvalue_kwargs=None)[source]¶

Calc the WEAT result of a word embedding.

Parameters

model – Word embedding model of gensim.models.KeyedVectors
first_target (dict) – First target words list and its name
second_target (dict) – Second target words list and its name
first_attribute (dict) – First attribute words list and its name
second_attribute (dict) – Second attribute words list and its name
with_pvalue (bool) – Whether to calculate the p-value of the WEAT score (might be computationally expensive)

Returns

WEAT result (score, effect size, Nt, Na and p-value)
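To see why the p-value can be expensive, here is a sketch of an exact one-sided permutation test in the spirit of Caliskan et al. (2017): the observed statistic is compared against the statistic of every equal-size re-partition of the pooled target words. It works on precomputed per-word associations; the function name weat_pvalue and the toy numbers are assumptions, and the library may sample permutations rather than enumerate them.

```python
from itertools import combinations

import numpy as np


def weat_pvalue(s_x, s_y):
    """Exact one-sided permutation p-value from per-word associations."""
    pooled = np.concatenate([s_x, s_y])
    total = pooled.sum()
    observed = 2.0 * s_x.sum() - total      # equals sum(s_x) - sum(s_y)
    hits = count = 0
    for idx in combinations(range(len(pooled)), len(s_x)):
        stat = 2.0 * pooled[list(idx)].sum() - total
        hits += int(stat > observed)
        count += 1
    return hits / count


# Toy associations: the first target set is clearly separated from the second.
s_x = np.array([0.5, 0.25, 0.75])
s_y = np.array([-0.5, -0.25, -0.75])
p_forward = weat_pvalue(s_x, s_y)
p_reverse = weat_pvalue(s_y, s_x)
print(p_forward, p_reverse)
```

The number of partitions grows combinatorially with the size of the target sets, which is why exhaustive enumeration is only practical for small word lists.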

responsibly.we.weat.calc_weat_pleasant_unpleasant_attribute(model, first_target, second_target, with_pvalue=True, pvalue_kwargs=None)[source]¶

Calc the WEAT result with pleasant vs. unpleasant attributes.

Parameters

model – Word embedding model of gensim.models.KeyedVectors
first_target (dict) – First target words list and its name
second_target (dict) – Second target words list and its name
with_pvalue (bool) – Whether to calculate the p-value of the WEAT score (might be computationally expensive)

Returns

WEAT result (score, effect size, Nt, Na and p-value)

responsibly.we.weat.calc_all_weat(model, weat_data='caliskan', filter_by='model', with_original_finding=False, with_pvalue=True, pvalue_kwargs=None)[source]¶

Calc the WEAT results of a word embedding on multiple cases.

Note that the effect size and p-value in the WEAT have entirely different meanings from those reported in IATs (original finding). Refer to the paper for more details.

Parameters

model – Word embedding model of gensim.models.KeyedVectors
weat_data (dict) – WEAT cases data.
- If ‘caliskan’ (default), then all the experiments from the original paper will be used.
- If an integer, then the specific experiment by index from the original paper will be used.
- If a sequence of integers, then the specific experiments by indices from the original paper will be used.
filter_by (str) – Whether to filter the word lists by the model (‘model’) or by the remove key in weat_data (‘data’).
with_original_finding (bool) – Show the original finding
with_pvalue (bool) – Whether to calculate the p-value of the WEAT results (might be computationally expensive)

Returns

pandas.DataFrame of WEAT results (score, effect size, Nt, Na and p-value)

Utilities¶

responsibly.we.utils.normalize(v)[source]¶: Normalize a 1-D vector.

responsibly.we.utils.cosine_similarity(v, u)[source]¶: Calculate the cosine similarity between two vectors.

responsibly.we.utils.project_vector(v, u)[source]¶: Projecting the vector v onto direction u.

responsibly.we.utils.reject_vector(v, u)[source]¶: Rejecting the vector v from direction u.

responsibly.we.utils.project_reject_vector(v, u)[source]¶: Projecting and rejecting the vector v onto direction u.

responsibly.we.utils.project_params(u, v)[source]¶: Projecting and rejecting the vector v onto direction u with scalar.
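For readers unfamiliar with the terms, projection and rejection split a vector v into the part along a direction u and the part orthogonal to it. The sketch below shows the standard definitions with NumPy; the function names project and reject are illustrative and are not the library's utilities.

```python
import numpy as np


def project(v, u):
    """Component of v that lies along direction u."""
    u = u / np.linalg.norm(u)
    return (v @ u) * u


def reject(v, u):
    """Component of v that is orthogonal to direction u."""
    return v - project(v, u)


v = np.array([3.0, 4.0])
u = np.array([2.0, 0.0])
projected, rejected = project(v, u), reject(v, u)
print(projected, rejected)
```

The two parts always add back up to v, and the rejected part has zero dot product with u.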

responsibly.we.utils.cosine_similarities_by_words(model, word, words)[source]¶: Compute cosine similarities between a word and a set of other words.

responsibly.we.utils.most_similar(model, positive=None, negative=None, topn=10, restrict_vocab=None, indexer=None, unrestricted=True)[source]¶

Find the top-N most similar words.

Positive words contribute positively towards the similarity, negative words negatively.

This function computes cosine similarity between a simple mean of the projection weight vectors of the given words and the vectors for each word in the model. The function corresponds to the word-analogy and distance scripts in the original word2vec implementation.

Based on Gensim implementation.

Parameters

model – Word embedding model of gensim.models.KeyedVectors.
positive (list) – List of words that contribute positively.
negative (list) – List of words that contribute negatively.
topn (int) – Number of top-N similar words to return.
restrict_vocab (int) – Optional integer which limits the range of vectors which are searched for most-similar values. For example, restrict_vocab=10000 would only check the first 10000 word vectors in the vocabulary order. (This may be meaningful if you’ve sorted the vocabulary by descending frequency.)
unrestricted (bool) – If True, the words in the positive and negative lists are not excluded from the most similar words; if False, they are filtered out of the results.

Returns

Sequence of (word, similarity).

Word Embedding Benchmarks¶

Evaluate word embedding by standard benchmarks.

Reference:

https://github.com/kudkudak/word-embeddings-benchmarks

Word Pairs Tasks¶

The WordSimilarity-353 Test Collection http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
Rubenstein, H., and Goodenough, J. 1965. Contextual correlates of synonymy https://www.seas.upenn.edu/~hansens/conceptSim/
Stanford Rare Word (RW) Similarity Dataset https://nlp.stanford.edu/~lmthang/morphoNLM/
The Word Relatedness Mturk-771 Test Collection http://www2.mta.ac.il/~gideon/datasets/mturk_771.html
The MEN Test Collection http://clic.cimec.unitn.it/~elia.bruni/MEN.html
SimLex-999 https://fh295.github.io/simlex.html
TR9856 https://www.research.ibm.com/haifa/dept/vst/files/IBM_Debater_(R)_TR9856.v2.zip

Analogies Tasks¶

Google Analogies (subset of WordRep) https://code.google.com/archive/p/word2vec/source
MSR - Syntactic Analogies http://research.microsoft.com/en-us/projects/rnn/

responsibly.we.benchmark.evaluate_word_pairs(model, kwargs_word_pairs=None)[source]¶

Evaluate word pairs tasks.

Parameters

model – Word embedding.
kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.

Returns

pandas.DataFrame of evaluation results.
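Word pairs benchmarks are typically scored by correlating human similarity ratings with the model's cosine similarities for the same pairs. The sketch below computes a Spearman rank correlation with NumPy only, assuming no tied values; the function names and the toy numbers are assumptions, and the columns of the library's result DataFrame are not shown on this page.

```python
import numpy as np


def ranks(values):
    """Ranks 0..n-1 of the values (assumes no ties, enough for this sketch)."""
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(len(values))
    return result


def spearman(a, b):
    """Pearson correlation of the ranks of a and b."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])


# Hypothetical human ratings and model cosine similarities for four pairs.
human_scores = np.array([9.0, 7.5, 3.0, 1.0])
model_similarities = np.array([0.8, 0.9, 0.3, 0.1])
rho = spearman(human_scores, model_similarities)
print(rho)
```

Comparing this correlation before and after debiasing gives a rough sense of how much general embedding quality was affected.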

responsibly.we.benchmark.evaluate_word_analogies(model, kwargs_word_analogies=None)[source]¶

Evaluate word analogies tasks.

Parameters

model – Word embedding.
kwargs_word_analogies – Kwargs for evaluate_word_analogies method.

Returns

pandas.DataFrame of evaluation results.

responsibly.we.benchmark.evaluate_word_embedding(model, kwargs_word_pairs=None, kwargs_word_analogies=None)[source]¶

Evaluate word pairs tasks and word analogies tasks.

Parameters

model – Word embedding.
kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.
kwargs_word_analogies – Kwargs for evaluate_word_analogies method.

Returns

Tuple of DataFrame for the evaluation results.
