Demo - Fairness Analysis of COMPAS by ProPublica
================================================

Based on: https://github.com/propublica/compas-analysis

What follows are the calculations performed for ProPublica's analysis of
the COMPAS Recidivism Risk Scores. It might be helpful to open the
methodology in another tab to understand the following.

.. code::

    import numpy as np
    import pandas as pd
    from scipy import stats
    import matplotlib.pyplot as plt
    import seaborn as sns

    from responsibly.dataset import COMPASDataset
    from responsibly.fairness.metrics import distplot_by

Loading the Data
----------------

We select fields for severity of charge, number of priors, demographics,
age, sex, COMPAS scores, and whether each person was accused of a crime
within two years.

There are several reasons to remove rows because of missing data:

-  If the charge date of a defendant's COMPAS-scored crime was not within
   30 days of the arrest, we assume that, for data quality reasons, we do
   not have the right offense.
-  We coded the recidivist flag (``is_recid``) as -1 if we could not find
   a COMPAS case at all.
-  In a similar vein, ordinary traffic offenses (those with a
   ``c_charge_degree`` of 'O') do not result in jail time and are removed
   (only two of them).
-  We filtered the underlying data from Broward County to include only
   those rows representing people who had either recidivated within two
   years or had at least two years outside of a correctional facility.

All of this is already done by instantiating a ``COMPASDataset`` object
from ``responsibly``.

.. code::

    compas_ds = COMPASDataset()
    df = compas_ds.df
    len(df)

.. parsed-literal::

    6172

EDA
---

Higher COMPAS scores are weakly correlated with a longer length of stay
(Pearson r of about 0.21).

.. code::

    stats.pearsonr(df['length_of_stay'].astype(int),
                   df['decile_score'])

.. parsed-literal::

    (0.20741201943031584, 5.943991686971499e-61)

After filtering we have the following demographic breakdown:

.. code::

    df['age_cat'].value_counts()

.. parsed-literal::

    25 - 45            3532
    Less than 25       1347
    Greater than 45    1293
    Name: age_cat, dtype: int64

.. code::

    df['race'].value_counts()

.. parsed-literal::

    African-American    3175
    Caucasian           2103
    Hispanic             509
    Other                343
    Asian                 31
    Native American       11
    Name: race, dtype: int64

.. code::

    (df['race'].value_counts() / len(df) * 100).round(2)

.. parsed-literal::

    African-American    51.44
    Caucasian           34.07
    Hispanic             8.25
    Other                5.56
    Asian                0.50
    Native American      0.18
    Name: race, dtype: float64

.. code::

    df['score_text'].value_counts()

.. parsed-literal::

    Low       3421
    Medium    1607
    High      1144
    Name: score_text, dtype: int64

.. code::

    pd.crosstab(df['sex'], df['race'])

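======  ================  =====  =========  ========  ===============  =====
race    African-American  Asian  Caucasian  Hispanic  Native American  Other
sex
======  ================  =====  =========  ========  ===============  =====
Female  549               2      482        82        2                58
Male    2626              29     1621       427       9                285
======  ================  =====  =========  ========  ===============  =====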

.. code::

    (df['sex'].value_counts() / len(df) * 100).round(2)

.. parsed-literal::

    Male      80.96
    Female    19.04
    Name: sex, dtype: float64

.. code::

    df['two_year_recid'].value_counts()

.. parsed-literal::

    0    3363
    1    2809
    Name: two_year_recid, dtype: int64

.. code::

    (df['two_year_recid'].value_counts() / len(df) * 100).round(2)

.. parsed-literal::

    0    54.49
    1    45.51
    Name: two_year_recid, dtype: float64

Judges are often presented with two sets of scores from the COMPAS
system: one that classifies people into High, Medium and Low risk, and a
corresponding decile score. For white defendants, the number of
defendants falls steadily as the decile score increases, while for
African-American defendants the counts are spread far more evenly across
the deciles.

.. code::

    RACE_IN_FOCUS = ['African-American', 'Caucasian']
    df_race_focused = df[df['race'].isin(RACE_IN_FOCUS)]

.. code::

    # One histogram of decile scores per race in focus.
    g = sns.FacetGrid(df_race_focused, col='race', height=7)
    g.map(plt.hist, 'decile_score', rwidth=0.9);

.. image:: demo-compas-analysis_files/demo-compas-analysis_18_0.png

.. code::

    distplot_by(df['decile_score'], df['race'], hist=False);

.. image:: demo-compas-analysis_files/demo-compas-analysis_19_0.png

.. code::

    pd.crosstab(df['decile_score'], df['race'])

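============  ================  =====  =========  ========  ===============  =====
race          African-American  Asian  Caucasian  Hispanic  Native American  Other
decile_score
============  ================  =====  =========  ========  ===============  =====
1             365               15     605        159       0                142
2             346               4      321        89        2                60
3             298               5      238        73        1                32
4             337               0      243        47        0                39
5             323               1      200        39        0                19
6             318               2      160        27        2                20
7             343               1      113        28        2                9
8             301               2      96         14        0                7
9             317               0      77         17        2                7
10            227               1      50         16        2                8
============  ================  =====  =========  ========  ===============  =====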

.. code::

    pd.crosstab(df['two_year_recid'], df['race'], normalize='index')

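==============  ================  ========  =========  ========  ===============  ========
race            African-American  Asian     Caucasian  Hispanic  Native American  Other
two_year_recid
==============  ================  ========  =========  ========  ===============  ========
0               0.450193          0.006839  0.380910   0.095153  0.001784         0.065120
1               0.591314          0.002848  0.292631   0.067284  0.001780         0.044144
==============  ================  ========  =========  ========  ===============  ========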

.. code::

    pd.crosstab(df_race_focused['two_year_recid'],
                df_race_focused['race'],
                normalize='index')

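==============  ================  =========
race            African-American  Caucasian
two_year_recid
==============  ================  =========
0               0.541682          0.458318
1               0.668949          0.331051
==============  ================  =========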
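
Each row of a crosstab built with ``normalize='index'`` sums to 1, so the
table above gives the racial composition of each outcome class, not the
recidivism rate of each race. The sketch below contrasts the two
normalizations in plain Python. The counts are an assumption: they are
not read from the dataset but reconstructed from the totals and rates
that ``report_binary`` prints later in this demo.

```python
# Two-year recidivism counts per race. Assumption: reconstructed from the
# totals and base rates reported by report_binary, not read from the data.
counts = {
    'African-American': {0: 1514, 1: 1661},
    'Caucasian': {0: 1281, 1: 822},
}


def normalize_index(counts):
    """Share of each race within an outcome class (each row sums to 1)."""
    result = {}
    for outcome in (0, 1):
        row_total = sum(by_outcome[outcome] for by_outcome in counts.values())
        result[outcome] = {race: by_outcome[outcome] / row_total
                           for race, by_outcome in counts.items()}
    return result


def normalize_columns(counts):
    """Share of each outcome within a race (each column sums to 1)."""
    return {race: {outcome: n / sum(by_outcome.values())
                   for outcome, n in by_outcome.items()}
            for race, by_outcome in counts.items()}


by_index = normalize_index(counts)
by_columns = normalize_columns(counts)

# Share of African-American defendants among those who recidivated.
print(round(by_index[1]['African-American'], 6))
# Share of African-American defendants who recidivated (the base rate).
print(round(by_columns['African-American'][1], 6))
```

The first number should match the ``normalize='index'`` table, while the
second is the per-group base rate, which is the quantity the fairness
criteria in the next section condition on.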

Fairness Demographic Classification Criteria
--------------------------------------------

Based on: https://fairmlbook.org/demographic.html

Throughout this section a decile score above 4 (that is, a Medium or
High ``score_text``) is treated as the positive prediction, and
Caucasian defendants serve as the reference group.

.. code::

    from responsibly.fairness.metrics import (independence_binary,
                                              separation_binary,
                                              sufficiency_binary,
                                              independence_score,
                                              separation_score,
                                              sufficiency_score,
                                              report_binary,
                                              plot_roc_by_attr)

Independence
~~~~~~~~~~~~

Independence compares the acceptance rate, the share of defendants with
a positive prediction, across groups.

.. code::

    indp, indp_cmp = independence_binary((df_race_focused['decile_score'] > 4),
                                         df_race_focused['race'],
                                         'Caucasian',
                                         as_df=True)

.. code::

    indp.plot(kind='bar');

.. image:: demo-compas-analysis_files/demo-compas-analysis_28_0.png

.. code::

    indp_cmp

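==============================  ===============
African-American vs. Caucasian  acceptance_rate
==============================  ===============
diff                            0.245107
ratio                           1.740604
==============================  ===============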
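
These two numbers can be checked by hand from the crosstab of
``decile_score`` by ``race`` shown in the EDA section. The sketch below
is plain Python and does not call ``responsibly``. It assumes that
``diff`` is the African-American rate minus the Caucasian rate and that
``ratio`` is their quotient, which agrees with the table above.

```python
# Defendants per decile score (1..10), copied from the crosstab of
# decile_score by race in the EDA section of this demo.
decile_counts = {
    'African-American': [365, 346, 298, 337, 323, 318, 343, 301, 317, 227],
    'Caucasian': [605, 321, 238, 243, 200, 160, 113, 96, 77, 50],
}
THRESHOLD = 4  # a score above 4 counts as a positive prediction


def acceptance_rate(counts, threshold):
    """Share of defendants whose decile score exceeds the threshold."""
    # counts[i] holds decile i + 1, so the deciles above the threshold
    # start at list index `threshold`.
    return sum(counts[threshold:]) / sum(counts)


rates = {race: acceptance_rate(counts, THRESHOLD)
         for race, counts in decile_counts.items()}
diff = rates['African-American'] - rates['Caucasian']
ratio = rates['African-American'] / rates['Caucasian']

print({race: round(rate, 6) for race, rate in rates.items()})
print(round(diff, 6), round(ratio, 6))
```

Changing ``THRESHOLD`` shows how strongly the gap depends on where the
cut-off is placed.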

.. code::

    independence_score(df_race_focused['decile_score'],
                       df_race_focused['race'],
                       as_df=True).plot();

.. image:: demo-compas-analysis_files/demo-compas-analysis_30_0.png

Separation
~~~~~~~~~~

Separation compares the error rates (FPR and FNR, together with their
complements TNR and TPR) across groups, conditional on the true outcome.

.. code::

    sep, sep_cmp = separation_binary(df_race_focused['two_year_recid'],
                                     (df_race_focused['decile_score'] > 4),
                                     df_race_focused['race'],
                                     'Caucasian',
                                     as_df=True)

.. code::

    sep.plot(kind='bar');

.. image:: demo-compas-analysis_files/demo-compas-analysis_33_0.png

.. code::

    sep_cmp

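==============================  =========  ========  =========  ========
African-American vs. Caucasian  fnr        fpr       tnr        tpr
==============================  =========  ========  =========  ========
diff                            -0.211582  0.203241  -0.203241  0.211582
ratio                           0.573724   1.923234  0.739387   1.420098
==============================  =========  ========  =========  ========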
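
The ``diff`` and ``ratio`` rows can be recomputed from the per-group
error rates. This is a minimal sketch, not the library's implementation:
it copies the ``fnr`` and ``fpr`` values that ``report_binary`` prints
later in this demo and assumes that each comparison is taken relative to
the Caucasian reference group. The helper names are hypothetical.

```python
# Per-group error rates, copied from the report_binary table in this demo.
rates = {
    'African-American': {'fnr': 0.284768, 'fpr': 0.423382},
    'Caucasian': {'fnr': 0.496350, 'fpr': 0.220141},
}


def with_complements(group_rates):
    """Add tpr = 1 - fnr and tnr = 1 - fpr to a group's error rates."""
    full = dict(group_rates)
    full['tpr'] = 1 - group_rates['fnr']
    full['tnr'] = 1 - group_rates['fpr']
    return full


def compare(group, reference):
    """Difference and ratio of every rate, group relative to reference."""
    return {
        'diff': {name: group[name] - reference[name] for name in group},
        'ratio': {name: group[name] / reference[name] for name in group},
    }


comparison = compare(with_complements(rates['African-American']),
                     with_complements(rates['Caucasian']))
for row, values in comparison.items():
    print(row, {name: round(value, 4) for name, value in values.items()})
```

A false positive rate ratio of about 1.92 means that African-American
defendants who did not recidivate were flagged almost twice as often as
Caucasian defendants who did not recidivate.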

.. code::

    plot_roc_by_attr(df_race_focused['two_year_recid'],
                     df_race_focused['decile_score'],
                     df_race_focused['race'],
                     figsize=(7, 7));

.. image:: demo-compas-analysis_files/demo-compas-analysis_35_0.png

Sufficiency
~~~~~~~~~~~

Sufficiency compares the predictive values (PPV and NPV) across groups,
conditional on the prediction.

.. code::

    suff, suff_cmp = sufficiency_binary(df_race_focused['two_year_recid'],
                                        (df_race_focused['decile_score'] > 4),
                                        df_race_focused['race'],
                                        'Caucasian',
                                        as_df=True)

.. code::

    suff.plot(kind='bar');

.. image:: demo-compas-analysis_files/demo-compas-analysis_38_0.png

.. code::

    suff_cmp

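==============================  =========  ========
African-American vs. Caucasian  npv        ppv
==============================  =========  ========
diff                            -0.061433  0.054708
ratio                           0.913477   1.091972
==============================  =========  ========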
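
To make the definitions concrete, the sketch below derives PPV and NPV
from confusion-matrix counts. The counts are an assumption: they are not
printed anywhere in this demo but were reconstructed from the rounded
totals and rates in the ``report_binary`` table, so they may be off if
that reconstruction is wrong. The function name is hypothetical.

```python
# Confusion-matrix counts per group at the `decile_score > 4` threshold.
# Assumption: reconstructed from the rounded rates of report_binary.
confusion = {
    'African-American': {'tp': 1188, 'fp': 641, 'tn': 873, 'fn': 473},
    'Caucasian': {'tp': 414, 'fp': 282, 'tn': 999, 'fn': 408},
}


def predictive_values(c):
    """PPV = P(recidivated | flagged), NPV = P(did not | not flagged)."""
    return {
        'ppv': c['tp'] / (c['tp'] + c['fp']),
        'npv': c['tn'] / (c['tn'] + c['fn']),
    }


values = {race: predictive_values(c) for race, c in confusion.items()}
ppv_diff = values['African-American']['ppv'] - values['Caucasian']['ppv']
npv_diff = values['African-American']['npv'] - values['Caucasian']['npv']

print({race: {k: round(v, 6) for k, v in pv.items()}
       for race, pv in values.items()})
print(round(ppv_diff, 6), round(npv_diff, 6))
```

The gaps in PPV and NPV are much smaller than the gaps in FPR and FNR
reported under Separation, which is the tension this section of the demo
illustrates.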

.. code::

    sufficiency_score(df_race_focused['two_year_recid'],
                      df_race_focused['decile_score'],
                      df_race_focused['race'],
                      as_df=True).plot();

.. image:: demo-compas-analysis_files/demo-compas-analysis_40_0.png

Transforming the score to percentiles by group
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code::

    sufficiency_score(df_race_focused['two_year_recid'],
                      df_race_focused['decile_score'],
                      df_race_focused['race'],
                      within_score_percentile=True,
                      as_df=True).plot();

.. image:: demo-compas-analysis_files/demo-compas-analysis_42_0.png

Generating all the relevant statistics for a binary prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    report_binary(df_race_focused['two_year_recid'],
                  df_race_focused['decile_score'] > 4,
                  df_race_focused['race'])

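===============  ================  ===========
..               African-American  Caucasian
===============  ================  ===========
total            3175.000000       2103.000000
proportion       0.601554          0.398446
base_rate        0.523150          0.390870
acceptance_rate  0.576063          0.330956
accuracy         0.649134          0.671897
fnr              0.284768          0.496350
fpr              0.423382          0.220141
ppv              0.649535          0.594828
npv              0.648588          0.710021
===============  ================  ===========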

Threshold Intervention
----------------------

.. code::

    from responsibly.fairness.metrics import roc_curve_by_attr
    from responsibly.fairness.interventions.threshold import (find_thresholds_by_attr,
                                                              plot_fpt_tpr,
                                                              plot_roc_curves_thresholds,
                                                              plot_costs,
                                                              plot_thresholds)

.. code::

    rocs = roc_curve_by_attr(df_race_focused['two_year_recid'],
                             df_race_focused['decile_score'],
                             df_race_focused['race'])

Thresholds vs. FPR and TPR
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    plot_fpt_tpr(rocs);

.. image:: demo-compas-analysis_files/demo-compas-analysis_49_0.png

Comparison of Different Criteria
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  Single threshold (Group Unaware)
-  Minimum Cost
-  Independence (Demographic Parity)
-  FNR (Equality of opportunity)
-  Separation (Equalized odds)

Cost: :math:`FP = FN = -1`
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code::

    COST_MATRIX = [[0, -1],
                   [-1, 0]]

.. code::

    thresholds_data = find_thresholds_by_attr(df_race_focused['two_year_recid'],
                                              df_race_focused['decile_score'],
                                              df_race_focused['race'],
                                              COST_MATRIX)

.. code::

    plot_roc_curves_thresholds(rocs, thresholds_data);

.. image:: demo-compas-analysis_files/demo-compas-analysis_53_0.png

Thresholds by Strategy and Group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    plot_thresholds(thresholds_data, xlim=(0, 10));

.. image:: demo-compas-analysis_files/demo-compas-analysis_55_0.png

Cost by Threshold Strategy
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code::

    plot_costs(thresholds_data);

.. image:: demo-compas-analysis_files/demo-compas-analysis_57_0.png
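
To give an intuition for the difference between the Single threshold and
Minimum Cost strategies listed above, here is a small sketch in plain
Python. It is not how ``find_thresholds_by_attr`` is implemented. The toy
scores and labels are hypothetical and unrelated to the COMPAS data, the
helper functions are made up for illustration, and the cost matrix is
assumed to be laid out as ``[[TN, FP], [FN, TP]]``.

```python
COST_MATRIX = [[0, -1], [-1, 0]]  # assumed layout: [[TN, FP], [FN, TP]]


def total_cost(y_true, scores, threshold, cost_matrix):
    """Sum the cost-matrix entries over predictions `score > threshold`."""
    cost = 0
    for label, score in zip(y_true, scores):
        predicted = int(score > threshold)
        cost += cost_matrix[label][predicted]
    return cost


def min_cost_threshold(y_true, scores, cost_matrix, candidates=range(0, 11)):
    """Candidate threshold with the highest (least negative) total cost.

    Ties go to the lowest threshold, because max() returns the first
    maximal item it encounters.
    """
    return max(candidates,
               key=lambda t: total_cost(y_true, scores, t, cost_matrix))


# Hypothetical toy data, NOT taken from the COMPAS dataset.
toy = {
    'group_a': {'y_true': [0, 0, 1, 0, 1, 1, 1, 1],
                'scores': [1, 3, 4, 6, 7, 8, 9, 10]},
    'group_b': {'y_true': [0, 0, 0, 1, 0, 1, 1, 1],
                'scores': [1, 2, 2, 3, 5, 4, 6, 9]},
}

# Minimum cost: one threshold per group.
thresholds = {group: min_cost_threshold(d['y_true'], d['scores'], COST_MATRIX)
              for group, d in toy.items()}
per_group_cost = sum(
    total_cost(d['y_true'], d['scores'], thresholds[group], COST_MATRIX)
    for group, d in toy.items())

# Single threshold: pool both groups and ignore group membership.
pooled_true = [y for d in toy.values() for y in d['y_true']]
pooled_scores = [s for d in toy.values() for s in d['scores']]
single = min_cost_threshold(pooled_true, pooled_scores, COST_MATRIX)
single_cost = total_cost(pooled_true, pooled_scores, single, COST_MATRIX)

print(thresholds, per_group_cost)
print(single, single_cost)
```

On this toy data the per-group thresholds reach a better total cost than
any single shared threshold. The other strategies in the list add a
fairness constraint on top of the cost, which this sketch does not
attempt.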