Demo - Fairness Analysis of FICO
================================

Adapted version of:

1. Fairness and machine learning book - `Chapter
   2 <https://fairmlbook.org/demographic.html>`__ and `code
   repository <https://github.com/fairmlbook/fairmlbook.github.io>`__.

2. Hardt, M., Price, E., & Srebro, N. (2016). `Equality of opportunity
   in supervised learning <https://arxiv.org/abs/1610.02413>`__. In
   Advances in neural information processing systems (pp. 3315-3323).

3. `Attacking discrimination with smarter machine
   learning <https://research.google.com/bigpicture/attacking-discrimination-in-ml/>`__
   by Google

From
`Wikipedia <https://en.wikipedia.org/wiki/Credit_score_in_the_United_States>`__:

   Credit score in the United States is a number representing the
   creditworthiness of a person, the likelihood that person will pay his
   or her debts. Lenders, such as banks and credit card companies, use
   credit scores to evaluate the potential risk posed by lending money to
   consumers. Lenders allege that widespread use of credit scores has made
   credit more widely available and less expensive for many consumers.

The analysis is based on data from `Report to the Congress on Credit
Scoring and Its Effects on the Availability and Affordability of
Credit <https://federalreserve.gov/boarddocs/rptcongress/creditscore/>`__
by the Federal Reserve. The data set provides aggregate statistics from
2003 about a credit score, demographic information (race or ethnicity,
gender, marital status), and outcomes (to be defined shortly).

In the USA, there are three major credit agencies, which are for-profit
organizations. Each offers risk scores based on the data it collects.
We are going to look into the **FICO** score of TransUnion (called
TransRisk). The TransRisk score is in turn based on a proprietary model
created by FICO, hence it is often referred to as the FICO score.

|Factors contributing to someone’s credit score| Source: Wikipedia

From *Fairness and Machine Learning - Limitations and Opportunities*:

   Regulation of credit agencies in the United States started with the
   Fair Credit Reporting Act, first passed in 1970, that aims to promote
   the accuracy, fairness, and privacy of consumer information collected
   by the reporting agencies. The Equal Credit Opportunity Act, a United
   States law enacted in 1974, makes it unlawful for any creditor to
   discriminate against any applicant on the basis of race, color,
   religion, national origin, sex, marital status, or age.

In our analysis we’ll use the joint statistics of score, race, and outcome.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. |Factors contributing to someone’s credit score| image:: https://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Credit-score-chart.svg/640px-Credit-score-chart.svg.png

.. code:: ipython3

    import pandas as pd
    import matplotlib.pyplot as plt
    
    from responsibly.fairness.metrics import plot_roc_curves
    from responsibly.fairness.interventions.threshold import (find_thresholds,
                                                              plot_fpt_tpr,
                                                              plot_roc_curves_thresholds,
                                                              plot_costs,
                                                              plot_thresholds)

FICO Dataset
~~~~~~~~~~~~

The FICO dataset can be loaded directly from ``responsibly``. The
dataset, in this case, is *aggregate*, i.e., there is no outcome and
prediction information per individual, but summarized statistics for
each FICO score and race/ethnicity group.

.. code:: ipython3

    from responsibly.dataset import build_FICO_dataset
    
    FICO = build_FICO_dataset()

``FICO`` is a dictionary that holds a variety of data:

.. code:: ipython3

    FICO.keys()


.. parsed-literal::

    dict_keys(['rocs', 'proportions', 'total', 'base_rate', 'tpr', 'performance', 'pdf', 'aucs', 'totals', 'cdf', 'base_rates', 'fpr'])


.. code:: ipython3

    help(build_FICO_dataset)


.. parsed-literal::

    Help on function build_FICO_dataset in module responsibly.dataset.fico:
    
    build_FICO_dataset()
        Build the FICO dataset.
        
        Dataset of the credit score of TransUnion (called TransRisk).
        The TransRisk score is in turn based on
        a proprietary model created by FICO,
        hence often referred to as FICO scores.
        
        The data is *aggregated*, i.e., there is no outcome
        and prediction information per individual,
        but summarized statistics for each FICO score
        and race/ethnicity group.
        
        +---------------+------------------------------------------------------+
        | FICO key      | Meaning                                              |
        +===============+======================================================+
        | `total`       | Total number of individuals                          |
        +---------------+------------------------------------------------------+
        | `totals`      | Number of individuals per group                      |
        +---------------+------------------------------------------------------+
        | `cdf`         | Cumulative distribution function of score per group  |
        +---------------+------------------------------------------------------+
        | `pdf`         | Probability distribution function of score per group |
        +---------------+------------------------------------------------------+
        | `performance` | Fraction of non-defaulters per score and group       |
        +---------------+------------------------------------------------------+
        | `base_rates`  | Base rate of non-defaulters per group                |
        +---------------+------------------------------------------------------+
        | `base_rate`   | The overall base rate non-defaulters                 |
        +---------------+------------------------------------------------------+
        | `proportions` | Fraction of individuals per group                    |
        +---------------+------------------------------------------------------+
        | `fpr`         | False Positive Rate by score as threshold per group  |
        +---------------+------------------------------------------------------+
        | `tpr`         | True Positive Rate by score as threshold per group   |
        +---------------+------------------------------------------------------+
        | `rocs`        | ROC per group                                        |
        +---------------+------------------------------------------------------+
        | `aucs`        | ROC AUC per group                                    |
        +---------------+------------------------------------------------------+
        
        :return: Dictionary of various aggregated statistics
                 of the FICO credit score.
        :rtype: dict
        
        References:
            - Based on code (MIT License) by Moritz Hardt
              from https://github.com/fairmlbook/fairmlbook.github.io
            - https://fairmlbook.org/demographic.html#case-study-credit-scoring
    

Counts by Race or Ethnicity
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    sum(FICO['totals'].values())


.. parsed-literal::

    174047


.. code:: ipython3

    pd.Series(FICO['totals']).plot(kind='barh');


.. image:: demo-fico-analysis_files/demo-fico-analysis_10_0.png

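Because the dataset is aggregate, several keys of ``FICO`` are tied
together arithmetically. The sketch below uses hypothetical counts and
base rates for two made-up groups (not the FICO values). It assumes,
following the docstring above, that ``proportions`` is each group's
share of ``total`` and that ``base_rate`` is the non-default rate of the
whole population; under those assumptions, the overall base rate is the
proportion-weighted average of the group base rates (law of total
probability).

.. code:: python

    # Hypothetical numbers for illustration only; NOT the FICO data.
    totals = {'A': 1000, 'B': 3000}      # individuals per group
    base_rates = {'A': 0.5, 'B': 0.8}    # fraction of non-defaulters per group

    # Total population and each group's share of it.
    total = sum(totals.values())
    proportions = {group: count / total for group, count in totals.items()}

    # Law of total probability: the overall non-default rate is the
    # proportion-weighted average of the per-group non-default rates.
    base_rate = sum(proportions[group] * base_rates[group] for group in totals)

    print(total, proportions, base_rate)
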

Score Distribution
~~~~~~~~~~~~~~~~~~

The score used in the study is based on the TransUnion TransRisk score.
TransUnion is a US credit-reporting agency. The TransRisk score is in
turn based on a proprietary model created by FICO, hence often referred
to as FICO scores. The Federal Reserve renormalized the scores for the
study to vary from 0 to 100, with 0 being least creditworthy.

The information on race was provided by the Social Security
Administration, thus relying on self-reported values.

The cumulative distribution of these credit scores strongly depends on
the group as the next figure reveals.

.. code:: ipython3

    FICO['cdf'].head()


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Asian</th>
          <th>Black</th>
          <th>Hispanic</th>
          <th>White</th>
        </tr>
        <tr>
          <th>Score</th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0.0</th>
          <td>0.0000</td>
          <td>0.0007</td>
          <td>0.0001</td>
          <td>0.0001</td>
        </tr>
        <tr>
          <th>0.5</th>
          <td>0.0013</td>
          <td>0.0119</td>
          <td>0.0047</td>
          <td>0.0026</td>
        </tr>
        <tr>
          <th>1.0</th>
          <td>0.0088</td>
          <td>0.0533</td>
          <td>0.0222</td>
          <td>0.0116</td>
        </tr>
        <tr>
          <th>1.5</th>
          <td>0.0107</td>
          <td>0.0647</td>
          <td>0.0274</td>
          <td>0.0143</td>
        </tr>
        <tr>
          <th>2.0</th>
          <td>0.0132</td>
          <td>0.0789</td>
          <td>0.0349</td>
          <td>0.0180</td>
        </tr>
      </tbody>
    </table>
    </div>


.. code:: ipython3

    FICO['cdf'].tail()


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Asian</th>
          <th>Black</th>
          <th>Hispanic</th>
          <th>White</th>
        </tr>
        <tr>
          <th>Score</th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>98.0</th>
          <td>0.9894</td>
          <td>0.9989</td>
          <td>0.9966</td>
          <td>0.9894</td>
        </tr>
        <tr>
          <th>98.5</th>
          <td>0.9961</td>
          <td>0.9995</td>
          <td>0.9988</td>
          <td>0.9962</td>
        </tr>
        <tr>
          <th>99.0</th>
          <td>0.9989</td>
          <td>0.9999</td>
          <td>0.9998</td>
          <td>0.9991</td>
        </tr>
        <tr>
          <th>99.5</th>
          <td>0.9994</td>
          <td>1.0000</td>
          <td>1.0000</td>
          <td>0.9998</td>
        </tr>
        <tr>
          <th>100.0</th>
          <td>1.0000</td>
          <td>1.0000</td>
          <td>1.0000</td>
          <td>1.0000</td>
        </tr>
      </tbody>
    </table>
    </div>


.. code:: ipython3

    f, ax = plt.subplots(1, figsize=(7, 5))
    
    FICO['cdf'].plot(ax=ax)
    
    plt.title('CDF by Group')
    plt.ylabel('Cumulative Probability');


.. image:: demo-fico-analysis_files/demo-fico-analysis_14_0.png

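The CDF at a score *s* is the fraction of a group scoring at most *s*.
For instance, according to the head of the table above, about 7.9% of
Black individuals score 2.0 or lower, compared to 1.8% of White
individuals. Assuming ``FICO['pdf']`` holds the probability mass at each
score value, it can be recovered (up to rounding) as the first
difference of the CDF. The sketch below applies this to the first five
rows printed above; whether ``responsibly`` computes ``pdf`` exactly
this way is an assumption.

.. code:: python

    import pandas as pd

    # First five rows of FICO['cdf'] as printed above (Black and White only).
    cdf = pd.DataFrame({'Black': [0.0007, 0.0119, 0.0533, 0.0647, 0.0789],
                        'White': [0.0001, 0.0026, 0.0116, 0.0143, 0.0180]},
                       index=pd.Index([0.0, 0.5, 1.0, 1.5, 2.0], name='Score'))

    # Probability mass at each score: the increase of the CDF since the
    # previous score. The first row has no predecessor, so diff() yields
    # NaN there, which is filled with the CDF value itself.
    pdf = cdf.diff().fillna(cdf)

    print(pdf.round(4))

Summing the first differences telescopes back to the last CDF value,
which is a quick sanity check on the reconstruction.
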

Outcome Variable
~~~~~~~~~~~~~~~~

The **performance variable** measures serious delinquency in at least
one credit line during a certain time period:

   “(the) measure is based on the performance of new or existing
   accounts and measures whether individuals have been late 90 days or
   more on one or more of their accounts or had a public record item or
   a new collection agency account during the performance period.” -
   *from the Federal Reserve report*

``FICO['performance']`` holds the fraction of non-defaulters for every
score value (rows) and race/ethnicity group (columns):

.. code:: ipython3

    FICO['performance'].head()


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Asian</th>
          <th>Black</th>
          <th>Hispanic</th>
          <th>White</th>
        </tr>
        <tr>
          <th>Score</th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0.0</th>
          <td>0.0523</td>
          <td>0.0033</td>
          <td>0.0095</td>
          <td>0.0146</td>
        </tr>
        <tr>
          <th>0.5</th>
          <td>0.0552</td>
          <td>0.0077</td>
          <td>0.0151</td>
          <td>0.0205</td>
        </tr>
        <tr>
          <th>1.0</th>
          <td>0.0581</td>
          <td>0.0120</td>
          <td>0.0207</td>
          <td>0.0264</td>
        </tr>
        <tr>
          <th>1.5</th>
          <td>0.0610</td>
          <td>0.0164</td>
          <td>0.0262</td>
          <td>0.0323</td>
        </tr>
        <tr>
          <th>2.0</th>
          <td>0.0639</td>
          <td>0.0207</td>
          <td>0.0318</td>
          <td>0.0382</td>
        </tr>
      </tbody>
    </table>
    </div>


.. code:: ipython3

    FICO['performance'].tail()


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Asian</th>
          <th>Black</th>
          <th>Hispanic</th>
          <th>White</th>
        </tr>
        <tr>
          <th>Score</th>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>98.0</th>
          <td>0.9916</td>
          <td>0.9818</td>
          <td>0.9891</td>
          <td>0.9899</td>
        </tr>
        <tr>
          <th>98.5</th>
          <td>0.9917</td>
          <td>0.9840</td>
          <td>0.9897</td>
          <td>0.9902</td>
        </tr>
        <tr>
          <th>99.0</th>
          <td>0.9918</td>
          <td>0.9861</td>
          <td>0.9902</td>
          <td>0.9905</td>
        </tr>
        <tr>
          <th>99.5</th>
          <td>0.9920</td>
          <td>0.9882</td>
          <td>0.9908</td>
          <td>0.9907</td>
        </tr>
        <tr>
          <th>100.0</th>
          <td>0.9921</td>
          <td>0.9904</td>
          <td>0.9913</td>
          <td>0.9910</td>
        </tr>
      </tbody>
    </table>
    </div>

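Combining the score distribution with the performance table gives a
group's overall non-default rate: weight the non-default rate at each
score by the fraction of the group at that score, i.e.
:math:`\Pr(Y=1 \mid A=a) = \sum_s \Pr(S=s \mid A=a) \Pr(Y=1 \mid S=s, A=a)`.
A minimal sketch on a hypothetical three-point score grid (not the FICO
data), assuming the group's score distribution sums to one:

.. code:: python

    import pandas as pd

    # Hypothetical aggregate data for a single group on a coarse score grid.
    scores = pd.Index([0.0, 50.0, 100.0], name='Score')
    pdf = pd.Series([0.2, 0.5, 0.3], index=scores)           # fraction of the group at each score
    performance = pd.Series([0.1, 0.6, 0.95], index=scores)  # non-default rate at each score

    # Law of total probability over the score values.
    base_rate = (pdf * performance).sum()

    print(base_rate)

If the keys are defined as in the docstring, this is the relation one
would expect between ``FICO['pdf']``, ``FICO['performance']`` and
``FICO['base_rates']`` for each group.
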

Separation Fairness Criterion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Under the separation criterion, a binary classifier should have equal
*FPR* and *TPR* across the groups, i.e., given the true outcome, the
decision should not depend on the group.

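Formally, write :math:`S` for the score, :math:`Y` for the outcome
(assuming :math:`Y=1` denotes non-default, the positive class in this
demo), :math:`A` for the group, and :math:`\hat{Y}` for the decision
obtained by thresholding the score, possibly with a group-specific
threshold. Separation asks that, for any two groups :math:`a` and
:math:`b`,

.. math::

   \Pr(\hat{Y}=1 \mid Y=y, A=a) = \Pr(\hat{Y}=1 \mid Y=y, A=b),
   \quad y \in \{0, 1\}.

With :math:`y=1` this equates the TPRs, and with :math:`y=0` the FPRs.
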
.. code:: ipython3

    plot_roc_curves(FICO['rocs'], FICO['aucs'],
                    figsize=(7, 5));


.. image:: demo-fico-analysis_files/demo-fico-analysis_19_0.png


The true positive rate is the rate of predicted positive performance
(approval) given positive performance (non-default). Similarly, the
false positive rate is the rate of predicted positive performance given
negative performance (default).

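Since the data are aggregate, both rates at a threshold :math:`t` can
be computed from a group's score distribution and performance: at score
:math:`s`, the non-defaulters make up a fraction
:math:`pdf(s) \cdot performance(s)` of the group and the defaulters a
fraction :math:`pdf(s) \cdot (1 - performance(s))`. The sketch below
uses a hypothetical score grid (not the FICO values) and assumes that
scores at or above the threshold are approved; it illustrates the idea
and is not necessarily how ``responsibly`` builds ``FICO['rocs']``.

.. code:: python

    import numpy as np


    def rates_at_threshold(scores, pdf, performance, threshold):
        """Return (FPR, TPR) when approving everyone with score >= threshold.

        ``scores``, ``pdf`` and ``performance`` are aligned 1-D sequences
        describing a single group.
        """
        scores = np.asarray(scores, dtype=float)
        pdf = np.asarray(pdf, dtype=float)
        performance = np.asarray(performance, dtype=float)

        positives = pdf * performance        # mass of non-defaulters per score
        negatives = pdf * (1 - performance)  # mass of defaulters per score
        approved = scores >= threshold

        tpr = positives[approved].sum() / positives.sum()
        fpr = negatives[approved].sum() / negatives.sum()
        return fpr, tpr


    # Hypothetical group on a coarse score grid.
    scores = [0, 25, 50, 75, 100]
    pdf = [0.1, 0.2, 0.3, 0.2, 0.2]
    performance = [0.1, 0.4, 0.7, 0.9, 0.99]

    fpr, tpr = rates_at_threshold(scores, pdf, performance, threshold=50)
    print(round(fpr, 3), round(tpr, 3))

Sweeping ``threshold`` over all score values traces the group's ROC
curve; repeating this for each group shows why a single threshold
generally lands on different (FPR, TPR) pairs.
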
.. code:: ipython3

    plot_roc_curves(FICO['rocs'], FICO['aucs'],
                    figsize=(7, 5));
    
    plt.xlim(0, 0.3)
    plt.ylim(0.4, 1);


.. image:: demo-fico-analysis_files/demo-fico-analysis_21_0.png


Thresholds vs. FPR and TPR
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ROC curve is parameterized by the threshold, so the same threshold
may correspond to different (FPR, TPR) pairs for each group. We can
observe this by plotting the FPR and the TPR as functions of the
threshold for each group.

.. code:: ipython3

    plot_fpt_tpr(FICO['rocs'], figsize=(15, 7),
                 title_fontsize=15, text_fontsize=15);


.. image:: demo-fico-analysis_files/demo-fico-analysis_23_0.png


Therefore, naively choosing a single threshold for all groups will
violate the separation fairness criterion, as the FPR and TPR will
differ between the groups.

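One way around this, in the spirit of equality of opportunity (equal
FNR, hence equal TPR), is to choose a separate threshold per group so
that the TPRs match. A minimal sketch, assuming each group's TPR is
known on a common threshold grid and decreases as the threshold grows
(the curves below are hypothetical), inverts each curve by linear
interpolation with ``numpy.interp``. The target TPR here is fixed by
hand; this is an illustration of the idea, not the cost-minimizing
search that ``find_thresholds`` performs.

.. code:: python

    import numpy as np

    thresholds = np.array([0, 25, 50, 75, 100], dtype=float)

    # Hypothetical TPR curves: fraction of non-defaulters scoring >= threshold.
    tpr_by_group = {
        'group_a': np.array([1.0, 0.95, 0.85, 0.60, 0.20]),
        'group_b': np.array([1.0, 0.80, 0.55, 0.30, 0.05]),
    }


    def threshold_for_tpr(thresholds, tpr, target):
        """Linearly interpolate the threshold at which the TPR equals target."""
        # np.interp expects increasing x-values; the TPR decreases with the
        # threshold, so both arrays are reversed.
        return float(np.interp(target, tpr[::-1], thresholds[::-1]))


    target_tpr = 0.8
    group_thresholds = {group: threshold_for_tpr(thresholds, tpr, target_tpr)
                        for group, tpr in tpr_by_group.items()}
    print(group_thresholds)

With these toy curves, ``group_a`` needs a threshold of about 55 and
``group_b`` a threshold of 25 to reach the same TPR of 0.8.
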
Comparison of Different Criteria
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  Single threshold (Group Unaware)
-  Minimum Cost
-  Independence (Demographic Parity)
-  FNR (Equality of opportunity)
-  Separation (Equalized odds)

Cost: :math:`FP = - 5 \cdot TP`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: ipython3

    # Assumed layout [[TN, FP], [FN, TP]], so that FP = -5 * TP
    COST_MATRIX = [[0, -5/6],
                   [0,  1/6]]

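One way to read this matrix, assuming the ``[[TN, FP], [FN, TP]]``
layout and treating the entries as the lender's gain per applicant (both
are assumptions, not taken from the ``responsibly`` documentation shown
here): approving a non-defaulter gains 1/6, approving a defaulter loses
5/6, and rejecting an applicant gains nothing. Approving an applicant
with non-default probability :math:`p` then pays off only when
:math:`p \cdot \frac{1}{6} - (1 - p) \cdot \frac{5}{6} > 0`, i.e.,
:math:`p > 5/6 \approx 0.83`. The hypothetical helper below checks this
break-even point with exact fractions.

.. code:: python

    from fractions import Fraction

    # Assumed layout [[TN, FP], [FN, TP]], read as gains per applicant.
    cost_matrix = [[Fraction(0), Fraction(-5, 6)],
                   [Fraction(0), Fraction(1, 6)]]


    def expected_gain_of_approval(p_non_default, matrix=cost_matrix):
        """Hypothetical helper: expected gain of approving one applicant."""
        fp_gain = matrix[0][1]  # approving a defaulter
        tp_gain = matrix[1][1]  # approving a non-defaulter
        return p_non_default * tp_gain + (1 - p_non_default) * fp_gain


    # Probability at which approving and rejecting (gain 0) break even:
    # p * tp + (1 - p) * fp = 0  =>  p = -fp / (tp - fp)
    fp_gain, tp_gain = cost_matrix[0][1], cost_matrix[1][1]
    break_even = -fp_gain / (tp_gain - fp_gain)

    print(break_even)                                      # 5/6
    print(expected_gain_of_approval(Fraction(9, 10)) > 0)  # True
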
.. code:: ipython3

    thresholds_data = find_thresholds(FICO['rocs'],
                                      FICO['proportions'],
                                      FICO['base_rate'],
                                      FICO['base_rates'],
                                      COST_MATRIX)

.. code:: ipython3

    plot_roc_curves_thresholds(FICO['rocs'], thresholds_data,
                               figsize=(7, 7),
                               title_fontsize=20, text_fontsize=15);


.. image:: demo-fico-analysis_files/demo-fico-analysis_28_0.png


.. code:: ipython3

    plot_roc_curves_thresholds(FICO['rocs'], thresholds_data,
                               figsize=(7, 7),
                               title_fontsize=20, text_fontsize=15)
    
    plt.xlim(0, 0.3)
    plt.ylim(0.4, 1);


.. image:: demo-fico-analysis_files/demo-fico-analysis_29_0.png


Thresholds by Strategy and Group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    plot_thresholds(thresholds_data,
                    xlim=(0, 100), figsize=(7, 7),
                    title_fontsize=20, text_fontsize=15);


.. image:: demo-fico-analysis_files/demo-fico-analysis_31_0.png


Cost by Threshold Strategy
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    plot_costs(thresholds_data);


.. image:: demo-fico-analysis_files/demo-fico-analysis_33_0.png


Sufficiency Fairness Criterion - Calibration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

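    f, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 10))
    
    # Left: non-default rate as a function of the raw score.
    FICO['performance'].plot(ax=axes[0])
    axes[0].set_ylabel('Non-default rate')
    
    # Right: non-default rate as a function of the within-group score
    # percentile (the group's CDF), which aligns the score distributions.
    for group in FICO['cdf'].columns:
        axes[1].plot(FICO['cdf'][group], FICO['performance'][group],
                     label=group)
    
    axes[1].set_ylabel('Non-default rate')
    axes[1].set_xlabel('Within-group score percentile (CDF)')
    axes[1].legend();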


.. image:: demo-fico-analysis_files/demo-fico-analysis_35_0.png


Even if the score were calibrated within each group, the differences in
score distribution by group could nonetheless lead to a thresholded
classifier with different positive predictive values in each group.
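
The positive predictive value (PPV) at a threshold :math:`t` is the
non-default rate among approved applicants, that is, the average of the
performance over scores at or above :math:`t`, weighted by the group's
own score distribution. Because of that weighting, two groups with the
same performance curve can still get different PPVs from the same
threshold. A minimal sketch with hypothetical data (not the FICO
values), assuming scores at or above the threshold are approved:

.. code:: python

    import numpy as np


    def ppv_at_threshold(scores, pdf, performance, threshold):
        """Non-default rate among those with score >= threshold (NaN if none)."""
        scores = np.asarray(scores, dtype=float)
        pdf = np.asarray(pdf, dtype=float)
        performance = np.asarray(performance, dtype=float)
        approved = scores >= threshold
        return (pdf[approved] * performance[approved]).sum() / pdf[approved].sum()


    # Hypothetical: both groups share the same performance curve,
    scores = [0, 25, 50, 75, 100]
    performance = [0.1, 0.4, 0.7, 0.9, 0.99]
    # but their scores are distributed differently.
    pdf_a = [0.05, 0.05, 0.1, 0.3, 0.5]
    pdf_b = [0.3, 0.3, 0.2, 0.15, 0.05]

    for name, group_pdf in [('group_a', pdf_a), ('group_b', pdf_b)]:
        print(name, round(ppv_at_threshold(scores, group_pdf, performance, 50), 3))

Here the PPV at threshold 50 is about 0.93 for ``group_a`` and about
0.81 for ``group_b``, because most of ``group_b``'s approved mass sits
at the lowest approved score.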