Bias and Fairness in Machine Learning: Identifying Bias

Submission information

Submission Number: 155

Submission ID: 1668

Submission UUID: c585f04a-d9f9-4972-91cd-06100d9f5b1d

Submission URI: /form/project

Created: Thu, 10/27/2022 - 00:59

Completed: Thu, 10/27/2022 - 00:59

Changed: Sat, 06/29/2024 - 10:46

Submitted by: Ahmed Rashed

Language: English

Is draft: No

Webform: Project

Project Title Bias and Fairness in Machine Learning: Identifying Bias

Program CAREERS

Project Image {Empty}

Tags machine-learning (272), natural-language-processing (274)

Status Complete

Project Leader

Project Leader Ahmed Rashed

Email amrashed@ship.edu

Mobile Phone 6627032781

Work Phone {Empty}

Project Personnel

Mentor(s) Pranav Venkit

Student-facilitator(s) Abdelkrim Kallich

Mentee(s) {Empty}

Project Information

Project Description With the widespread use of artificial intelligence (AI) systems and applications in everyday life, accounting for fairness has gained significant importance in the design and engineering of such systems. AI systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that these decisions do not reflect discriminatory behavior toward certain groups or populations. More recently, work in both traditional machine learning and deep learning has addressed these challenges in different subdomains. With the commercialization of these systems, researchers are becoming more aware of the biases that these applications can contain and are attempting to address them.
In industry, it has become critical to create fair ML models that respect the different groups defined by legally protected sensitive features and do not favor some groups over others. Bias can show up either in dataset sampling or in model performance for protected groups or individuals. Therefore, it is important in industry to establish a bias analysis system that identifies and mitigates bias in both the dataset and model performance with respect to group and individual fairness.
Several fairness libraries are available for this job. In industry, the fairness libraries used in bias analysis are expected to come from well-known organizations. Such libraries have been created by large companies such as Microsoft, IBM, and Google. The goal of this project is to compare the fairness libraries that can be used in industry and work out a use case using a published dataset.

Project Information Subsection

Project Deliverables 1. Surveying the basics of bias and fairness in machine learning. The students will learn the basics from two review articles: “A Survey on Bias and Fairness in Machine Learning” by Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan, and “An Introduction to Algorithmic Fairness” (arXiv:2105.05595v1 [cs.CY]) by Hilde J.P. Weerts.
2. Searching for fairness libraries that can be used in industry. We will use three libraries created by large technology companies, so that they can be trusted for industry use.
• Fairlearn (By Microsoft)
• AIF360 (By IBM)
• What-If Tool (By Google)
3. Selecting published structured and unstructured datasets. The main goal of the project is to mitigate bias in the structured (tabular) dataset. If possible, we will extend our bias analysis to unstructured data such as text and images.
• Tabular Dataset: TitanicSexism (fairness in ML), https://www.kaggle.com/code/garethjns/titanicsexism-fairness-in-ml/input
• Text Dataset: Fake and real news dataset, https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset
• Image Dataset: UTKFace, https://www.kaggle.com/datasets/jangedoo/utkface-new
4. Discussing the mitigation algorithms that can be used. Mitigation algorithms should be applied at the pre-processing, in-processing, and post-processing stages. Below are examples of the mitigation algorithms that will be used.
• Fairlearn: ExponentiatedGradient, GridSearch, ThresholdOptimizer, CorrelationRemover, AdversarialFairnessClassifier, AdversarialFairnessRegressor.
• AIF360: preprocessing (Disparate Impact Remover, LFR, Optim Preproc, Reweighing), inprocessing (Adversarial Debiasing, ART Classifier, Gerry Fair Classifier, Meta Fair Classifier, Prejudice Remover, Exponentiated Gradient Reduction, GridSearch Reduction), postprocessing (Calibrated EqOdds Postprocessing, EqOdds Postprocessing, Reject Option Classification).
• What-If Tool: still under study
5. Discussing the results and summarizing the comparison among the libraries. In the discussion, we will compare the performance of the mitigation algorithms at different stages of the ML life cycle, such as pre-processing, in-processing, and post-processing.
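
As an illustration of the pre-processing stage named in deliverable 4, the following is a minimal sketch of the reweighing idea: each sample gets the weight P(group) * P(label) / P(group, label), so that group and label look statistically independent in the weighted data. It is written in plain Python and is not AIF360's Reweighing implementation; the function names and the toy data are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Hypothetical toy data: a sensitive attribute and a binary outcome.
groups = ["female", "female", "female", "male", "male", "male", "male", "male"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
weights = reweighing_weights(groups, labels)

def weighted_positive_rate(group):
    """Weighted share of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den

# After reweighing, both groups should show the same weighted positive rate.
print(weighted_positive_rate("female"), weighted_positive_rate("male"))
```

The weights can then be passed as sample weights to any classifier that accepts them, which is what makes this a pre-processing step rather than a change to the model itself.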

Student Research Computing Facilitator Profile The PI has an undergraduate student they would like to work with on this project.

Mentee Research Computing Profile {Empty}

Student Facilitator Programming Skill Level {Empty}

Mentee Programming Skill Level {Empty}

Project Institution {Empty}

Project Address 6127 Galleon Dr
Mechanicsburg, Pennsylvania 17050

Anchor Institution CR-Penn State

Preferred Start Date {Empty}

Start as soon as possible. No

Project Urgency 3 (on a scale from "Already behind" to "Start date is flexible")

Expected Project Duration (in months) {Empty}

Launch Presentation {Empty}

Launch Presentation Date {Empty}

Wrap Presentation

wrap presentation.pdf (1.75 MB)

Wrap Presentation Date 03/22/2024

Project Milestones

Milestone Title: Survey basics
Milestone Description: Surveying the basics of bias and fairness in machine learning. The students will learn the basics from two review articles: “A Survey on Bias and Fairness in Machine Learning” by Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan, and “An Introduction to Algorithmic Fairness” (arXiv:2105.05595v1 [cs.CY]) by Hilde J.P. Weerts.
Completion Date Goal: 2023-10-26
Actual Completion Date: 2023-11-13
Milestone Title: Select libraries
Milestone Description: Choosing the proper fairness metrics to identify the bias. Below are examples of the metrics that will be used in each library.
• Fairlearn: Demographic parity, Equalized odds, Equal opportunity
• AIF360: Dataset Metric, Binary Label Dataset Metric, Classification Metric, Sample Distortion Metric, MDSS Classification Metric.
• What-If Tool: still under study
Completion Date Goal: 2023-11-02
Actual Completion Date: 2023-11-13
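
To make the three group fairness metrics listed for Fairlearn concrete, here is a minimal plain-Python sketch that computes them as gaps between groups. It is an illustration only and does not call Fairlearn. The equalized odds value is taken as the larger of the true-positive-rate gap and the false-positive-rate gap, which is one common convention. All function names and the toy data are hypothetical.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def group_rates(y_true, y_pred, groups):
    """Selection rate, true positive rate, and false positive rate per group."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, v in enumerate(groups) if v == g]
        out[g] = {
            "selection_rate": rate([y_pred[i] for i in idx]),
            "tpr": rate([y_pred[i] for i in idx if y_true[i] == 1]),
            "fpr": rate([y_pred[i] for i in idx if y_true[i] == 0]),
        }
    return out

def gap(rates, key):
    """Largest difference in one rate across all groups."""
    vals = [r[key] for r in rates.values()]
    return max(vals) - min(vals)

def fairness_report(y_true, y_pred, groups):
    rates = group_rates(y_true, y_pred, groups)
    return {
        "demographic_parity_difference": gap(rates, "selection_rate"),
        "equal_opportunity_difference": gap(rates, "tpr"),
        "equalized_odds_difference": max(gap(rates, "tpr"), gap(rates, "fpr")),
    }

# Hypothetical toy data: true labels, model predictions, sensitive attribute.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
report = fairness_report(y_true, y_pred, groups)
print(report)
```

In this toy case the two groups have the same true positive rate, so equal opportunity is satisfied even though demographic parity and equalized odds are not, which shows why the metrics have to be chosen deliberately.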
Milestone Title: Select dataset
Milestone Description: Selecting a published structured dataset. The main goal of the project is to identify bias in the structured (tabular) dataset. If possible, we will extend our bias analysis to unstructured data such as text and images.
• Tabular Dataset: TitanicSexism (fairness in ML), https://www.kaggle.com/code/garethjns/titanicsexism-fairness-in-ml/input
• Text Dataset: Fake and real news dataset, https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset
• Image Dataset: UTKFace, https://www.kaggle.com/datasets/jangedoo/utkface-new
Completion Date Goal: 2023-11-09
Actual Completion Date: 2023-11-13
Milestone Title: Choose fairness metrics
Milestone Description: Choosing the proper fairness metrics to identify the bias. Below are examples of the mitigation algorithms that will be used in each library.
• Fairlearn: ExponentiatedGradient, GridSearch, ThresholdOptimizer, CorrelationRemover, AdversarialFairnessClassifier, AdversarialFairnessRegressor.
• AIF360: preprocessing (Disparate Impact Remover, LFR, Optim Preproc, Reweighing), inprocessing (Adversarial Debiasing, ART Classifier, Gerry Fair Classifier, Meta Fair Classifier, Prejudice Remover, Exponentiated Gradient Reduction, GridSearch Reduction), postprocessing (Calibrated EqOdds Postprocessing, EqOdds Postprocessing, Reject Option Classification).
• What-If Tool: still under study

Completion Date Goal: 2023-12-07
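
The post-processing entries in the lists above adjust a trained model's decisions rather than the model itself. The sketch below illustrates that idea in a simplified form: it picks a separate score cutoff for each group so that every group's selection rate lands as close as possible to a shared target. This is an assumption-laden toy, not Fairlearn's ThresholdOptimizer or any AIF360 algorithm, and the function names, scores, and target rate are hypothetical.

```python
def pick_threshold(scores, target_rate):
    """Choose the cutoff whose selection rate is closest to target_rate."""
    candidates = sorted(set(scores)) + [float("inf")]

    def selection_rate(t):
        return sum(s >= t for s in scores) / len(scores)

    return min(candidates, key=lambda t: abs(selection_rate(t) - target_rate))

def fit_group_thresholds(scores, groups, target_rate):
    """Fit one cutoff per group from that group's own scores."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    return {g: pick_threshold(v, target_rate) for g, v in by_group.items()}

def predict(scores, groups, thresholds):
    """Apply the group-specific cutoffs to get 0/1 decisions."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]

# Hypothetical model scores for two groups.
scores = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# A single cutoff of 0.5 selects 3 of 4 in group "a" but only 1 of 4 in "b".
single = [int(s >= 0.5) for s in scores]

# Group-specific cutoffs aimed at a shared 50% selection rate.
thresholds = fit_group_thresholds(scores, groups, target_rate=0.5)
adjusted = predict(scores, groups, thresholds)
print(thresholds, adjusted)
```

Because the cutoffs depend on the sensitive attribute at prediction time, this kind of approach needs that attribute to be available when the model is used, which is a design constraint worth noting when comparing it with pre-processing and in-processing methods.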
Milestone Title: Identify bias in dataset
Milestone Description: Employ the selected libraries on the dataset and identify bias
Completion Date Goal: 2023-12-22
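
Identifying bias in a dataset, before any model is trained, typically starts by comparing the rate of favorable labels across groups. The following minimal sketch computes two common dataset-level measures, the statistical parity difference and the disparate impact ratio, both taken as unprivileged relative to privileged. It is plain Python rather than a call into any of the selected libraries, and the rows are hypothetical stand-ins shaped like a Titanic-style survival table, not real data.

```python
def base_rate(labels, groups, group):
    """Share of favorable labels (1) within one group."""
    selected = [y for y, g in zip(labels, groups) if g == group]
    return sum(selected) / len(selected)

def dataset_bias(labels, groups, privileged, unprivileged):
    """Compare favorable-label rates between two groups."""
    p = base_rate(labels, groups, privileged)
    u = base_rate(labels, groups, unprivileged)
    return {
        "statistical_parity_difference": u - p,
        "disparate_impact": u / p if p else float("inf"),
    }

# Hypothetical rows: sex as the sensitive feature, survived as the label.
sex = ["female", "female", "female", "female", "male", "male", "male", "male"]
survived = [1, 1, 1, 0, 1, 0, 0, 0]

bias = dataset_bias(survived, sex, privileged="female", unprivileged="male")
print(bias)
```

A statistical parity difference of 0 and a disparate impact of 1 would indicate equal favorable-label rates; values far from those suggest the labels themselves are skewed across groups.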
Milestone Title: Discussing results and summarizing comparison
Milestone Description: Discussing the results and summarizing the comparison among the libraries. In the discussion, we will compare the performance of the mitigation algorithms at different stages of the ML life cycle, such as pre-processing, in-processing, and post-processing.

Completion Date Goal: 2024-01-18

Github Contributions {Empty}

Planned Portal Contributions (if any) {Empty}

Planned Publications (if any) {Empty}

What will the student learn? {Empty}

What will the mentee learn? {Empty}

What will the Cyberteam program learn from this project? {Empty}

HPC resources needed to complete this project? {Empty}

Notes A mentor is needed, with skills in machine learning and bias identification.

Final Report

What is the impact on the development of the principal discipline(s) of the project? {Empty}

What is the impact on other disciplines? {Empty}

Is there an impact on physical resources that form infrastructure? {Empty}

Is there an impact on the development of human resources for research computing? {Empty}

Is there an impact on institutional resources that form infrastructure? {Empty}

Is there an impact on information resources that form infrastructure? {Empty}

Is there an impact on technology transfer? {Empty}

Is there an impact on society beyond science and technology? {Empty}

Lessons Learned {Empty}

Overall results Feedback from the PI: "Process was very smooth and professional"
Feedback from the student facilitator: "The project went better than he was expecting and he gained a lot of experience and learning. Didn't previously have knowledge or awareness of fairness when it came to machine learning."
Feedback from mentor: "Very relevant research, particularly for industry applications. Undergraduates doing hands-on research is a great experience. Structure and detailed milestones were very helpful."