WiDS Puget Sound is independently organized by Diversity in Data Science.
Tuesday, May 14 • 2:40pm - 3:05pm
Reinforcement Learning for Model Bias Analysis



With the broad-scale application and massive growth of artificial intelligence (AI) and machine learning (ML) in all aspects of society, a question persists as to the robustness of these systems. In most cases, methods of investigating trustworthiness and explainability in ML models have focused on reactive techniques designed to detect when a model has erred. These avenues of investigation are also limited to standard interrogation methods, which may be inadequate for sufficiently novel model architectures or data modalities. In contrast, we are developing a proactive method that anticipates possible failure states by simulating a unique, optimal adversarial attack using reinforcement learning (RL). We explore RL as a technique for evaluating model biases and robustness and propose the RL Optimizing Bias Elimination and Robustness Tool (ROBERT). ROBERT is expected to learn how biases in a model can be exploited under a potential adversarial attack.
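As a rough illustration of the core idea (not the authors' implementation), the reward an RL attacker receives can simply invert the classifier's success; all names and values below are hypothetical:

```python
import torch

def attack_reward(classifier, image, perturbation, true_label):
    """Hypothetical reward: higher when the perturbation fools the classifier.

    `image` is a (1, 28, 28) tensor with pixel values in [0, 1].
    """
    perturbed = torch.clamp(image + perturbation, 0.0, 1.0)  # keep pixels valid
    with torch.no_grad():
        logits = classifier(perturbed.unsqueeze(0))  # add a batch dimension
    predicted = logits.argmax(dim=1).item()
    # Model error translates to higher reward, so the agent is driven
    # toward the classifier's weaknesses rather than its strengths.
    return 1.0 if predicted != true_label else -0.01
```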

In developing ROBERT, we train an image classification model on the MNIST dataset and construct an RL environment that perturbs input images, which are then passed to this classification model. The system's reward is designed to correlate with the impact of the perturbations on the model's ability to correctly classify the image: model error translates to higher reward, teaching ROBERT the classifier's weaknesses. We validate ROBERT with a test in which we train multiple image classification models with differing architectures and analyze ROBERT's chosen actions to identify probable model biases. We also examine how extensible these methods are to the black-box adversarial case, which requires less information about the model to perform a successful attack. Through this experiment, we develop a novel RL-based methodology aimed at identifying unseen points of weakness and bias in existing image classification models.
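A minimal sketch of such an environment, assuming a gymnasium-style API and a frozen PyTorch MNIST classifier (the class name, perturbation budget, and reward values are illustrative assumptions, not the authors' code):

```python
import gymnasium as gym
import numpy as np
import torch

class PerturbationEnv(gym.Env):
    """Hypothetical environment: the agent perturbs an MNIST image and is
    rewarded when the frozen classifier under analysis misclassifies it."""

    def __init__(self, classifier, dataset, eps=0.1, max_steps=20):
        super().__init__()
        self.classifier = classifier.eval()  # frozen model under analysis
        self.dataset = dataset               # (image, label) pairs, pixels in [0, 1]
        self.eps = eps                       # per-step perturbation budget
        self.max_steps = max_steps
        self.observation_space = gym.spaces.Box(0.0, 1.0, shape=(28, 28), dtype=np.float32)
        # Action: an additive perturbation over the whole image.
        self.action_space = gym.spaces.Box(-eps, eps, shape=(28, 28), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        idx = self.np_random.integers(len(self.dataset))
        image, self.label = self.dataset[idx]
        self.image = np.asarray(image, dtype=np.float32).reshape(28, 28)
        self.steps = 0
        return self.image.copy(), {}

    def step(self, action):
        self.image = np.clip(self.image + action, 0.0, 1.0)
        self.steps += 1
        with torch.no_grad():
            logits = self.classifier(torch.from_numpy(self.image).view(1, 1, 28, 28))
        fooled = logits.argmax(dim=1).item() != self.label
        reward = 1.0 if fooled else -0.01  # model error means higher reward
        terminated = fooled
        truncated = self.steps >= self.max_steps
        return self.image.copy(), reward, terminated, truncated, {}
```

A standard policy-gradient agent (e.g. PPO) could then be trained on this environment, and the perturbation patterns the learned policy converges on can be inspected for hints about where the classifier's biases lie.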

Speakers

Rachel Wofford

Data Scientist, PNNL
Rachel Wofford is a Data Scientist at PNNL. Her research interests include reinforcement learning, adversarial machine learning, and the development of big data analytics in the radio frequency and cybersecurity domains. Rachel holds an MS from Oregon State University and a BS from...

Anastasiya Usenko

Data Scientist, PNNL
Anastasiya Usenko is an early-career data scientist in the field of applied deep learning research, with bachelor's degrees in computer science and linguistics. At PNNL, she has worked with reinforcement learning, graph neural networks, and causal inference modeling, among others...


Tuesday May 14, 2024 2:40pm - 3:05pm PDT
Room 130, Student Center