Ruth Fong

I am a teaching professor in the Department of Computer Science at Princeton University, where I teach intro and AIML CS courses and conduct research in computer vision and machine learning with a focus on explainable AI. At Princeton, I lead the Looking Glass Lab and frequently collaborate with Professor Olga Russakovsky and the Visual AI Lab.

I completed my PhD in the Visual Geometry Group at the University of Oxford, where I was advised by Andrea Vedaldi and funded by the Rhodes Trust and Open Philanthropy. Also at Oxford, I earned a Masters in Neuroscience, where I worked with Rafal Bogacz, Ben Willmore, and Nicol Harper. I received a Bachelors in Computer Science at Harvard University, where I worked with David Cox and Walter Scheirer.

Funding acknowledgements: Looking Glass Lab is grateful to Princeton SEAS and Open Philanthropy for generous support of our research.

Email | CV | Bio | Google Scholar | GitHub

Hello! 👋

COS324. I'm excited to have you in my class this semester! Please see Canvas for the best ways to reach the course staff.

Research/IW/thesis advising. I enjoy working with Princeton students! Unfortunately, I am not accepting new students this academic year (2023-2024). As such, there's no need to email me about research/IW/thesis opportunities.

  • For a list of most AIML faculty members in the COS department, see here.
  • For a list of all COS faculty who are available for IW/thesis advising, see here.
If you're a non-Princeton student (including prospective applicants), there's also no need to email; please reach out after you've been accepted to Princeton. Unfortunately, I am not accepting non-Princeton students at this time.

Other Princeton things. Engaging with students is one of my favorite parts of the job. If you'd like to reach me about other Princeton-related things (e.g. participating in a student event), shoot me an email!

News 🗞
  • Our CHI 2023 paper, "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction, received an honorable mention award 🏆! Congrats to Sunnie S. Y. Kim and our co-authors 🎉! arXiv | program link
  • Along with my co-editors, our book on "xxAI - Beyond Explainable AI" is now available: link
  • Along with my co-instructors, I introduced an open-ended final project to COS126 (i.e. Princeton's intro CS course); here's the online gallery of the amazing projects students created!
  • I am excited to announce that I am joining Princeton's CS department as a teaching faculty member starting July 2021.
  • My PhD thesis on "Understanding Convolutional Neural Networks" can be found here. For those with less experience, all chapters except chapters 3-6 were written with accessibility in mind.
  • We have a new report out on "Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims." webpage | arXiv
  • We just released TorchRay, a PyTorch interpretability library. In the initial release, we focus on attribution and re-implemented popular methods and benchmarks to encourage reproducible research. Resources: tutorial slides | colab notebook
Talks 👩🏻‍🏫
  • Explainability in Machine Learning workshop 2023 — "Directions in Interpretability": slides | workshop webpage
  • HEIBRiDS Lecture Series 2022 — "Directions in Interpretability": slides | series webpage
  • MICCAI 2022 — "Directions in Interpretability" at the iMIMIC workshop (Interpretability of Machine Intelligence in Medical Image Computing): slides | workshop webpage
  • CVPR 2022 — "Directions in Interpretability" at the Human-Centered AI for Computer Vision tutorial: video | slides | tutorial webpage
  • CVPR 2020 — "Understanding Deep Neural Networks" at the Interpretable Machine Learning for Computer Vision tutorial: video | slides | tutorial webpage
  • Oxford VGG 2019 — Interpretability tutorial: slides
Looking Glass Lab 👤


Adam Kelch Sai Rachumalla Indu Panigrahi


Nicole Meister Dora Zhao Sunnie S. Y. Kim Ryan Manzuk Vikram V. Ramaswamy Angelina Wang Dr. Elizabeth Anne Watkins Prof. Olga Russakovsky Prof. Andrés Monroy-Hernández Prof. Adam C. Maloof

[image attribution]


  • Sunnie S. Y. Kim, CHI Honorable Mention Paper Award, 2023.
  • Devon Ulrich, Tau Beta Pi, 2023.
  • Alexis Sursock, Sigma Xi, 2023.
  • Indu Panigrahi, Sigma Xi, 2023.
  • Indu Panigrahi, Outstanding Computer Science Senior Thesis Prize, 2023.
  • Indu Panigrahi, NSF Graduate Fellowship Award Honorable Mention, 2023.
  • Indu Panigrahi, Computing Research Association (CRA) Outstanding Undergraduate Research Award Nominee, 2022.
  • Indu Panigrahi, Outstanding Independent Work Award, 2022.
  • Indu Panigrahi, Princeton Research Day Orange & Black Undergraduate Presentation Award, 2022.
  • Ruth Fong, Open Philanthropy AI Fellowship, 2018.
  • Ruth Fong, Rhodes Scholarship, 2015.


  • Creston Brooks '23, senior thesis, Optimizations towards AI-based Travel Recommendation (started CS MS at Princeton in 2023).
  • Alexis Sursock '23, senior thesis, Stravl: The World's First Large-Scale, AI-based Travel Designer.
  • Indu Panigrahi '23, senior thesis, A Semi-supervised Model for Fine-grain, Serial Image Instance Segmentation (started CS MS at Princeton in 2023).
  • Devon Ulrich '23, senior thesis, Investigating the Fairness of Computer Vision Models for Medical Imaging.
  • Icey Siyi '24 and Fatima Zohra Boumhaout '24, research (summer 2022), Interactive Perturbation Visualization Tool.
  • Frelicia Tucker '22, senior thesis, The Virtual Black Hair Experience: Evaluating Hairstyle Transform Generative Adversarial Networks on Black Women.
  • Vedant Dhopte '22, senior thesis, Holistically Interpreting Deep Neural Networks via Channel Ablation.
Research 🧪

My research interests are in computer vision, machine learning (ML), and human-computer interaction (HCI), with a particular focus on explainable AI and ML fairness. Most of my work focuses on developing novel techniques for understanding AI models post-hoc, designing new AI models that are interpretable-by-design, and/or introducing paradigms for finding and correcting existing failure points in AI models. See Google Scholar for the most updated list of papers.

* denotes equal contribution; ^ denotes peer-reviewed, non-archival work (e.g. accepted to a non-archival workshop).

Humans, AI, and Context: Understanding End-Users' Trust in a Real-World Computer Vision Application
Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández
FAccT, 2023
arXiv | project page | bibtex

We study how end-users trust AI in a real-world context. Concretely, we describe multiple aspects of trust in AI and how human, AI, and context-related factors influence each.

UFO: A Unified Method for Controlling Understandability and Faithfulness Objectives in Concept-based Explanations for CNNs
Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky
arXiv, 2023  
arXiv | bibtex

We introduce a novel concept-based explanation framework for CNNs: UFO, which is a method for controlling the understandability and faithfulness of concept-based explanations using well-defined objective functions for the two qualities.

Improving Data-Efficient Fossil Segmentation via Model Editing
Indu Panigrahi, Ryan Manzuk, Adam Maloof, Ruth Fong
CVPR Workshop on Learning with Limited Labelled Data for Image and Video Understanding, 2023 
arXiv | bibtex

We explore how to improve a model for segmenting coral reef fossils by first understanding its systematic failures and second "editing" the model to mitigate said failures.

"Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction
Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández
CHI, 2023 (Honorable Mention award 🏆)
arXiv | supp | 30-sec video | 10-min video | bibtex

We explore how explainability can support human-AI interaction by interviewing 20 end-users of a real-world AI application. Specifically, we study (1) what XAI needs people have, (2) how people intend to use XAI explanations, and (3) how people perceive existing XAI methods.

Overlooked Factors in Concept-based Explanations: Dataset Choice, Concept Salience, and Human Capability
Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky
CVPR, 2023 
arXiv | bibtex

We analyze three commonly overlooked factors in concept-based explanations: (1) the choice of the probe dataset, (2) the saliency of concepts in the probe dataset, and (3) the number of concepts used in explanations. We also make suggestions for future development and analysis of concept-based interpretability methods.

Interactive Visual Feature Search^
Devon Ulrich and Ruth Fong
NeurIPS Workshop on XAI in Action: Past, Present, and Future Applications, 2023 
arXiv | code | bibtex

We present an interactive visualization tool that allows you to perform a reverse image search for similar image regions using intermediate activations.

Gender Artifacts in Visual Datasets
Nicole Meister*, Dora Zhao*, Angelina Wang, Vikram V. Ramaswamy, Ruth Fong, Olga Russakovsky
ICCV, 2023 
arXiv | project page | bibtex

We demonstrate the pervasiveness of gender artifacts in popular computer vision datasets (e.g. COCO and OpenImages). We find that all of the following (and more) are gender artifacts: the mean value of color channels (i.e. mean RGB), the pose and location of people, and most co-located objects.

ELUDE: Generating Interpretable Explanations via a Decomposition into Labelled and Unlabelled Features
Vikram V. Ramaswamy, Sunnie S. Y. Kim, Nicole Meister, Ruth Fong, Olga Russakovsky
arXiv, 2022 
arXiv | bibtex

We present ELUDE, a novel explanation framework that decomposes a model's prediction into two components: (1) one using labelled, semantic attributes (e.g. fur, paw) and (2) one using an unlabelled, low-rank feature space.

HIVE: Evaluating the Human Interpretability of Visual Explanations
Sunnie S. Y. Kim, Nicole Meister, Vikram V. Ramaswamy, Ruth Fong, Olga Russakovsky
ECCV, 2022 
arXiv | project page | extended abstract | code | 2-min video | bibtex

We introduce HIVE, a novel human evaluation framework for diverse interpretability methods in computer vision, and develop metrics that measure achievement on two desiderata for explanations used to assist human decision making: (1) Explanations should allow users to distinguish between correct and incorrect predictions. (2) Explanations should be understandable to users.

Interactive Similarity Overlays^
Ruth Fong, Alexander Mordvintsev, Andrea Vedaldi, Chris Olah
VISxAI, 2021 
interactive article | code | bibtex

We introduce a novel interactive visualization that allows machine learning practitioners and researchers to easily observe, explore, and compare how a neural network perceives different image regions.

On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick*, Yuki M. Asano*, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, and Andrea Vedaldi
ICCV, 2021 
arXiv | code | bibtex

We give transformations the prominence they deserve by introducing a systematic framework suitable for contrastive learning. We achieve state-of-the-art video representation learning by learning (in)variances systematically.

Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning
Iro Laina, Ruth Fong, and Andrea Vedaldi
NeurIPS, 2020
arXiv | supp | bibtex

We introduce two novel human evaluation metrics for evaluating the interpretability of clusters discovered via self-supervised methods. We also outline how to partially approximate one of the metrics using a group captioning model.

Debiasing Convolutional Neural Networks via Meta Orthogonalization^
Kurtis Evan David, Qiang Liu, and Ruth Fong
NeurIPS Workshop on Algorithmic Fairness through the Lens of Causality and Interpretability (AFCI), 2020
arXiv | supp | poster | bibtex

We introduce a novel paradigm for debiasing CNNs by encouraging salient concept vectors to be orthogonal to class vectors in the activation space of an intermediate CNN layer (e.g., orthogonalizing gender and oven concepts in conv5).

Contextual Semantic Interpretability
Diego Marcos, Ruth Fong, Sylvain Lobry, Rémi Flamary, Nicolas Courty, and Devis Tuia
ACCV, 2020
arXiv | supp | code | bibtex

We introduce an interpretable-by-design machine vision model that learns sparse groupings of interpretable concepts and demonstrate the utility of our novel architecture on scenicness prediction.

There and Back Again: Revisiting Backpropagation Saliency Methods
Sylvestre-Alvise Rebuffi*, Ruth Fong*, Xu Ji*, and Andrea Vedaldi
CVPR, 2020
arXiv | code | bibtex

We outline a novel framework that unifies many backpropagation saliency methods. Furthermore, we introduce NormGrad, a saliency method that considers the spatial contribution of the gradients of convolutional weights. We also systematically study the effects of combining saliency maps at different layers. Finally, we introduce a class-sensitivity metric and a meta-learning inspired technique that can be applied to any saliency method to improve class sensitivity.

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
Miles Brundage*, Shahar Avin*, Jasmine Wang*, Haydn Belfield*, Gretchen Krueger*, … ,
Ruth Fong, et al.
arXiv, 2020
arXiv | project page | bibtex

This report suggests various steps that different stakeholders can take to make it easier to verify claims made about AI systems and their associated development processes. The authors believe the implementation of such mechanisms can help make progress on one component of the multifaceted problem of ensuring that AI development is conducted in a trustworthy fashion.

Understanding Deep Networks via Extremal Perturbations and Smooth Masks
Ruth Fong*, Mandela Patrick*, and Andrea Vedaldi
ICCV, 2019 (Oral)
arXiv | supp | poster | code (TorchRay) | 4-min video | bibtex

We introduce extremal perturbations, a novel attribution method that highlights "where" a model is "looking." We improve upon Fong and Vedaldi, 2017 by separating out regularization on the size and smoothness of a perturbation mask from the attribution objective of learning a mask that maximally affects a model's output; we also extend our work to intermediate channel representations.

Occlusions for Effective Data Augmentation in Image Classification
Ruth Fong and Andrea Vedaldi
ICCV Workshop on Interpreting and Explaining Visual Artificial Intelligence Models, 2019
paper | bibtex | code (coming soon)

We introduce a simple paradigm based on batch augmentation for leveraging input-level occlusions (both stochastic and saliency-based) to improve ImageNet image classification. We also demonstrate the necessity of batch augmentation and quantify the robustness of different CNN architectures to occlusion via ablation studies.

Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks
Ruth Fong and Andrea Vedaldi
CVPR, 2018 (Spotlight)
arXiv | supp | bibtex | code | 4-min video | slides

We investigate how human-interpretable visual concepts (e.g., textures and objects) are encoded across hidden units of a convolutional neural network (CNN) layer as well as across CNN layers.

Using Human Brain Activity to Guide Machine Learning
Ruth Fong, Walter Scheirer, and David Cox
Scientific Reports, 2018
arXiv | supp | Harvard thesis | bibtex

We introduce a biologically-informed machine learning paradigm for object classification that biases models to better match the learned, internal representations of the visual cortex.

Interpretable Explanations of Black Box Algorithms by Meaningful Perturbation
Ruth Fong and Andrea Vedaldi
ICCV, 2017
arXiv | supp | bibtex | code | book chapter (extended) | chapter bibtex

We developed a theoretical framework for learning "explanations" of black box functions like CNNs as well as saliency methods for identifying "where" a computer vision algorithm is looking.

Theses 🎓

Understanding Convolutional Neural Networks
Ruth Fong (advised by Andrea Vedaldi)
Ph.D. Thesis

This is a "thesis-by-staples", so the novel parts are the non-paper chapters (i.e. all chapters except chapters 3-6), which I wrote with accessibility in mind (e.g., the ideal reader is a motivated undergraduate or graduate student looking to learn more about deep learning and interpretability). The introduction is accessible to a high-school student, and appendices A and B are primers on the relevant math concepts and convolutional neural networks respectively.

Modelling Blind Single Channel Sound Separation Using Predict Neural Networks
Ruth Fong (advised by Ben Willmore and Nicol Harper)
M.Sc. Thesis #2

I developed an unsupervised learning paradigm for sound separation using fully connected and recurrent neural networks to predict the future from past cochleagram data.

Optimizing Deep Brain Stimulation to Dampen Tremor
Ruth Fong (advised by Rafal Bogacz)
M.Sc. Thesis #1
Tutorial | Demo | MATLAB Rayleigh statistics toolbox

I developed a computational oscillator model of the tremor-dampening effects of phasic deep brain stimulation and analyzed it on experimental data.

Leveraging Human Brain Activity to Improve Object Classification
Ruth Fong (advised by David Cox and Walter Scheirer)
A.B. Thesis

Published as Fong et al., Scientific Reports 2018.

Teaching 📝

Princeton COS324: Intro to Machine Learning — Fall 2022, Spring 2023, Fall 2023
Princeton COS126: CS: An Interdisciplinary Approach — Fall 2021 & Spring 2022
Oxford Engineering B14: Image and Signal Analysis — Fall 2019
NJ Governor's School: Mathematics in the World — Summer 2015
Harvard CS121: Intro to Theory of Computation — Fall 2014
Harvard CS20: Intro to Discrete Math — Spring 2014
Harvard CS50: Intro to CS I — Fall 2012

This ubiquitous CS researcher website template spawned from here.
Last updated: October 2023