Benjamin Anderson

Machine Learning Projects

TrollSpotting: Identifying Tweets from Russian Trolls

CS229 Final Project — Autumn 2020

Since the 2016 election, Twitter has identified thousands of accounts belonging to employees of the Internet Research Agency (IRA), a “troll factory” operating out of St. Petersburg, Russia. In this project, I test the classical bag-of-words approach to extract semantic content from Twitter posts, and compare it to modern techniques that leverage neural networks. I apply various machine learning algorithms to these features to distinguish Tweets written by ordinary users from Tweets written by trolls. [Paper]
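As a rough illustration of the bag-of-words idea, the sketch below turns a Tweet into a vector of word counts over a fixed vocabulary. It is a minimal example in plain Python, not the code from the project: the tokenizer and the function names (`tokenize`, `build_vocabulary`, `bag_of_words`) are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens.

    Hypothetical tokenizer: keeps letters, digits, apostrophes, '#' and '@'.
    """
    return re.findall(r"[a-z0-9#@']+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Return a count vector for `text`; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for token, count in counts.items():
        index = vocab.get(token)
        if index is not None:
            vector[index] = count
    return vector
```

Vectors like these discard word order entirely, which is the main limitation the neural approaches are compared against. They can be passed directly to a standard classifier.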

Playing "Dominion" with Deep Reinforcement Learning

CS238 Final Project (with Garrick Fernandez) — Autumn 2019

In this project, we apply deep reinforcement learning to the multi-player card game “Dominion.” We trained an agent using the SARSA algorithm, combined with global function approximation. After training through self-play, the RL agent easily beat a computer player that selected a random action every turn. The agent would likely improve significantly with more sophisticated feature extraction and more iterations of self-play. [Paper]
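For readers unfamiliar with SARSA, here is a minimal sketch of a single update step. It assumes a linear approximation of the action-value function over a hand-built feature vector, which is a simplification and may not match the approximator used in the project. The function names and the default values for `alpha` and `gamma` are illustrative assumptions.

```python
def q_value(weights, features):
    """Linear estimate of Q(s, a): dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def sarsa_update(weights, features, reward, next_features,
                 alpha=0.1, gamma=0.99, terminal=False):
    """Return new weights after one on-policy SARSA step.

    `features` describes the (state, action) pair just taken and
    `next_features` the pair the agent actually chose next.
    """
    # Bootstrap from the next state-action pair unless the game has ended.
    if terminal:
        target = reward
    else:
        target = reward + gamma * q_value(weights, next_features)
    td_error = target - q_value(weights, features)
    # For a linear model, the gradient of Q with respect to the weights
    # is the feature vector itself.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]
```

Because SARSA is on-policy, the target uses the action the agent really takes next, not the best available action as in Q-learning.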

Interpretable Criminal Risk Assessment

CS221 Final Project (with Gaeun Kim) — Autumn 2019

Algorithmic criminal risk assessment has the potential to make the criminal justice system fairer, remove arbitrariness and human bias, and reduce incarceration by releasing people who are unlikely to re-offend. However, the current gold standard for criminal risk assessment (COMPAS) is severely lacking: it is an unaccountable and costly "black box" that has also faced credible accusations of racial bias. In this project, we prototype several simple, interpretable algorithms for criminal risk assessment. We show that logistic regression, decision tree, and Bayesian network approaches all achieve accuracy comparable to the proprietary gold standard, while mitigating racial bias in classification error. [Paper]

Data Visualization Projects

Visual Explainer: The Affordable Care Act

CS448B Final Project — Autumn 2020

This interactive visual explainer walks the reader through the landscape of the American healthcare system since the Affordable Care Act was signed into law in 2010. Through maps and interactive charts, the reader is prompted to think about their own beliefs about how the Affordable Care Act changed the healthcare system, and is then presented with the data and context. This project was built using D3, a JavaScript library for web-based data visualization. [Interactive Demo] [Code]


CS448B Course Project — Autumn 2020

Originally developed by Harry Hochheiser and Ben Shneiderman, TimeSearcher is an interactive tool to filter and query time-series data. For a class project, I created a web-based TimeSearcher interface for the Stanford Cable News dataset in D3, a JavaScript library for data visualization in the browser. The interface allows the user to draw boxes to filter the dataset, as well as search for persons of interest by name. [Interactive Demo] [Code]

Other Projects

HiveMind Chess

Personal Project — Winter 2021

Chess interface that uses the open source Lichess database to select a move with probability proportional to its popularity, with the goal of simulating the experience of playing chess against a person. The game continues until a novel position is reached that is not in the database. [Interactive Demo] [Code]
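The move-selection rule can be sketched in a few lines. This is a minimal Python illustration, not the project's code. It assumes the database lookup has already produced a mapping from each candidate move to the number of times it was played in the current position; the `move_counts` shape and the `choose_move` name are hypothetical.

```python
import random


def choose_move(move_counts, rng=random):
    """Pick a move with probability proportional to its play count.

    Returns None when the position has no recorded moves, which
    corresponds to reaching a novel position and ending the game.
    """
    moves = [move for move, count in move_counts.items() if count > 0]
    if not moves:
        return None
    weights = [move_counts[move] for move in moves]
    # random.choices performs weighted sampling with replacement.
    return rng.choices(moves, weights=weights, k=1)[0]
```

For example, with counts of 900 for `e4` and 100 for `d4`, `e4` is chosen about nine times out of ten.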


CS110L Course Project — Spring 2020

GDB-like debugger written in Rust that uses ptrace to get information about a running program. Built in CS110L using starter code by Ryan Eberhardt and Armin Namavari. [Code]


CS110L Course Project — Spring 2020

Multithreaded reverse proxy / load balancer written in Rust. Uses the asynchronous Tokio library for nonblocking I/O. Implements passive and active health checks and rate limiting. Built in CS110L using starter code by Ryan Eberhardt and Armin Namavari. [Code]