Auditing Algorithms @ Northeastern

This site is the homepage for the Algorithm Auditing Research Group within the Khoury College of Computer Sciences at Northeastern University. Here, you will find explanations of and links to our work, as well as open-source data and code from our research.

Why Audit Algorithms?

Today, we are surrounded by algorithmic systems in our everyday life. Examples on the web include Google Search, which personalizes search results to try to surface more relevant content; Amazon and Netflix, which recommend products and media; and Facebook, which personalizes each user's news-feed to highlight engaging content. Algorithms are also increasingly appearing in real-world contexts, like surge pricing for vehicles from Uber; predictive policing algorithms that attempt to infer where crimes will occur and who will commit them; and credit scoring systems that determine eligibility for loans and credit cards. The proliferation of algorithms is driven by the explosion of Big Data that is available about people's online and offline behavior.

Although there are many cases where algorithms are beneficial to users, scientists and regulators are concerned that they may also harm individuals. For example, sociologists and political scientists worry that online Filter Bubbles may create "echo chambers" that increase political polarization. Similarly, personalization on e-commerce sites can be used to implement price discrimination. Furthermore, algorithms may exhibit racial and gender discrimination if they are trained on biased datasets. As algorithmic systems proliferate, the potential for (unintentional) harmful consequences to users increases.

If we are going to live in a society permeated by sophisticated algorithms, then it is imperative that we understand how these algorithms are being implemented, the data they are using, and the effect that they have on individuals. Below, you will find links to specific research projects that our group has undertaken to address these issues.

Search Engines, Maps, and Filter Bubbles

Personalization on Google Search

Billions of people around the world rely on Google Search as their gateway to information on the Web. In this project, we examine how Google Search personalizes results for users, and what types of search queries are more heavily personalized.

View Details

Geolocal Personalization

One of the most important features used by search engines to personalize content is the user's location. In this follow-up to our Google Search paper, we specifically focus on how location impacts search results.

View Details

International Borders on Maps

Online mapping services like Google and Bing Maps often personalize international borders of countries based on the location of the user viewing the map. In this study, we exhaustively catalog cases of border personalization around the world, including several instances that had never been documented before.

View Details

Suppressing SEME

Previous research has shown that politically biased search results can dramatically impact the voting preferences of undecided voters, the so-called Search Engine Manipulation Effect (SEME). In our work, we have studied the effectiveness of design interventions that reveal bias in search engine results to users.

View Details

Components on Google Search

The search results from Google Search contain far more than 10 blue links. Today, the results contain a range of additional "components" such as embedded maps, knowledge boxes, images and video, and tweets, just to name a few. In this study, we examine how the prevalence of components in search results varies by rank, by query, and as a result of personalization.

View Details

Partisanship of Google Search Results

In this work, we examine the partisanship of Google Search results. We rely on an audience-based partisanship scoring method derived from URLs shared on Twitter by a virtual panel of US registered voters. Overall, we find no evidence to support the assertion that personalization creates "filter bubbles" on Google Search, but we do find significant variation by query, rank, and component.

View Details

Bias and Discrimination

Discrimination in the Gig-economy

Gig-economy websites often solicit ratings and reviews of workers from customers. This social feedback is critical for the success of workers, as it may influence the hiring decisions of future customers. However, as we show in this study, social feedback from customers can be gender biased against female workers, as well as racially biased against workers of color. Furthermore, we also observe that female and minority workers appear lower in search results, possibly due to the effects of biased social feedback.

Read the Paper

Gender and Academic Performance

Although a machine learning algorithm may achieve high accuracy overall, its performance may be significantly worse for specific subpopulations of individuals, especially if they are under-represented in training data. In this study, we examine techniques for fairly and accurately predicting academic performance in gender-imbalanced environments (e.g., STEM courses).

Read the Paper

Gender Discrimination in Hiring

Many people upload their resumes to sites like Indeed, Monster.com, and CareerBuilder, in the hope that they will be contacted by a recruiter. In turn, recruiters use the search tools provided by these sites to locate candidates seeking jobs. However, if these resume search engines take demographic features into account when ranking candidates, this could lead to systematic bias against the candidates who are ranked lower. In this study, we audited three major resume search engines to examine if their search results are individually fair and group fair with respect to candidates' gender.

Read the Paper

Equal Opportunity in the Gig-Economy

In our prior work, we found that real-world discrimination had an impact on the opportunities afforded to workers in online gig-economy marketplaces. In this follow-up, we tracked the career paths of gig-economy workers over time to understand whether demographic factors like gender and race impacted their ability to launch a career and secure work in these marketplaces.

View Details

E-commerce and Marketplaces

Price Discrimination

On the web, it is possible for e-commerce sites to personalize the prices of products for each person, a phenomenon known as price discrimination. In this project, we measure personalization on major e-commerce and travel sites to identify cases of price discrimination, as well as a related technique called price steering.

View Details

Surge Pricing on Uber

Ridesharing services have become extremely popular, but they also use a controversial surge pricing algorithm to dynamically adjust prices. In this study, we examine Uber's surge pricing algorithm to understand how it works, how customers can avoid it, and whether it actually incentivizes drivers to change their behavior.

View Details

Algorithmic Pricing on Amazon

Amazon Marketplace is a competitive environment that pits third-party sellers against Amazon itself. In this study, we examine Amazon's Buy Box algorithm, and investigate the dynamic pricing algorithms adopted by sellers to adjust their prices in real-time.

View Details

Equitability in Vehicle for Hire Markets

Traditional taxi services, which are heavily regulated, must now compete with ridesharing services like Uber and Lyft that are essentially unregulated. This raises questions about the coverage of these services and competition between them: do they cover all areas of cities equally? Are prices higher, or availability lower, in historically distressed areas of the city? In this study, we analyze Uber, Lyft, and taxi data from San Francisco and New York City to compare and contrast these competing services.

Read the Paper

Online Advertising and Tracking

Tracing Information Flows

It is common knowledge that trackers and advertisers collect information about people as they browse the web. However, most people do not realize that these companies collaborate with each other to increase the amount of data they can collect. In this study, we develop a methodology that can causally infer the data sharing relationships between online ad companies. This gives us an unprecedented ability to understand the roles that different companies play in the tracking ecosystem, as well as reason about the implications for users' privacy.

View Details

“Recommended for You”

Many news websites and blogs embed widgets from Content Recommendation Networks (CRNs) like Outbrain and Taboola. These are the boxes of links with headlines like “Around the Web”. In the past, CRNs have been criticized and fined for spreading spammy links, and not disclosing that many of their recommended links are actually paid advertisements. In this study, we survey the CRNs to determine if they are properly disclosing advertisements, and what kinds of ads they are promoting.

View Details

Diffusion of Tracking Data

The increasing shift towards Real Time Bidding (RTB) in the advertising industry has forced advertisers to collaborate more closely with one another through cookie matching. In this study, we introduce a novel graph representation, called an Inclusion graph, to model the impact of RTB on the diffusion of user tracking data in the advertising ecosystem. We also investigate the efficacy of ad and tracker blockers in terms of protecting users' privacy.

View Details

Transparency and Fraud

To mitigate domain spoofing, the Interactive Advertising Bureau (IAB) Tech Lab introduced the ads.txt standard in May 2017. We present a 15-month longitudinal, observational study of the ads.txt standard to understand (1) if it is helping ad buyers to combat domain spoofing and (2) whether the transparency offered by the standard can provide useful data to researchers and privacy advocates.
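For readers unfamiliar with the format, the sketch below shows roughly how the data records in an ads.txt file can be read. It is an illustration only, not the tooling used in the study: the names `AdsTxtRecord` and `parse_ads_txt` are hypothetical, and the parsing rules (comma-separated fields, `#` comments, `DIRECT` or `RESELLER` relationships, an optional fourth certification authority ID) reflect our reading of the public specification in simplified form.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    """One data record from an ads.txt file (hypothetical helper type)."""
    ad_system_domain: str
    seller_account_id: str
    relationship: str  # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse ads.txt content into data records.

    Simplifying assumptions: comments start with '#', anything after ';'
    is an extension field and is ignored, and lines that do not look like
    data records (for example 'contact=...' variables) are skipped.
    """
    records = []
    for raw_line in text.splitlines():
        # Drop comments and extension fields, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # blank line, variable declaration, or malformed record
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue  # not a valid relationship value
        cert_id = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert_id)
        )
    return records


if __name__ == "__main__":
    sample = (
        "# example file, all values are made up\n"
        "greenadexchange.com, 12345, DIRECT, d75815a79\n"
        "blueadexchange.com, XF436, reseller\n"
        "contact=adops@example.com\n"
    )
    for record in parse_ads_txt(sample):
        print(record)
```

Each record names an ad system that the publisher has authorized to sell its inventory, which is what allows buyers to check whether a seller claiming to represent a domain is actually listed by that domain.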

View Details

Do Not Sell My Personal Information

Online services covered by the California Consumer Privacy Act (CCPA) are required to provide a hyperlink on their homepage with the text “Do Not Sell My Personal Information”. Using longitudinal data crawled from the top 1 million websites, we examine which websites are including these links, whether the websites without the links are out of compliance with the law, how link adoption has changed over time, and how websites are choosing to present these links.

View Details

IAB US Privacy String Framework

In this study, we take a deep dive into the IAB CCPA Compliance Framework to measure end-to-end flows of consent information and better understand why opt-out signals are not being honored. We examine overall adoption of the framework, the flow of consent data from publishers to third parties and between third parties, and finally the reach of opt-out signals. Our results uncover numerous issues with the adoption and implementation of the framework that prevent users' consent choices from being honored.
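As background, the opt-out signal in this framework is carried in a short "US Privacy String". The sketch below decodes one under our understanding of the public specification: four characters giving a version number, whether notice was given, whether the user opted out of sale, and whether the transaction is covered by the Limited Service Provider Agreement, with each flag being `Y`, `N`, or `-` (not applicable). The names `USPrivacy` and `parse_us_privacy` are hypothetical, accepting lowercase flags is an assumption, and this is not the measurement code from the study.

```python
from typing import NamedTuple, Optional


class USPrivacy(NamedTuple):
    """Decoded US Privacy String (hypothetical helper type)."""
    version: int
    notice_given: Optional[bool]       # None means "not applicable"
    opted_out_of_sale: Optional[bool]
    lspa_covered: Optional[bool]


# Mapping from flag characters to values; '-' marks "not applicable".
_FLAGS = {"Y": True, "N": False, "-": None}


def parse_us_privacy(value: str) -> USPrivacy:
    """Decode a four-character US Privacy String such as '1YNN'."""
    if len(value) != 4 or value[0] not in "0123456789":
        raise ValueError(f"not a US Privacy String: {value!r}")
    try:
        # Assumption: flags are matched case-insensitively.
        flags = [_FLAGS[char.upper()] for char in value[1:]]
    except KeyError:
        raise ValueError(f"unexpected flag in US Privacy String: {value!r}")
    return USPrivacy(int(value[0]), flags[0], flags[1], flags[2])


if __name__ == "__main__":
    for example in ("1YNN", "1YYN", "1---"):
        print(example, parse_us_privacy(example))
```

A third party that receives a string whose opt-out flag is `Y` is expected not to sell that user's data, so tracing where these strings are passed, altered, or dropped is one way to reason about whether a user's choice reaches the companies it is meant for.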

View Details
Interested in getting involved in exciting research at Northeastern? Please visit Volunteer Science to find out how you can contribute!