Algorithm Audits

Sociotechnical systems driven by big data and (sometimes) machine learning are increasingly being deployed in the real world, in consequential scenarios like hiring, banking, access to governmental resources, and policing. These systems, whether intentionally or inadvertently, may exhibit problems like unfairness and bias that negatively impact real people. One crucial mechanism for achieving transparency and accountability around such systems is the algorithm audit: an independent evaluation of a system to ascertain (1) whether the purported functionality of the system matches its actual implementation, and (2) whether the system contains latent issues that may result in unfairness or bias.

Professor Wilson, his colleagues, and their students often conduct algorithm audits. While some of these audits are permissionless, i.e., conducted without the knowledge or permission of the audited company, others have been conducted in collaboration with willing companies.

On this page, we present materials related to our collaborative audits. To foster confidence in these audits and their results, we strive for as much transparency as possible.


pymetrics is a startup that offers a candidate screening service (also known as pre-employment assessment) to employers based on data and applied machine learning. One of the core assertions pymetrics makes about their service is that they proactively de-bias machine learning models before deployment to comply with the U.S. Uniform Guidelines on Employee Selection Procedures (UGESP). pymetrics claims to use an outcome-based model de-biasing process in which (1) candidate machine learning models are assessed for compliance with the UGESP four-fifths rule, using minimum bias ratio as the metric, and (2) models are retrained as necessary until a compliant model is identified.
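To make the four-fifths check concrete, the following is a minimal sketch in Python. It is not pymetrics' code. It assumes that the minimum bias ratio is the lowest group selection rate divided by the highest group selection rate, and that a model is treated as compliant when that ratio is at least 0.8. The function names, group labels, and sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: the four-fifths rule is applied as a 0.8 cutoff on the ratio
# of the lowest group selection rate to the highest group selection rate.
FOUR_FIFTHS_THRESHOLD = 0.8


def selection_rates(groups, selected):
    """Return the fraction of selected candidates within each group."""
    totals = Counter()
    passed = Counter()
    for group, was_selected in zip(groups, selected):
        totals[group] += 1
        if was_selected:
            passed[group] += 1
    return {group: passed[group] / totals[group] for group in totals}


def minimum_bias_ratio(rates):
    """Lowest selection rate divided by the highest (assumed definition)."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any selected candidates")
    return min(rates.values()) / highest


def passes_four_fifths(rates, threshold=FOUR_FIFTHS_THRESHOLD):
    """True when the minimum bias ratio meets the assumed threshold."""
    return minimum_bias_ratio(rates) >= threshold


if __name__ == "__main__":
    # Hypothetical outcomes: group A selects 6 of 10, group B selects 3 of 10.
    groups = ["A"] * 10 + ["B"] * 10
    selected = [True] * 6 + [False] * 4 + [True] * 3 + [False] * 7
    rates = selection_rates(groups, selected)
    print(rates, passes_four_fifths(rates))
```

Under these assumptions, a retraining process like the one pymetrics describes would evaluate each candidate model's predictions with a check of this kind and keep retraining until the check passes.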

Professor Wilson and his team at Northeastern audited the pymetrics candidate screening service in summer 2020. The primary focus of the audit was determining whether pymetrics' source code faithfully implemented the four-fifths rule via the minimum bias ratio metric, using the process described by pymetrics in their documentation. Additionally, we examined whether pymetrics' adverse impact tests could be avoided by crafting malicious inputs to their system; whether their system had built-in safeguards to prevent human error from subverting fairness guarantees; whether any assumptions about data preparation and/or cleaning in their source code negatively impacted the purported fairness guarantees of the system; and whether pymetrics' system exhibited direct discrimination by using demographic data for model training. To complete the audit, pymetrics gave the audit team access to the documentation and source code for their candidate screening service, as well as representative datasets. pymetrics was not informed ahead of time which tests the audit team planned to perform.

The following links document the agreements that were made between Professor Wilson, Northeastern University, and pymetrics prior to the start of the audit. Additionally, we provide a (lightly redacted) version of the final audit report that the audit team authored and presented to pymetrics at the conclusion of the audit.