Sociotechnical systems driven by big data and (sometimes) machine learning are increasingly being deployed in the real world, in consequential scenarios like hiring, banking, access to governmental resources, and policing. These systems, whether intentionally or inadvertently, may exhibit problems like unfairness and bias that may negatively impact real people. One crucial mechanism for achieving transparency and accountability around such systems is the algorithm audit: an independent evaluation of a system to ascertain (1) whether the purported functionality of the system matches its actual implementation, and (2) whether the system contains latent issues that may result in unfairness or bias.
Professor Wilson, his colleagues, and their students often conduct algorithm audits. While some of these audits are permissionless, i.e., conducted without the knowledge or permission of the audited company, others are conducted in collaboration with willing companies.
On this page, we present materials related to our collaborative audits. To foster confidence in these audits and their results, we strive for as much transparency as possible.
pymetrics is a startup that offers a candidate screening service (also known as pre-employment assessment) to employers based on data and applied machine learning. One of the core assertions pymetrics makes about their service is that they proactively de-bias machine learning models before deployment to comply with the U.S. Uniform Guidelines on Employee Selection Procedures (UGESP). pymetrics claims to use an outcome-based model de-biasing process in which (1) candidate machine learning models are assessed for compliance with the UGESP four-fifths rule using minimum bias ratio as a metric, and (2) models are retrained as necessary until a compliant model is identified.
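pymetrics' actual implementation is proprietary, but the four-fifths rule itself is straightforward to state. The sketch below, a hypothetical illustration rather than pymetrics' code, computes a minimum bias ratio from per-group selection rates: the lowest group selection rate divided by the highest. Under the four-fifths rule, a selection procedure is flagged for potential adverse impact when this ratio falls below 0.8. The function and variable names here are our own.

```python
from collections import Counter

def minimum_bias_ratio(groups, selected):
    """Lowest group selection rate divided by the highest.

    groups: one demographic group label per candidate.
    selected: one boolean per candidate (True = passed screening).
    """
    totals = Counter(groups)                              # candidates per group
    passes = Counter(g for g, s in zip(groups, selected) if s)  # selections per group
    # Selection rate for each group: fraction of that group selected.
    rates = {g: passes[g] / totals[g] for g in totals}
    return min(rates.values()) / max(rates.values())

def complies_with_four_fifths(groups, selected, threshold=0.8):
    """True when the minimum bias ratio meets the four-fifths threshold."""
    return minimum_bias_ratio(groups, selected) >= threshold

# Example: group A selects 8 of 10 (rate 0.8), group B selects 5 of 10 (rate 0.5).
groups = ["A"] * 10 + ["B"] * 10
selected = [True] * 8 + [False] * 2 + [True] * 5 + [False] * 5
print(minimum_bias_ratio(groups, selected))       # 0.5 / 0.8 = 0.625
print(complies_with_four_fifths(groups, selected))  # False: below the 0.8 threshold
```

In an outcome-based de-biasing loop of the kind pymetrics describes, a check like `complies_with_four_fifths` would gate deployment: a candidate model failing the check is retrained until a compliant one is found.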
Professor Wilson and his team at Northeastern audited the pymetrics candidate screening service in summer 2020. The primary focus of the audit was determining whether pymetrics' source code faithfully implemented the four-fifths rule via the minimum bias ratio metric using the process described in pymetrics' documentation. Additionally, we examined whether pymetrics' adverse impact tests could be evaded by crafting malicious inputs to their system; whether their system had built-in safeguards to prevent human error from subverting fairness guarantees; whether any assumptions about data preparation and/or cleaning in their source code negatively impacted the purported fairness guarantees of the system; and whether pymetrics' system exhibited direct discrimination by using demographic data for model training. To complete the audit, pymetrics gave the audit team access to the documentation and source code for their candidate screening service, as well as representative datasets. pymetrics was not informed ahead of time what tests the audit team planned to perform.
The following links document the agreements that were made between Professor Wilson, Northeastern University, and pymetrics prior to the start of the audit.
- Non-compete. Professor Wilson signed a non-compete agreement stating that he would not accept employment with any competitor of pymetrics. Access to proprietary pymetrics source code and data were predicated on signing this agreement.
- Sponsored Research Agreement and Work Plan. This audit was structured as a sponsored research project. pymetrics and Northeastern signed a contract and pymetrics paid Northeastern, with the money earmarked to support the audit team. The contract specifies what pymetrics intellectual property was considered confidential; it also specifies that the audit team reserved the right to publish the results of the audit publicly. The work plan presents a rough timeline and process for the audit. Note that the work plan was written before the COVID-19 pandemic caused shutdowns; as such, it refers to in-person meetings that were instead conducted remotely.
- Budget. The budget presents the approximate expenditures of the audit team.