Short paper
Aligners: Decoupling LLMs and Alignment
Lilian Ngweta, Mayank Agarwal, et al.
ICLR 2024
We consider the task of auditing ML models for individual bias/unfairness. We formalize the task as an optimization problem and develop a suite of inferential tools for the optimal value. Our tools permit us to obtain asymptotic confidence intervals that cover the target exactly and hypothesis tests that control the Type I error rate exactly. To demonstrate the utility of our tools, we use them to reveal the gender and racial biases in Northpointe's COMPAS recidivism prediction instrument.
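To make the auditing idea concrete, here is a minimal Python sketch of the simplest version of such an audit: score each record, score the same record with the protected attribute flipped, and form a CLT-based asymptotic confidence interval and one-sided test for the mean score gap. This is an illustrative plug-in estimator on synthetic scores, not the optimization-based audit developed in the paper; the function name, the gap statistic, and the demo data are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def audit_individual_bias(scores, scores_flipped, alpha=0.05):
    """Illustrative individual-bias audit (not the paper's estimator).

    Compares model scores on each record against scores on the same record
    with the protected attribute flipped, then returns the mean gap, an
    asymptotic (1 - alpha) confidence interval for it, and a one-sided
    p-value for H0: mean gap <= 0 (no bias against the flipped group).
    """
    gaps = np.asarray(scores) - np.asarray(scores_flipped)  # per-record score gaps
    n = gaps.size
    mean_gap = gaps.mean()
    se = gaps.std(ddof=1) / np.sqrt(n)           # standard error of the mean
    z = stats.norm.ppf(1 - alpha / 2)            # two-sided normal critical value
    ci = (mean_gap - z * se, mean_gap + z * se)  # asymptotic confidence interval
    p_value = stats.norm.sf(mean_gap / se)       # one-sided test of H0: mean gap <= 0
    return mean_gap, ci, p_value

# Hypothetical demo on synthetic risk scores for 500 records.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=500)
scores_flipped = np.clip(scores - rng.normal(0.05, 0.10, size=500), 0.0, 1.0)

mean_gap, ci, p = audit_individual_bias(scores, scores_flipped)
print(f"mean gap: {mean_gap:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.4f}")
```

A mean gap whose confidence interval excludes zero would indicate that the model's scores are systematically sensitive to the flipped attribute; the paper's optimization-based formulation generalizes this beyond a simple attribute flip.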
Igor Melnyk, Youssef Mroueh, et al.
NeurIPS 2024
Momin Abbas, Muneeza Azmat, et al.
ICLR 2025
Debarghya Mukherjee, Felix Petersen, et al.
NeurIPS 2022