Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models. Shengyun Peng, Pin-Yu Chen, et al. NeurIPS 2024.
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models. Zhaitang Li, Pin-Yu Chen, et al. NeurIPS 2024.
Distributional Preference Alignment of LLMs via Optimal Transport. Igor Melnyk, Youssef Mroueh, et al. NeurIPS 2024.
Graph-based Uncertainty Metrics for Long-form Language Model Generations. Mingjian Jiang, Yangjun Ruan, et al. NeurIPS 2024.
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia. Yufang Hou, Alessandra Pascale, et al. NeurIPS 2024.
Safe LoRA: The Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models. Chia-Yi Hsu, Yu-Lin Tsai, et al. NeurIPS 2024.
Interpolating Item and User Fairness in Multi-Sided Recommendations. Qinyi Chen, Jason Cheuk Nam Liang, et al. NeurIPS 2024.
Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery. Yue Yu, Ning Liu, et al. NeurIPS 2024.