Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation. Noy Sternlicht, Ariel Gera, et al. 2025. EMNLP 2025.
Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs). Leshem Choshen, Ariel Gera, et al. 2024. LREC-COLING 2024.
Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours. Eyal Shnarch, Alon Halfon, et al. 2022. EMNLP 2022.