The NorthPole Validator: A Cycle-Accurate Simulator for HW/SW Codesign of a Prescheduled Neural Inference Accelerator. Alexander Andreopoulos, Michael V. Debole, et al. (2025). HPEC 2025.
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty? Giacomo Camposampiero, Michael Hersche, et al. (2025). NeSy 2025.
StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation. Satyananda Kashyap, Sola Shirai, et al. (2025). VLDB 2025.
Evaluating LLM-based Agents: Foundations, Best Practices and Open Challenges. Roy Bar-Haim, Arman Cohan, et al. (2025). IJCAI 2025.
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models. George Kour, Itay Nakash, et al. (2025). ACL 2025.
Does Your Model Understand Genes? A Modality-Agnostic Benchmark of Gene Properties. Yoav Kan-Tor, Michael Morris Danziger, et al. (2025). ISMB 2025.
Exploring Straightforward Methods for Automatic Conversational Red-Teaming. George Kour, Naama Zwerdling, et al. (2025). NAACL 2025.