Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation. Tomas Bueno Momcilovic, Beat Buesser, et al. xAI 2024.
Exploring Vulnerabilities in LLMs: A Red Teaming Approach to Evaluate Social Bias. Yuya Jeremy Ong, Jay Pankaj Gala, et al. IEEE CISOSE 2024.
Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs). Leshem Choshen, Ariel Gera, et al. LREC-COLING 2024.
Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models? Yu-Lin Tsai, Chia-yi Hsu, et al. ICLR 2024.
Can LLMs Fix Issues with Reasoning Models? Towards More Likely Models for AI Planning. Turgay Caglar, Sirine Belhaj, et al. AAAI 2024.
Human Evaluation of the Usefulness of Fine-Tuned English Translators for the Guarani Mbya and Nheengatu Indigenous Languages. Claudio Santos Pinhanez, Paulo Rodrigo Cavalin, et al. PROPOR 2024.
How Hard Are Computer Vision Datasets? Calibrating Dataset Difficulty to Viewing Time (workshop version). David Mayo, Jesse Cummings, et al. NeurIPS 2023.
Unveiling Safety Vulnerabilities of Large Language Models. George Kour, Marcel Zalmanovici, et al. EMNLP 2023.
URET: Universal Robustness Evaluation Toolkit (for Evasion). Kevin Eykholt, Taesung Lee, et al. USENIX Security 2023.