Sanskrit Sandhi Splitting using seq2(seq)²
Rahul Aralikatte, Neelamadhav Gantayat, et al.
EMNLP 2018
We propose a novel methodology to generate domain-specific, large-scale question answering (QA) datasets by re-purposing existing annotations created for other NLP tasks. We demonstrate one instance of this methodology by generating a large-scale QA dataset for electronic medical records, leveraging existing expert annotations on clinical notes for various NLP tasks from the community-shared i2b2 datasets. The resulting corpus (emrQA) contains 1 million question-logical form pairs and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question-to-logical-form and question-to-answer mapping.
Shivashankar Subramanian, Ioana Baldini, et al.
IAAI 2020
Gabriele Picco, Lam Thanh Hoang, et al.
EMNLP 2021
Kevin Gu, Eva Tuecke, et al.
ICML 2024