PropBank goes Public: Incorporation into Wikidata
Elizabeth Spaulding, Kathryn Conger, et al.
EACL 2024
Although semantic role labeling (SRL) based on a pre-existing valency lexicon is useful in many downstream natural language processing (NLP) tasks, computational versions of such lexicons are not available for most languages. For parallel sentences where the source-language text already has automatic or manual SRL annotation, the annotations can be transferred onto the target-language text. The Universal PropBanks (UP2.0) system uses unsupervised word alignments, filtering heuristics, and bootstrapping to automatically project English SRL annotations to twenty-three languages. Because this approach uses English-based semantic representations to create annotations in other languages, it may miss language-specific nuances. We provide a case study using English PropBank representations and the Russian PropBank. We evaluate the UP2.0 projection of English annotations onto sentences that have been manually annotated with Russian PropBank and assess the discrepancies. Based on our error analysis and the PropBank annotation guidelines, we identify language-specific and language-independent principles that the automatic projection may violate, and we implement these as post-processing filters to improve the precision of the projection.
Fei Xia, Martha Palmer, et al.
Computational Intelligence