Erik Miehling, Manish Nagireddy, et al.
EMNLP 2024
Humanitarian relief agencies must assess crises around the world to prioritize the aid they can offer. While the rapidly growing availability of relevant information enables better decisions, it also creates an important challenge: how to find, collect, and categorize this information in a timely manner. To address this problem, we propose a targeted retrieval system that automates these tasks. The system uses historical data collected and labeled by subject matter experts to train a classifier that identifies relevant content. Using this classifier, it deploys a focused crawler to locate and retrieve data at scale. The system also incorporates feedback from subject matter experts to adapt to new concepts and information sources. A novel component of the system is a re-crawling algorithm that improves the crawler's efficiency in retrieving recent data. Our preliminary results show that the algorithm can increase the freshness of collected data while simultaneously decreasing crawling effort. Furthermore, we show that focused crawling outperforms general crawling in this domain. Our initial prototype has received positive feedback from analysts at the Assessment Capacities Project, a humanitarian response agency.
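The general idea of a classifier-guided focused crawler can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the system described above: the in-memory `TOY_WEB` graph stands in for real pages, the keyword-based `relevance` function stands in for the trained classifier, and ranking a link by the relevance of the page it was found on is a common focused-crawling heuristic rather than the authors' stated method. The re-crawling component is not covered.

```python
import heapq

# Toy in-memory "web": page id -> (text, outgoing links). Illustrative data only.
TOY_WEB = {
    "seed": ("regional news portal", ["a", "b"]),
    "a": ("flood displaces thousands, aid agencies respond to crisis", ["c"]),
    "b": ("football scores and transfer rumours", ["d"]),
    "c": ("cholera outbreak follows flood, humanitarian crisis deepens", []),
    "d": ("celebrity gossip roundup", []),
}

# Hypothetical keyword list; a real system would use a trained classifier.
KEYWORDS = {"flood", "crisis", "aid", "humanitarian", "outbreak", "cholera"}


def relevance(text):
    """Stand-in for the classifier: fraction of words that are keywords."""
    words = text.replace(",", " ").split()
    return sum(w in KEYWORDS for w in words) / len(words) if words else 0.0


def focused_crawl(seed, budget, threshold=0.2):
    """Best-first crawl: fetch at most `budget` pages, most promising first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    collected = []
    fetched = 0
    while frontier and fetched < budget:
        _, page = heapq.heappop(frontier)
        text, links = TOY_WEB[page]
        fetched += 1
        score = relevance(text)
        if score >= threshold:
            collected.append(page)
        # Assumed heuristic: links inherit the relevance of the page they came from.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return collected


if __name__ == "__main__":
    print(focused_crawl("seed", budget=3))
```

With a budget of three fetches, the sketch follows the link out of the relevant page `a` before it visits the off-topic page `b`, which is the behavior that distinguishes focused crawling from a breadth-first general crawl.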
Hussein Mozannar, Hunter Lang, et al.
AISTATS 2023
Hussein Mozannar, Valerie Chen, et al.
TMLR
Amit Dhurandhar, Swagatam Haldar, et al.
ICML 2024