Lin Qiao, Vijayshankar Raman, et al.
ICDE 2008
Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large data sets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse data sets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally, we validate the potential benefits of our approach through extensive experiments over real-world blog data. © 2008 IEEE.
Junyi Xie, Jun Yang, et al.
ICDE 2008
Rajasekar Krishnamurthy, Raghav Kaushik, et al.
ICDE 2005
Ronald Fagin, Benny Kimelfeld, et al.
SIGMOD/PODS 2014