Compiling text analytics queries to FPGAs
Raphael Polig, Kubilay Atasu, et al.
FPL 2014
Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics, and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples. © 2011 Authors.
Laura Chiticariu, Rajasekar Krishnamurthy, et al.
ACL 2010
Raphael Polig, Kubilay Atasu, et al.
IEEE Micro
Laura Chiticariu, Marina Danilevsky, et al.
NAACL 2018