Xinyi Su, Guangyu He, et al.
Dianli Xitong Zidonghua/Automation of Electric Power Systems
Regular expressions (shortened as regexp) are widely used to parse data and to detect recurring patterns and information, and they are a common choice for defining configurable rules in a variety of systems. In fact, many data-intensive applications rely on regexp matching as the first line of defense for on-line data filtering. Unfortunately, few solutions can keep up with increasing data rates and with the complexity of sets containing hundreds of expressions. In this paper we present DotStar (.*), a complete algorithmic solution and software tool-chain that compiles large sets of regexps into an automaton able to take advantage of the vector/SIMD extensions available on many commodity multi-core processors. DotStar relies on several algorithmic innovations to transform the user-provided regexp set into a sequence of manageable intermediate representations. The resulting automaton is both space- and time-efficient, and it searches the input in a single pass without backtracking. The experimental evaluation, performed on a family of state-of-the-art processors, shows that DotStar can efficiently handle both small sets of regexps, such as those used in protocol parsing, and larger sets designed for Network Intrusion Detection Systems (NIDS), achieving a performance between 1 and 5 Gbit/sec per core. © 2010 Springer-Verlag.
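The idea of compiling a whole pattern set into one automaton that scans the input once, without backtracking, can be illustrated with a minimal sketch. The code below is not DotStar's algorithm: it is a plain Aho-Corasick automaton, restricted to literal keywords rather than full regular expressions, with no SIMD vectorization. The function names `build_automaton` and `search` are hypothetical and chosen only for this illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton for a set of literal patterns.

    Illustrative assumption: patterns are plain strings, not regexps.
    """
    goto = [{}]    # goto[state][char] -> next state (state 0 is the root)
    fail = [0]     # failure link of each state
    out = [set()]  # patterns recognized when a state is entered

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Phase 2: compute failure links breadth-first, so that shallower
    # states are always finished before the states that depend on them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links instead of re-reading earlier input.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(sorted(search("ushers", automaton)))
    # prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Each input character is consumed exactly once, and all patterns are matched simultaneously. That is the property the abstract attributes to the compiled automaton, here shown only for the much simpler keyword case.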
Sonia Cafieri, Jon Lee, et al.
Journal of Global Optimization
S.F. Fan, W.B. Yun, et al.
Proceedings of SPIE 1989
Arun Viswanathan, Nancy Feldman, et al.
IEEE Communications Magazine