redgrep: from regular expression derivatives to LLVM
redgrep is a grep based on regular expression derivatives. That is, it uses regular expression derivatives to construct the DFA†. It then uses LLVM to JIT‡ the DFA.
Since regular expression derivatives permit the three basic Boolean operations of disjunction (|), conjunction (&) and complement (!), redgrep enables you to write very powerful regular expressions very easily and guarantees to match them in linear time.
In this talk, we will cover how regular expression derivatives work, then dive into their implementation in redgrep. Resurfacing after a look at code generated by LLVM, we will conclude with benchmarks.
† Deterministic Finite Automaton
‡ Just-In-Time compile
Paul is a systems engineer at Google. He currently works on App Engine, but spent a lot of 20% time this year tinkering with regular expressions.