There are two ways to create regular expressions: parse a string, or use the API directly.
Atom classes:
- RegexEmpty: empty regex (matches nothing)
- RegexStart, RegexEnd, RegexDot: symbols ^, $ and .
- RegexString
- RegexRange: character range like [a-z] or [^0-9]
- RegexAnd
- RegexOr
- RegexRepeat
All classes derive from the Regex class.
Create a regex from a string with parse():
>>> from hachoir_regex import parse
>>> parse('')
<RegexEmpty ''>
>>> parse('abc')
<RegexString 'abc'>
>>> parse('[bc]d')
<RegexAnd '[bc]d'>
>>> parse('a(b|[cd]|(e|f))g')
<RegexAnd 'a[b-f]g'>
>>> parse('([a-z]|[b-])')
<RegexRange '[a-z-]'>
>>> parse('^^..$$')
<RegexAnd '^..$'>
>>> parse('chats?')
<RegexAnd 'chats?'>
>>> parse(' +abc')
<RegexAnd ' +abc'>
Create a regex with the API:
>>> from hachoir_regex import createString, createRange
>>> createString('')
<RegexEmpty ''>
>>> createString('abc')
<RegexString 'abc'>
>>> createRange('a', 'b', 'c')
<RegexRange '[a-c]'>
>>> createRange('a', 'b', 'c', exclude=True)
<RegexRange '[^a-c]'>
Convert to string:
>>> from hachoir_regex import createRange, createString
>>> str(createString('abc'))
'abc'
>>> repr(createString('abc'))
"<RegexString 'abc'>"
Operators "and" and "or":
>>> createString("bike") & createString("motor")
<RegexString 'bikemotor'>
>>> createString("bike") | createString("motor")
<RegexOr '(bike|motor)'>
You can also use the "+" operator; it is just an alias for "&":
>>> createString("big ") + createString("bike")
<RegexString 'big bike'>
Compute the minimum and maximum length of a match:
>>> r = parse('(cat|horse)')
>>> r.minLength(), r.maxLength()
(3, 5)
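
The idea behind these two bounds can be illustrated without the library. The following is a minimal sketch, not the hachoir_regex implementation: it only handles a flat alternation of literal strings, and the helper name `length_bounds` is hypothetical.

```python
def length_bounds(alternatives):
    """Return (min, max) length matched by an alternation of literal strings."""
    # Each alternative is a literal, so it matches exactly len(text) characters.
    lengths = [len(text) for text in alternatives]
    return min(lengths), max(lengths)


# Same alternatives as the regex '(cat|horse)'.
bounds = length_bounds(["cat", "horse"])
print(bounds)  # (3, 5)
```
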
The library includes many optimizations to keep expressions small and fast.
Group prefix:
>>> createString("blue") | createString("brown")
<RegexAnd 'b(lue|rown)'>
>>> createString("moto") | parse("mot.")
<RegexAnd 'mot.'>
>>> parse("(ma|mb|mc)")
<RegexAnd 'm[a-c]'>
>>> parse("(maa|mbb|mcc)")
<RegexAnd 'm(aa|bb|cc)'>
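
Prefix grouping can be approximated with the standard library alone. This is a rough sketch under the assumption that every alternative is a literal string; it is not how hachoir_regex does it internally, and `group_prefix` is a hypothetical helper.

```python
import os.path
import re


def group_prefix(words):
    """Factor the longest common prefix out of an alternation of literals."""
    # commonprefix() compares character by character, so it works on any strings.
    prefix = os.path.commonprefix(words)
    # Escape the remaining tails so that they stay literal in the result.
    tails = [re.escape(word[len(prefix):]) for word in words]
    return "%s(%s)" % (re.escape(prefix), "|".join(tails))


pattern = group_prefix(["blue", "brown"])
print(pattern)  # b(lue|rown)
```

Unlike the library, this sketch does not turn single-character tails into a range such as `[a-c]`.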
Merge ranges:
>>> from hachoir_regex import createRange
>>> regex = createString("1") | createString("3"); regex
<RegexRange '[13]'>
>>> regex = regex | createRange("2"); regex
<RegexRange '[1-3]'>
>>> regex = regex | createString("0"); regex
<RegexRange '[0-3]'>
>>> regex = regex | createRange("5", "6"); regex
<RegexRange '[0-356]'>
>>> regex = regex | createRange("4"); regex
<RegexRange '[0-6]'>
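
The merging step can be pictured as collapsing consecutive character codes into runs. The sketch below is an illustration only, with the hypothetical helper `merge_chars`; it does not reproduce the library's output format (for example `[0-356]`).

```python
def merge_chars(chars):
    """Collapse characters into sorted, inclusive (first, last) runs."""
    runs = []
    for code in sorted(set(ord(char) for char in chars)):
        if runs and code == runs[-1][1] + 1:
            runs[-1][1] = code         # extend the current run
        else:
            runs.append([code, code])  # start a new run
    return [(chr(first), chr(last)) for first, last in runs]


print(merge_chars("13"))       # [('1', '1'), ('3', '3')]
print(merge_chars("132"))      # [('1', '3')]
print(merge_chars("0123564"))  # [('0', '6')]
```
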
Use PatternMatching if you would like to find many strings or regexes in a string. Use addString() and addRegex() to add your patterns.
>>> from hachoir_regex import PatternMatching
>>> p = PatternMatching()
>>> p.addString("a")
>>> p.addString("b")
>>> p.addRegex("[cd]")
Then use search() to find all matches:
>>> for start, end, item in p.search("a b c d"):
...     print("%s..%s: %s" % (start, end, item))
...
0..1: a
2..3: b
4..5: [cd]
6..7: [cd]
The item is a Pattern object, not the matched string. To be exact, it is a StringPattern for a string and a RegexPattern for a regex. You can associate a "user" value with each Pattern object.
>>> p2 = PatternMatching()
>>> p2.addString("un", 1)
>>> p2.addString("deux", 2)
>>> p2.addRegex("(trois|three)", 3)
>>> for start, end, item in p2.search("un deux trois"):
...     print("%r at %s: user=%r" % (item, start, item.user))
...
<StringPattern 'un'> at 0: user=1
<StringPattern 'deux'> at 3: user=2
<RegexPattern 't(rois|hree)'> at 8: user=3
You can associate any Python object with an item, not only an integer!
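
For comparison, a similar "many patterns, one pass" search can be sketched with the standard `re` module. This is an assumption-laden illustration, not the PatternMatching implementation: the `PATTERNS` table, the group names `p0`, `p1`, ... and the `search` function are all hypothetical.

```python
import re

# Hypothetical pattern table: (regex source, user value).
PATTERNS = [
    (re.escape("un"), 1),
    (re.escape("deux"), 2),
    ("(?:trois|three)", 3),
]

# One named group per pattern, so a match can be traced back to its entry.
COMBINED = re.compile("|".join(
    "(?P<p%d>%s)" % (index, source)
    for index, (source, _user) in enumerate(PATTERNS)))


def search(text):
    """Yield (start, end, user) for each non-overlapping match in text."""
    for match in COMBINED.finditer(text):
        index = int(match.lastgroup[1:])  # "p2" -> 2
        yield match.start(), match.end(), PATTERNS[index][1]


for start, end, user in search("un deux trois"):
    print("%s..%s: user=%r" % (start, end, user))
# 0..2: user=1
# 3..7: user=2
# 8..13: user=3
```

Inner groups are written as non-capturing `(?:...)` so that `lastgroup` always names the outer per-pattern group.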