Removed SRE recursion in Python

SRE is the current regular expression engine used in Python, and the scheme currently in use is highly recursive in some cases. For example, check the result of the following expression, with the current interpreters:

>>> import re
>>> re.match('(x)*','x'*10000)
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.3/sre.py", line 132, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded

Now, my patch to remove the SRE recursion in Python was finally committed to the CVS. Here is the same example with the CVS HEAD, but using a string which is a hundred times larger:

>>> import re
>>> re.match('(x)*','x'*1000000)
<_sre.SRE_Match object at 0x300bb970>
>>> 

Unless some serious problem is found in the new algorithm, it will be available in Python 2.4.

This patch also includes the (?(id/name)yes-pattern|no-pattern) expression support I’ve written more than a year ago. This will try to match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn’t. |no-pattern is optional and can be omitted. For example, () is a poor email matching pattern, which will match with ” as well as ‘user@host.com’, but not with ‘

This entry was posted in C/C++, Patch, Python. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *