Last updated on October 1, 2016
Matching an arbitrary number range in a regular expression means matching a string against a pattern that contains a number range. For example, matching against the pattern test{0..100}, where {0..100} denotes an integer no smaller than 0 and no larger than 100, is such a case.
There are already some solutions for matching a number range in a regular expression, but they all produce lengthy patterns and may cause performance issues. In this post, I will provide an alternative solution to this problem, but it requires you to have control over the captured groups of the regex.
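To see why the pure-regex approach grows quickly, here is a rough sketch of what a hand-written pattern looks like for two small ranges. The variable names zero_to_100 and zero_to_255 are only illustrative, and the patterns assume numbers are written without leading zeros or signs.

```python
import re

# Integers 0..100: either the literal 100, or an optional non-zero tens
# digit followed by a ones digit (covers 0..99 without leading zeros).
zero_to_100 = re.compile(r'test(?:100|[1-9]?[0-9])')

# Integers 0..255: the same approach already needs four alternatives,
# and every new range needs its own hand-derived pattern.
zero_to_255 = re.compile(r'test(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])')
```

Ranges with less convenient bounds, negative numbers, or several ranges in one pattern make these expressions longer still, which is the problem the rest of this post tries to avoid.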
The basic idea is very simple: wherever the string needs to match a number range, use a capturing group that matches any number of the kind you are interested in (e.g. integers, real numbers, etc.), and test afterwards whether the captured numbers are in range. A Python example is shown below. The code snippet matches a{number}b, where {number} is an integer within the range from small_int to large_int.
    import re

    # example inputs (placeholders; substitute your own values)
    string = 'a42b'
    small_int, large_int = 0, 100

    integer_regex = r'[+-]?[0-9]+'
    matching_obj = re.match('a(' + integer_regex + ')b', string)
    if matching_obj is None:
        raise Exception('No matching')
    # check every captured number against the range
    for num in matching_obj.groups():
        if int(num) < small_int or int(num) > large_int:
            raise Exception('No matching')
    # successfully matched if we reach here
However, the code above is a bit rigid: each time you need to match a new string, you have to repeat all the code. A more flexible version is shown below (for Python >= 3.4):
    import re

    def match_number_range(string, pattern):
        """
        Match string against pattern. The pattern may contain {num1..num2} to
        express an integer range. The match is successful if no exception is
        raised. Note that this won't work if the pattern already contains
        capturing groups.

        string: the string to match
        pattern: the pattern to be matched
        """
        integer_regex = r'[+-]?[0-9]+'
        # matches an unescaped {num1..num2}; the braces and dots are escaped
        # so that they only match the literal characters
        range_regex = r'(?<!\\)\{{({0})\.\.({0})\}}'.format(integer_regex)
        # extract all the number pairs (number ranges)
        number_pairs = [(int(lo), int(hi)) for lo, hi in re.findall(range_regex, pattern)]
        # replace every occurrence of {num1..num2} with a group capturing an integer
        new_pattern = re.sub(range_regex, lambda m: '({})'.format(integer_regex), pattern)
        matching_obj = re.fullmatch(new_pattern, string)
        if matching_obj is None:
            raise Exception('No matching')
        if len(matching_obj.groups()) != len(number_pairs):
            raise Exception('Unexpected extra capturing groups found.')
        for num, (small_int, large_int) in zip(matching_obj.groups(), number_pairs):
            if int(num) < small_int or int(num) > large_int:
                raise Exception('No matching')
        # successfully matched if we reach here

    # examples
    # match a number between -1 and 100
    # match_number_range('4', '{-1..100}')
    # match number range 1 to 4 followed by letter a then number range 2 to 100
    # match_number_range('2a5', '{1..4}a{2..100}')
The function match_number_range replaces every occurrence of {num1..num2} with a capturing group that matches any integer, then extracts the captured integers and checks them against the ranges. However, the original pattern must not contain any capturing groups of its own (non-capturing groups such as (?:...) are fine), otherwise this simple function won't work.
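One possible way to relax that restriction is to give every range its own named group, so that other capturing groups in the pattern no longer interfere with the bookkeeping. The following is only a sketch under that assumption: the function name match_number_range_named and the group names _range0, _range1, ... are made up for illustration, and it returns True or False instead of raising.

```python
import re

INTEGER = r'[+-]?[0-9]+'
# an unescaped {num1..num2}, with both bounds captured
RANGE = re.compile(r'(?<!\\)\{(' + INTEGER + r')\.\.(' + INTEGER + r')\}')

def match_number_range_named(string, pattern):
    """Return True if string fully matches pattern and every {num1..num2}
    range that took part in the match holds an in-range integer."""
    bounds = {}  # hypothetical group name -> (low, high)

    def to_named_group(m):
        # register the bounds and emit a uniquely named capturing group
        name = '_range{}'.format(len(bounds))
        bounds[name] = (int(m.group(1)), int(m.group(2)))
        return '(?P<{}>{})'.format(name, INTEGER)

    new_pattern = RANGE.sub(to_named_group, pattern)
    matching_obj = re.fullmatch(new_pattern, string)
    if matching_obj is None:
        return False
    for name, (low, high) in bounds.items():
        value = matching_obj.group(name)
        # a group inside an unused alternative is None and is skipped
        if value is not None and not low <= int(value) <= high:
            return False
    return True
```

For example, match_number_range_named('x2a5', '(x|y){1..4}a{2..100}') accepts the string even though the pattern has a capturing group of its own. Note that if a range sits inside a repeated group, only the last captured value is checked.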
I came up with this idea when I added the feature to match a number range in the glob engine of EditorConfig C Core.