Matching an Arbitrary Number Range in a Regular Expression Using Capturing Groups

Matching an arbitrary number range in a regular expression means matching a string against a pattern that contains a number range. For example, matching against the pattern test{0..100}, where {0..100} denotes an integer not smaller than 0 and not larger than 100, is such a case.

There are already some solutions for matching a number range in a regular expression, but they all produce lengthy patterns and may cause performance issues. In this post, I describe an alternative solution to this problem, which requires that you have control over the capturing groups of the regex.
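To illustrate why the pure-regex approach gets lengthy, here is a minimal sketch of a hand-written pattern for the range 0 to 255. The range and the names octet_regex and in_range_by_regex are chosen only for illustration. Every additional digit adds another alternative, and each new range needs its own pattern.

```python
import re

# Hand-written alternation for the integers 0..255 (illustrative only).
octet_regex = r'25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]'

def in_range_by_regex(text):
    """Return True if text is a decimal integer in 0..255 without leading zeros."""
    return re.fullmatch(octet_regex, text) is not None

print(in_range_by_regex('255'))  # True
print(in_range_by_regex('256'))  # False
```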

The basic idea is very simple: wherever the pattern needs to match a number range, put a capturing group that matches any number of the kind you are interested in (e.g. integers, real numbers, etc.), and test afterwards whether each captured number is within the range. A Python example is shown below. The code snippet matches a{number}b, where {number} is an integer within the range from small_int to large_int.

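import re

# example inputs (illustrative values): the string to test and the inclusive range
string = 'a42b'
small_int = 0
large_int = 100

# an optionally signed decimal integer
integer_regex = r'[+-]?[0-9]+'

# re.match anchors only at the start of the string; use re.fullmatch to require the whole string to match
matching_obj = re.match('a(' + integer_regex + ')b', string)

if matching_obj is None:
    raise Exception('No matching')

# every captured number must lie within the range
for num in matching_obj.groups():
    if int(num) < small_int or int(num) > large_int:
        raise Exception('No matching')

# successfully matched if we reach here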

However, the code above is rather rigid: each time you need to match against a new pattern you have to repeat all of it. A more flexible version is shown below (for Python >= 3.4, which added re.fullmatch):

import re

def match_number_range(string, pattern):
    """
    Match a string against a pattern. The pattern may contain {num1..num2} to express an inclusive range of
    integers. The match is successful if no exception is raised. Note that this does not work if the pattern
    itself contains capturing groups.

    string: the string to match
    pattern: the pattern to be matched
    """

    integer_regex = r'[+-]?[0-9]+'

    # matches an unescaped {num1..num2}; the dots are escaped so that they match literally
    range_regex = r'(?<!\\){{({0})\.\.({0})}}'.format(integer_regex)

    # extract all the number pairs (number ranges)
    number_pairs = [(int(low), int(high)) for low, high in re.findall(range_regex, pattern)]

    # replace every occurrence of {num1..num2} with a capturing group that matches any integer
    new_pattern = re.sub(range_regex, lambda m: '({})'.format(integer_regex), pattern)

    matching_obj = re.fullmatch(new_pattern, string)

    if matching_obj is None:
        raise Exception('No matching')

    if len(matching_obj.groups()) != len(number_pairs):
        raise Exception('Unexpected capturing groups found.')

    for num, (small_int, large_int) in zip(matching_obj.groups(), number_pairs):
        if int(num) < small_int or int(num) > large_int:
            raise Exception('No matching')

    # successfully matched if we reach here

# examples
# match a number between -1 and 100
# match_number_range('4', '{-1..100}')
# match number range 1 to 4 followed by letter a then number range 2 to 100
# match_number_range('2a5', '{1..4}a{2..100}')

The function match_number_range replaces every occurrence of {num1..num2} with a capturing group that matches any integer, then extracts the captured numbers and checks each one against its range. However, the original pattern must not contain any capturing groups of its own, otherwise this simple function won't work.
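If the pattern needs grouping, non-capturing groups (?:...) do not add to the captured groups, so they should be safe to combine with this approach. The sketch below is a condensed variant of the same idea that returns a boolean instead of raising. The name number_range_matches and the sample patterns are hypothetical and only for illustration.

```python
import re

INTEGER = r'[+-]?[0-9]+'
# An unescaped {num1..num2}; the dots are escaped so they match literally.
RANGE = re.compile(r'(?<!\\)\{(' + INTEGER + r')\.\.(' + INTEGER + r')\}')

def number_range_matches(string, pattern):
    """Return True if string matches pattern and every number is within its range."""
    bounds = [(int(lo), int(hi)) for lo, hi in RANGE.findall(pattern)]
    # Swap each {num1..num2} for a capturing group that matches any integer.
    regex = RANGE.sub(lambda m: '(' + INTEGER + ')', pattern)
    m = re.fullmatch(regex, string)
    if m is None or len(m.groups()) != len(bounds):
        return False
    return all(lo <= int(num) <= hi for num, (lo, hi) in zip(m.groups(), bounds))

# A non-capturing group leaves the group count unchanged.
print(number_range_matches('img7', '(?:img|pic){1..9}'))   # True
# A capturing group in the pattern breaks the bookkeeping, so the match is rejected.
print(number_range_matches('img7', '(img|pic){1..9}'))     # False
# The number is outside the range.
print(number_range_matches('img12', '(?:img|pic){1..9}'))  # False
```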

I came up with this idea when I added the feature to match a number range in the glob engine of EditorConfig C Core.
