Pattern Matching

I've just recently had a task where I had to do table lookups by a unique Code.
It was a pretty special sort of Code though. The first character has special meaning and the last two characters have special meaning.

I had the option of creating a complicated mapping table, but then I recalled my unix days... Regular Expressions!
Regular Expressions are great. They provide syntax for advanced string pattern matching.

In this example I want to find the Code which starts with a digit and ends with the letter J followed by the number 9.
The text in cell A3 is the pattern which matches what I'm looking for. In this case, that pattern matches the Code 9SGJ9.
I could have used ^(S|8)YRUP$ to match 8YRUP (or SYRUP if it were in the list of Codes).
The pattern matching syntax is sophisticated. The documentation is really worth a look.

The formula in cell B3 is:
=INDEX(B6:B15, RegMatch($A$6:$A$15, $A$3))

RegMatch is a User Defined Function in VBA for handling the pattern matching.
Regular Expression functions are already provided by Microsoft's VBScript library so I just wrote a wrapper around it and called that RegMatch.

You'll need to reference "Microsoft VBScript Regular Expressions".
On my machine it's version 5.5, but a later version is available for download from Microsoft's website.

Dim reg As New RegExp
 
Public Function RegMatch(Source As Range, Pattern As String, Optional IgnoreCase As Boolean = True, Optional MultiLine As Boolean = True) As Long
    Dim rng As Range, i As Long, j As Long
 
    reg.IgnoreCase = IgnoreCase
    reg.MultiLine = MultiLine
    reg.Pattern = Pattern
 
    i = 0: j = 0
    For Each rng In Source
        i = i + 1
        If reg.test(rng.Value) Then
            j = i
            Exit For
        End If
    Next
    RegMatch = j
End Function

As I'm writing this post I've been looking on the Internet for other examples.
A great newsgroup post by Harlan Grove offers similar wrappers around the scripting library.

The documentation for VBScript Regular Expressions is available from Microsoft's website. Personally, I prefer the offline version.

18 Comments

  1. That's cool, Rob. Here's another link

    http://www.tmehta.com/regexp/

    which I can only assume is Tushar Mehta.

  2. JWalk says:

    Very nice, Rob. I didn't know about that. I've tried to do similar things using VBA's Like operator, but this is definitely a better choice.

  3. brettdj says:

    Hi Rob,

    I've been mucking about with RegExp for a while and found it useful for a variety of Excel tasks from rearranging peoples names, replacing specific absolute cell references with relative references, to extacting the sheet names from a multi sheet RefEdit box range.

    As I had a number of requests for some examples, I pulled together a small file showing different RegExp methods (Test, Execute & Replace) and uses for RegExp with Excel

    Hopefully this link at VBAX is accessible
    http://www.vbaexpress.com/kb/getarticle.php?kb_id=68

    there is a discussion thread on this example @
    http://www.vbaexpress.com/forum/showthread.php?t=226
    but I think you need to be a VBAX member to see it

    Cheers

    Dave

  4. XL-Dennis says:

    I agree :)

    Here is a link to a workbook that contain 7 examples for regular expression from my friend Dave (aka brettdj):

    http://www.vbaexpress.com/kb/getarticle.php?kb_id=68

    And if You want to test expression or learn more then this free tool - The Regulator - may be of interest:

    http://regex.osherove.com/

    Kind regards,
    Dennis

  5. brettdj says:

    LOL :)

    Dennis, that's the first time I've been browsing through here for a month or more, and we end up posting a minute apart.

    Thanks for the recommendation.

    I hope all is well with you and your family. All is well here with our newborn but I'm travelling a lot for work at the momement

    Cheers

    Dave

  6. XL-Dennis says:

    :)

    I will always associate Regular Expression with You ;)

    Kind regards,
    Dennis

  7. Hi.

    Some great links here - thanks.

    I too have a newborn - well, 7 months... some of the reason I've not been posting recently ;)

    In the above example I used it for Code lookups.

    My other use for RegExp in this particular task was for massaging the output of DOS program.
    The DOS program creates a fixed column width report, divided into sections - with Headers, Footers, Subnotes etc...
    I used Regular expressions to detect and split sections. Then detected headers and footers for removal. Removed lines with a blank code (subnotes).
    Removed empty lines.
    I was finally left with only data in fixed width column ready for transfer.
    In all, about 10 lines of code (each calling a generic RegReplace function)

    I dread to think what the non-RegEx equivalent would have been.

  8. Jon Peltier says:

    "I dread to think what the non-RegEx equivalent would have been."

    I wrote the non-RegEx version of this a few years ago. I was parsing mainframe output in an antique QA application, for which we fortunately had a terminal emulator with a VBA object model (the in-house VB programmer couldn't figure out how to work it, and wouldn't listen to a mere engineer--me--explain it to him). I'd get the output, then line by line search for recognized phrases, then go to the branch of the code that parsed that particular line or section. It would break every second time I ran it, because it would encounter yet another different abbreviation in the output. RegEx would have helped immensely.

  9. Ben says:

    This is a great bit of code for RegExp functions that actually work!!
    Wish i'd seen this a long time ago when i first started my involvment with excel (after a long time working on unix).

  10. doco says:

    Here is an example from Microsoft's site

    Function TestRegExp(myPattern As String, _
                        myString As String) As String
    Dim objRegExp           As RegExp
    Dim objMatch            As Match
    Dim colMatches          As MatchCollection
    Dim RetStr              As String
       
    '   Create a regular expression object.
        Set objRegExp = New RegExp

        With objRegExp
        '   Set the pattern by using the Pattern property.
            .Pattern = myPattern
       
        '   Set Case Insensitivity.
            .IgnoreCase = True
       
        '   Set global applicability.
            objRegExp.Global = True
       
        '   Test whether the String can be compared.
            If (.Test(myString) = True) Then
       
            '   Get the matches.
                Set colMatches = .Execute(myString)     '   Execute search.
         
                For Each objMatch In colMatches         '   Iterate Matches collection.
                    With objMatch
                        RetStr = RetStr & "Match found at position "
                        RetStr = RetStr & .FirstIndex & ". Match Value is '"
                        RetStr = RetStr & .Value & "'." & vbCrLf
                    End With
                Next
            Else
                 RetStr = "String Matching Failed"
            End If
        End With
       
       TestRegExp = RetStr
       
    End Function

  11. doco says:

    That did'nt work well. I think you can read it anyway. I can never remember what the tags are for code on this site. '[]' ?

  12. doco: There's a paragraph above the comments to remind you. It's [ vb ] without the spaces.

  13. flee01 says:

    Another solution is to use iserror(match("*" & yourSearchCriteria & "*",yourCell,0)) as a formula.

    If it returns true your criteria is not found.

  14. Emma says:

    Hi everyone

    Can someone give me some more pointers on how to make the "RegExp" Object available to me in Excel?

    I went to the Microsoft website, and installed "Windows Script 5.7". Is that the right thing? If it is, what else do I have to do to make my VBA macros recognize this object? I tried to initialize both "VBScript_RegExp_57.RegExp" and just "RegExp", but it does not recognize either.

    I'm not at all familiar with VBScript so I might be completely on the wrong track. But browsing the Microsoft website didn't help me much...

    Thanks in advance.
    Emma

  15. Emma: From VBA's menu, Tools > References. Tick "Microsoft VBScript Regular Expressions"

  16. [...] As Boolean = True, Optional MultiLine As Boolean = True) As Boolean ‘ Modified from http://www.dailydoseofexcel.com/archives/2005/08/13/pattern-matching/    Dim reg As New RegExp     reg.IgnoreCase = IgnoreCase    reg.MultiLine = MultiLine  [...]

  17. Ramesh Vasudevan says:

    All:

    I have following question on the parsing and using regular expressions.

    The following information need to be parsed and provide the results as follows:

    Input:
    123 Technology, Inc.
    Access Info, LLC
    First systems, llc
    Level 5, Ltd
    Level 3, LLC

    Expected Output:
    123 Technology
    Access Info
    First systems
    Level 5
    Level 3

    How do I get the results in Excel using regexp or existing formulas.

    Thank You in advance.
    Ramesh V.

  18. PBorralho says:

    Ramesh, the formula that you need is:

    =LEFT(A1,FIND(",",A1)-1)

    (assuming the input text is at cell A1)

Leave a Reply