Android Tutorial [B4X] Regular expressions (RegEx) tutorial

Discussion in 'Tutorials & Examples' started by Erel, Dec 29, 2010.

  1. Erel

    Erel Administrator Staff Member Licensed User

    Regular expressions are very powerful and make complicated parsing challenges much easier.
    This short tutorial will describe the usage of regular expressions in Basic4android.
    If you are not familiar with regular expressions you can find many good tutorials online. I recommend starting with this one: Regular Expression Tutorial - Learn How to Use Regular Expressions

    Basic4android uses the Java regular expression engine. See this page for specific nuances related to this engine: Pattern (Java Platform SE 6)

    Regular expression methods in Basic4android are accessed through the predefined object named Regex. You can write Regex followed by a dot to see the available methods.

    All methods accept a pattern string. This is the regular expression pattern. Note that compiled patterns are cached internally, so there is no performance loss when the same pattern is used multiple times.

    For each method there are two variants. The second variant receives an 'options' integer that affects the engine's behavior. For now there are two options, CASE_INSENSITIVE and MULTILINE. CASE_INSENSITIVE makes pattern matching case insensitive. MULTILINE makes the anchors ^ and $ match the beginning and end of each line instead of the whole string.
    Both options can be combined by calling Bit.Or(Regex.MULTILINE, Regex.CASE_INSENSITIVE).
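
    As a rough illustration of what the two options do, here is a small sketch in plain Java, the engine that Basic4android uses underneath. It assumes that Regex.CASE_INSENSITIVE and Regex.MULTILINE behave like Java's Pattern.CASE_INSENSITIVE and Pattern.MULTILINE flags. The class name, method name and sample text are made up for this example. Note that Java string literals need doubled backslashes, which B4A strings do not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Basic4android.
class RegexOptionsSketch {
    // Returns every match of "^error.*$" in text, compiled with the given flags.
    static List<String> errorLines(String text, int flags) {
        Pattern p = Pattern.compile("^error.*$", flags);
        Matcher m = p.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "ok\nError: disk full\nerror: retry";
        // No options: ^ only anchors at the start of the whole string.
        System.out.println(errorLines(text, 0));
        // MULTILINE only: anchors work per line, matching stays case sensitive.
        System.out.println(errorLines(text, Pattern.MULTILINE));
        // Both options combined with a bitwise OR, which is what Bit.Or does.
        System.out.println(errorLines(text, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE));
    }
}
```

    With no options nothing is found, with MULTILINE alone only the lowercase line is found, and with both options both lines are found.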

    Matching the whole string
    IsMatch and IsMatch2 are good for validating user input. The result of these methods is true only if the whole string matches the pattern.
    For example the following code checks whether a date string has a format similar to: 12-31-2010
    Code:
    Log(Regex.IsMatch("\d\d-\d\d-\d\d\d\d", "11-15-2010")) 'True
    Log(Regex.IsMatch("\d\d-\d\d-\d\d\d\d", "12\31\2010")) 'False
    This pattern will also match the string "99-99-9999".
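
    If values such as 99-99-9999 should be rejected, the pattern itself can restrict the digit ranges. The sketch below is plain Java rather than B4A code. Java's Pattern.matches succeeds only when the whole string matches, which is the behavior described for IsMatch above. The stricter pattern and all names are assumptions for illustration, and it still does not know how many days each month has.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Basic4android.
class DatePatternSketch {
    // Month 01-12, day 01-31, four-digit year, separated by dashes.
    static final String STRICT_DATE = "(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])-\\d{4}";

    // True only if the whole string matches the pattern.
    static boolean looksLikeDate(String s) {
        return Pattern.matches(STRICT_DATE, s);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("11-15-2010"));
        System.out.println(looksLikeDate("99-99-9999"));
        // Still accepted: the pattern checks ranges, not the calendar.
        System.out.println(looksLikeDate("02-31-2010"));
    }
}
```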

    Splitting text
    Split and Split2 split text around matches of the given pattern.
    Simple case:
    Code:
    Dim data As String
    data = "123,432,13,4,12,534"
    Dim numbers() As String
    numbers = Regex.Split(",", data)
    Dim l As List
    l.Initialize2(numbers)
    Log(l)
    Lists can be easily printed with Log so we add the array to the list.
    The result is:

    [Image: log output]

    The comma followed by a single space is part of the list formatting. The expected values were parsed.

    Now if the data value is "123, 432 , 13 , 4 , 12, 534"
    the result is not perfect:

    [Image: log output]

    There are extra spaces which are part of the parsed values.

    We can change the pattern to match a comma or white space:
    Code:
    numbers = Regex.Split("[,\s]", data)
    The result is still not as we want it:

    [Image: log output]

    Many empty strings were added.
    The correct pattern in this case is:
    Code:
    numbers = Regex.Split("[,\s]+", data)
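
    The three patterns can be compared side by side with a small plain Java sketch. It assumes that Regex.Split behaves like Java's Pattern.split, which is a guess based on Basic4android using the Java engine. The class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Basic4android.
class SplitSketch {
    // Splits data around every match of pattern.
    static String[] split(String pattern, String data) {
        return Pattern.compile(pattern).split(data);
    }

    public static void main(String[] args) {
        String data = "123, 432 , 13 , 4 , 12, 534";
        // Comma only, comma or one whitespace, a run of commas and whitespace.
        for (String pattern : new String[] {",", "[,\\s]", "[,\\s]+"}) {
            String[] parts = split(pattern, data);
            System.out.println(pattern + " -> " + parts.length + " parts: " + Arrays.toString(parts));
        }
    }
}
```

    In Java the single-character class produces 14 parts, 8 of them empty, because every comma and every space is a separate delimiter. Adding + treats each run of delimiters as one and leaves the six numbers.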

    Find matches in string
    Here we have a long string and we want to find all matches of a pattern in the string. We can also use capture groups to get specific parts of the match.

    As an example we will find and print email addresses in text:
    Code:
    Dim data As String
    data = "Please contact mike@gmail.com or john@gmail.com"
    Dim matcher1 As Matcher
    matcher1 = Regex.Matcher("\w+@\w+\.\w+", data)
    Do While matcher1.Find = True
        Log(matcher1.Match)
    Loop
    This code prints:
    mike@gmail.com
    john@gmail.com

    Note that this pattern is far from being a good pattern for email validation / matching.

    In the second example we will use a Matcher with capturing groups to validate a date text. The pattern is similar to the pattern in the first example with the addition of parentheses. These parentheses mark the groups:
    Code:
    Log(IsValidDate("13-31-1212")) 'false
    Log(IsValidDate("12-31-1212")) 'true

    Sub IsValidDate(Date As String) As Boolean
        Dim matcher1 As Matcher
        matcher1 = Regex.Matcher("(\d\d)-(\d\d)-(\d\d\d\d)", Date)
        If matcher1.Find = True Then
            Dim days, months As Int
            months = matcher1.Group(1) 'fetch the first captured group
            days = matcher1.Group(2) 'fetch the second captured group
            If months > 12 Then Return False
            If days > 31 Then Return False
            Return True
        Else
            Return False
        End If
    End Sub
    The groups feature is very useful. If you find yourself calling String.IndexOf together with String.Substring multiple times, it is a good hint that you should move to a Regex and Matcher.
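
    To make that hint concrete, here is a sketch in plain Java (not B4A) that extracts the three date fields twice: once with indexOf and substring, once with capture groups. All names are hypothetical. The group-based version also reports input that does not have the expected shape, which the substring version does not handle.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Basic4android.
class GroupsSketch {
    // Substring approach: locate each separator by hand.
    // Throws if the input does not contain two dashes.
    static String[] withIndexOf(String s) {
        int first = s.indexOf('-');
        int second = s.indexOf('-', first + 1);
        return new String[] {
            s.substring(0, first), s.substring(first + 1, second), s.substring(second + 1)
        };
    }

    // Regex approach: one pattern, one capture group per field.
    // Returns null when the pattern is not found.
    static String[] withGroups(String s) {
        Matcher m = Pattern.compile("(\\d\\d)-(\\d\\d)-(\\d\\d\\d\\d)").matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withIndexOf("12-31-2010")));
        System.out.println(Arrays.toString(withGroups("12-31-2010")));
    }
}
```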

    Online tool to test Regex patterns: http://www.basic4ppc.com/android/forum/threads/server-regex-tool.39192/
     
    Last edited: Mar 24, 2014
  2. WZSun

    WZSun Member Licensed User

    Hi Erel,
    Thanks for the insight... it sure helped inspire me to think harder.

    Below is a quick StringParse sub that I wrote to retrieve a sample date:

    s = "12/31/2010"
    s1 = StrParse(s,"/",2)
    msgbox(s1,"Info") ' returns 2010


    Sub StrParse(FirstStr As String, sSeparator As String, idx As Int) As String
    Dim strArray() As String, l As List
    strArray = Regex.Split("[" & sSeparator & "\s]", FirstStr)
    l.Initialize2(strArray)
    Return l.Get(idx)
    End Sub



    Rgds
    WZSun
     
  3. devjet

    devjet Member Licensed User

  4. Foz

    Foz Member Licensed User

    I think I'm missing something here...

    If I do a Split, into a dynamic string array, how do I then get the resulting array size?
     
  5. Erel

    Erel Administrator Staff Member Licensed User

    Code:
    Dim arr() As String
    arr = Regex.Split(...)
    For i = 0 To arr.Length - 1
        Log(arr(i))
    Next
     
  6. Foz

    Foz Member Licensed User

    Thank you Erel!

    sigh... I was doing an inline assign which you obviously can't do, and it didn't like it and thus never showed the Length field and wouldn't compile.

    One of these days I'll get my head screwed on straight...
     
  7. ChrShe

    ChrShe Member Licensed User

    Some quick Regex.Matcher help...

    Good day!

    I've been tinkering around with the Regex.Matcher and have run into a bit of a snag that I was hoping I could get some help with.

    I'm parsing a web page with the following lines:

    What I need to get is the InnerText of each div line. So, for example, for the Line-animal-id, I'd like to have "13069119" returned.

    Using the following, I've been able to get the matcher to find the line, but can't seem to figure out returning the portion of the line that I'm interested in.
    Code:
    Regex.Matcher("class=\""list-animal-name\""",page)
    So, basically, how do I get the matcher to return the portion of the found line that I want?

    Any help is greatly appreciated.
    THANK YOU!!!
    ~Chris
     
  8. Erel

    Erel Administrator Staff Member Licensed User

    It is better to start a new thread for such questions.

    If the string is a valid XML (XHTML) then you can use an XML parser to parse it.

    With Regex you need something like:
    Code:
    "class=\""([^""]+)\"">([^>]+)</div>" 'group 1 will hold the class attribute and group 2 the text.
     
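    A sketch of how the two groups from the pattern above could be read, written in plain Java rather than B4A. The HTML lines from the question are not shown in the thread, so the sample page below is a guess built from the class name and the id value mentioned there. The class and method names are hypothetical. Java escapes a quote inside a string with a backslash where B4A doubles it, so the literal looks different, but the pattern is meant to be equivalent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Basic4android.
class DivTextSketch {
    // Group 1 holds the class attribute, group 2 the text before </div>.
    static final Pattern DIV = Pattern.compile("class=\"([^\"]+)\">([^>]+)</div>");

    // Returns the inner text of the first div with the given class, or null.
    static String innerText(String page, String cssClass) {
        Matcher m = DIV.matcher(page);
        while (m.find()) {
            if (m.group(1).equals(cssClass)) {
                return m.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed markup, the real page is not available.
        String page = "<div class=\"list-animal-name\">Rex</div>\n"
                + "<div class=\"list-animal-id\">13069119</div>";
        System.out.println(innerText(page, "list-animal-id"));
    }
}
```

    This only works for simple, flat markup like the assumed sample. Nested tags inside the div would need a real parser.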
  9. LucaMs

    LucaMs Expert Licensed User


    I found :)

    When I met regular expressions, I quickly abandoned them.
    I thought: "Too much time to learn them, I'll be faster with string functions."

    This last sentence of yours makes me think, though.

    Am I wrong, or could they be very useful for creating a command-line parser and for breaking up (split, group or grrrr) HTML blocks/tags?
     
  10. Erel

    Erel Administrator Staff Member Licensed User

    The recommended way to parse HTML is with the jTIDY library.
     
  11. Alberto Michelis

    Alberto Michelis Active Member Licensed User

    How to check only alphabetical chars and spaces?
    Alberto Michelis OK
    Alberto,Michelis Wrong
    Alberto2Michelis Wrong
    Thanks
     
  12. Erel

    Erel Administrator Staff Member Licensed User

    Please start a new thread for this question.
     
  13. Sandman

    Sandman Active Member Licensed User

  14. MaFu

    MaFu Well-Known Member Licensed User

  15. Erel

    Erel Administrator Staff Member Licensed User
