Android Tutorial [B4X] Regular expressions (RegEx) tutorial

Discussion in 'Tutorials & Examples' started by Erel, Dec 29, 2010.

  1. Erel

    Erel Administrator Staff Member Licensed User

    Regular expressions are very powerful and make complicated parsing challenges much easier.
    This short tutorial will describe the usage of regular expressions in Basic4android.
    If you are not familiar with regular expressions, you can find many good tutorials online. I recommend starting with this one: Regular Expression Tutorial - Learn How to Use Regular Expressions

    Basic4android uses the Java regular expression engine. See this page for specific nuances related to this engine: Pattern (Java Platform SE 6)

    Regular expression methods in Basic4android are accessed through the predefined object named Regex. Type Regex followed by a dot to see the available methods.

    All methods accept a pattern string, which is the regular expression pattern. Compiled patterns are cached internally, so there is no performance loss when the same pattern is used multiple times.

    Each method comes in two variants. The second variant also receives an 'options' integer that affects the engine behavior. Currently there are two options, CASE_INSENSITIVE and MULTILINE. CASE_INSENSITIVE makes pattern matching case insensitive. MULTILINE makes the anchors ^ and $ match the beginning and end of each line instead of the beginning and end of the whole string.
    Both options can be combined by calling Bit.Or(Regex.MULTILINE, Regex.CASE_INSENSITIVE).
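    Basic4android uses the Java regular expression engine, so these options presumably correspond to the java.util.regex flags with the same names. The Java sketch below is not B4X code. The findErrorLines name and the sample log text are made up for illustration. It shows why combining both options can matter: with only one of the two flags set, no line matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexOptionsDemo {
    // Returns every line that starts with "error"; the flags decide how ^, $ and letter case behave.
    public static List<String> findErrorLines(String text, int flags) {
        Matcher m = Pattern.compile("^error.*$", flags).matcher(text);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        String log = "ok: started\nERROR: disk full\nok: retry\nError: timeout";
        // MULTILINE: ^ and $ work per line. CASE_INSENSITIVE: "ERROR" matches "error".
        System.out.println(findErrorLines(log, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE));
        // [ERROR: disk full, Error: timeout]
        System.out.println(findErrorLines(log, Pattern.CASE_INSENSITIVE)); // [] (^ only matches at the start of the text)
        System.out.println(findErrorLines(log, Pattern.MULTILINE));        // [] (letter case differs)
    }
}
```

    In Java, the | operator plays the role that Bit.Or plays in the B4X call above.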

    Matching the whole string
    IsMatch and IsMatch2 are useful for validating user input. These methods return True only if the whole string matches the pattern.
    For example, the following code checks whether a date string has a format like 12-31-2010:
    Log(Regex.IsMatch("\d\d-\d\d-\d\d\d\d", "11-15-2010")) 'True
    Log(Regex.IsMatch("\d\d-\d\d-\d\d\d\d", "12\31\2010")) 'False
    The pattern only checks the layout of the digits, so it will also match the string "99-99-9999".
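    For comparison, here is a Java sketch of the same whole-string check using java.util.regex. This is an illustration, not the B4X implementation, and the method names are hypothetical. Pattern.matches succeeds only when the entire input matches, like IsMatch. Matcher.find searches for the pattern anywhere inside the input. Java string literals need doubled backslashes ("\\d"), while the B4X examples above write them once.

```java
import java.util.regex.Pattern;

public class WholeMatchDemo {
    // Whole-string match: true only if the entire input has the dd-dd-dddd layout.
    public static boolean looksLikeDate(String s) {
        return Pattern.matches("\\d\\d-\\d\\d-\\d\\d\\d\\d", s);
    }

    // Partial match: true if the layout appears anywhere inside the input.
    public static boolean containsDate(String s) {
        return Pattern.compile("\\d\\d-\\d\\d-\\d\\d\\d\\d").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("11-15-2010"));      // true
        System.out.println(looksLikeDate("12\\31\\2010"));    // false
        System.out.println(looksLikeDate("99-99-9999"));      // true: only the layout is checked
        System.out.println(looksLikeDate("Due: 11-15-2010")); // false: extra text around the date
        System.out.println(containsDate("Due: 11-15-2010"));  // true
    }
}
```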

    Splitting text
    Split and Split2 split a text around matches of the given pattern.
    Simple case:
    Dim data As String
    data = "123,432,13,4,12,534"
    Dim numbers() As String
    numbers = Regex.Split(",", data)
    'Lists can easily be printed with Log, so we add the array to a list.
    Dim l As List
    l.Initialize2(numbers)
    Log(l)
    The result is:
    [123, 432, 13, 4, 12, 534]

    The comma followed by a single space is part of the list formatting. The expected values were parsed.

    Now suppose the data value is "123, 432 , 13 , 4 , 12, 534".
    The result is not perfect:
    [123,  432 ,  13 ,  4 ,  12,  534]

    There are extra spaces, which are now part of the parsed values.

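    We can change the pattern to match either a comma or a white space character:
    numbers = Regex.Split("[,\s]", data)
    The result is still not what we want:
    [123, , 432, , , 13, , , 4, , , 12, , 534]

    Many empty strings were added, one between each pair of adjacent separator characters.
    The correct pattern in this case is:
    numbers = Regex.Split("[,\s]+", data)
    The + quantifier treats any run of commas and spaces as a single separator, so the result is [123, 432, 13, 4, 12, 534].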
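    The three split patterns can be compared side by side in Java. This sketch assumes that Regex.Split behaves like Java's Pattern.split, which keeps empty strings in the middle of the result and drops trailing ones. The SplitDemo class is made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Splits data around every match of pattern (same idea as Regex.Split(pattern, data)).
    public static String[] split(String pattern, String data) {
        return Pattern.compile(pattern).split(data);
    }

    public static void main(String[] args) {
        String data = "123, 432 , 13 , 4 , 12, 534";
        System.out.println(Arrays.toString(split(",", data)));
        // [123,  432 ,  13 ,  4 ,  12,  534]   (spaces stay inside the values)
        System.out.println(Arrays.toString(split("[,\\s]", data)));
        // [123, , 432, , , 13, , , 4, , , 12, , 534]   (empty strings between adjacent separators)
        System.out.println(Arrays.toString(split("[,\\s]+", data)));
        // [123, 432, 13, 4, 12, 534]
    }
}
```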

    Finding matches in a string
    Here we have a long string and we want to find all matches of a pattern in the string. We can also use capture groups to get specific parts of the match.

    As an example, we will find and print the email addresses in a text:
    Dim data As String
    data = "Please contact or"
    Dim matcher1 As Matcher
    matcher1 = Regex.Matcher("\w+@\w+\.\w+", data)
    Do While matcher1.Find = True
        Log(matcher1.Match)
    Loop
    This code prints each address found in data, one per line.

    Note that this pattern is far from good enough for email validation or matching.
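    The same find-all loop in Java makes the weakness of the pattern easy to see. This is a sketch only. The findEmails name and both sample addresses are hypothetical, made up for this example. Matcher.find plays the role of matcher1.Find, and group() returns the whole match, as matcher1.Match does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDemo {
    // Collects every substring that matches the (deliberately simple) email pattern.
    public static List<String> findEmails(String text) {
        Matcher m = Pattern.compile("\\w+@\\w+\\.\\w+").matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {          // same loop shape as Do While matcher1.Find = True
            found.add(m.group());   // the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical addresses, made up for illustration.
        System.out.println(findEmails("Please contact info@example.com or sales@example.org"));
        // [info@example.com, sales@example.org]
        System.out.println(findEmails("john.doe@mail.example.com"));
        // [doe@mail.example]  (\w does not match '.', so the address is cut short)
    }
}
```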

    In the second example, we will use a Matcher with capturing groups to validate a date string. The pattern is similar to the pattern in the first example, with the addition of parentheses. The parentheses mark the groups:
    Log(IsValidDate("13-31-1212")) 'false
    Log(IsValidDate("12-31-1212")) 'true

    Sub IsValidDate(Date As String) As Boolean
        Dim matcher1 As Matcher
        matcher1 = Regex.Matcher("(\d\d)-(\d\d)-(\d\d\d\d)", Date)
        If matcher1.Find = True Then
            Dim days, months As Int
            months = matcher1.Group(1) 'fetch the first captured group
            days = matcher1.Group(2) 'fetch the second captured group
            If months < 1 Or months > 12 Then Return False
            If days < 1 Or days > 31 Then Return False
            Return True
        Else
            Return False
        End If
    End Sub
    The groups feature is very useful. If you find yourself calling String.IndexOf together with String.Substring multiple times, that is a good hint that you should switch to Regex and a Matcher.
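    Here is a Java sketch adapted from IsValidDate. It is an illustration, not a translation of the B4X library. One deliberate difference: it uses Matcher.matches instead of find, so text around the date (for example "x12-31-2010") is rejected. The groups are read with group(1) and group(2), as with matcher1.Group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateGroupsDemo {
    // Groups: 1 = month, 2 = day, 3 = year.
    private static final Pattern DATE = Pattern.compile("(\\d\\d)-(\\d\\d)-(\\d\\d\\d\\d)");

    public static boolean isValidDate(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {                         // the whole string must be a date
            return false;
        }
        int month = Integer.parseInt(m.group(1));   // first captured group
        int day = Integer.parseInt(m.group(2));     // second captured group
        return month >= 1 && month <= 12 && day >= 1 && day <= 31;
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("13-31-1212"));  // false
        System.out.println(isValidDate("12-31-1212"));  // true
        System.out.println(isValidDate("x12-31-2010")); // false
    }
}
```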

    Online tool to test Regex patterns:
    Last edited: Mar 24, 2014
  2. WZSun

    WZSun Member Licensed User

    Hi Erel,
    Thanks for the insight... it sure inspired me to think harder.

    Below is a quick StrParse sub that I wrote to retrieve part of a sample date:

    Dim s, s1 As String
    s = "12/31/2010"
    s1 = StrParse(s, "/", 2)
    Msgbox(s1, "Info") ' shows 2010

    Sub StrParse(FirstStr As String, sSeparator As String, idx As Int) As String
        Dim strArray() As String, l As List
        strArray = Regex.Split("[" & sSeparator & "\s]", FirstStr)
        l.Initialize2(strArray) 'the list must be initialized before l.Get can be used
        Return l.Get(idx)
    End Sub
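    One caveat with StrParse is that the separator is pasted into a character class, so a separator such as "^" would turn "[^\s]" into a negated class. Below is a hypothetical Java variant of the same idea. It is a sketch with a different technique, not the poster's code. It escapes the separator with Pattern.quote and combines it with \s through alternation.

```java
import java.util.regex.Pattern;

public class StrParseDemo {
    // Hypothetical Java counterpart of StrParse: split on the separator or white space,
    // then return the part at position idx. Pattern.quote makes the separator literal.
    public static String strParse(String text, String separator, int idx) {
        String[] parts = text.split("(?:" + Pattern.quote(separator) + "|\\s)");
        return parts[idx];
    }

    public static void main(String[] args) {
        System.out.println(strParse("12/31/2010", "/", 2)); // 2010
        System.out.println(strParse("a^b^c", "^", 1));      // b
    }
}
```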

  3. devjet

    devjet Member Licensed User

  4. Foz

    Foz Member Licensed User

    I think I'm missing something here...

    If I Split into a dynamic string array, how do I then get the size of the resulting array?
  5. Erel

    Erel Administrator Staff Member Licensed User

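    Dim arr() As String
    arr = Regex.Split(",", data) 'data is the string to split
    Log(arr.Length) 'number of elements
    For i = 0 To arr.Length - 1
        Log(arr(i))
    Next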
  6. Foz

    Foz Member Licensed User

    Thank you Erel!

    Sigh... I was doing an inline assignment, which you obviously can't do. It didn't like it, so it never showed the Length field and the code wouldn't compile.

    One of these days I'll get my head screwed on straight...
  7. ChrShe

    ChrShe Member Licensed User

    Some quick Regex.Matcher help...

    Good day!

    I've been tinkering around with the Regex.Matcher and have run into a bit of a snag that I was hoping I could get some help with.

    I'm parsing a web page with the following lines:

    What I need to get is the InnerText of each div line. So, for example, for the Line-animal-id, I'd like to have "13069119" returned.

    Using the following, I've been able to get the matcher to find the line, but I can't figure out how to return the portion of the line that I'm interested in.
    So, basically, how do I get the matcher to return the portion of the found line that I want?

    Any help is greatly appreciated.
    THANK YOU!!!
  8. Erel

    Erel Administrator Staff Member Licensed User

    It is better to start a new thread for such questions.

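    If the string is valid XML (XHTML), then you can use an XML parser to parse it.

    With Regex you need a pattern like class="([^"]+)">([^>]+)</div>. In B4X code the quote characters can be inserted with QUOTE:
    "class=" & QUOTE & "([^" & QUOTE & "]+)" & QUOTE & ">([^>]+)</div>" 'group 1 will hold the class attribute and group 2 the text.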
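    Here is how that pattern behaves, shown as a Java sketch for illustration only. The divTexts name and the second div ("Line-animal-name", "Rex") are hypothetical. Because [^>]+ cannot cross a '>', each match stops at its own </div>. As noted above, an XML parser is the more robust choice for well-formed markup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivTextDemo {
    // Group 1 = class attribute, group 2 = text before "</div>".
    private static final Pattern DIV = Pattern.compile("class=\"([^\"]+)\">([^>]+)</div>");

    // Maps each div's class attribute to its inner text, in document order.
    public static Map<String, String> divTexts(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = DIV.matcher(html);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div class=\"Line-animal-id\">13069119</div>\n"
                + "<div class=\"Line-animal-name\">Rex</div>";   // second line is hypothetical
        System.out.println(divTexts(html)); // {Line-animal-id=13069119, Line-animal-name=Rex}
    }
}
```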
  9. LucaMs

    LucaMs Expert Licensed User

    I found :)

    When I first met regular expressions, I quickly abandoned them.
    I thought: "They take too much time to learn; I'm faster with string functions."

    Your last sentence makes me think, though.

    Am I wrong, or could they be very useful for creating a command-line parser and for breaking up (split, group or grrrr) HTML blocks/tags?
  10. Erel

    Erel Administrator Staff Member Licensed User

    The recommended way to parse HTML is with the jTIDY library.
  11. Alberto Michelis

    Alberto Michelis Active Member Licensed User

    How can I check that a string contains only alphabetic characters and spaces?
    Alberto Michelis OK
    Alberto,Michelis Wrong
    Alberto2Michelis Wrong
  12. Erel

    Erel Administrator Staff Member Licensed User

    Please start a new thread for this question.
  13. Sandman

    Sandman Well-Known Member Licensed User

  14. MaFu

    MaFu Well-Known Member Licensed User

  15. Erel

    Erel Administrator Staff Member Licensed User

  16. victormedranop

    victormedranop Well-Known Member Licensed User

    I need to parse this string "Resultado : Q;11#1;P;12#1;T;13#23;Q;14#2;Q;21#2;P;22#2;T;23#3;Q;31#3;P;32#3;T;33#9;SP;34#10;SP;35#6;Q;41#6;P;42#6;T;43#12;SP;44#11;SP;45#7;Q;51#7;P;52#7;T;53#13;SP;54#14;SP;55#20;Q;61#20;P;62#20;T;63#21;Q;71#21;P;72#21;T;73#22;C;81

    the result should be


    any help will be appreciated.

  17. MaFu

    MaFu Well-Known Member Licensed User

    This regex pattern should work:
  18. Erel

    Erel Administrator Staff Member Licensed User

    This is not the correct place to post such questions. Always start a new thread for your question.