Android Question Regex.IsMatch help please

agraham

Expert
Licensed User
I am not very good at regular expressions but I am trying to add them to the search capabilities of my UK mapping program on both the desktop (Basic4ppc/C#) and Android (B4A) versions. I have managed on the desktop but am struggling on Android because Regex.IsMatch in Java seems to behave differently from that in .NET, and I also think that I don't really understand how matching in regular expressions works. I've just bought the 'bible', "Mastering Regular Expressions" by Jeffrey Friedl, but I am still lost at present. :(

I am matching lines of location data like "Upper Deal,,Kent.TR365515,Village/Settlement"

If I want to select lines containing 'deel' or 'deal' then I want to use the pattern "de[ea]l"

On the desktop with the B4ppc Regex library I just do this and it seems to work fine
Regex1.New1("de[ea]l")
If Regex1.IsMatch(lowercaselineofdata) Then
ArrayList1.Add(lineofdata)
End If


This doesn't work on Android, as it seems IsMatch behaves differently from the desktop version:
If Regex.IsMatch("de[ea]l", lowercaselineofdata) Then
ResultList.Add(lineofdata)
End If


Can anyone put me out of my misery as to how to achieve the same result as on the desktop?
 

OliverA

Expert
Licensed User
try
B4X:
If Regex.IsMatch(".*de[ea]l.*", lowercaselineofdata) Then 
ResultList.Add(lineofdata)
End If
 

sorex

Expert
Licensed User
just wondering... why didn't you use .contains("deel") or .contains("deal") ?

anyway...

this seems to work fine on B4J & B4A

B4X:
    Dim m As Matcher
    Dim lineofdata As String
    Dim arrayList1 As List
    Dim mylines As String=$"line1 deal dfsdfdsdf
line2 deel dfsdfdsdf
line3 deal dfsdfdsdf"$

    arrayList1.Initialize
    Dim lines() As String=Regex.Split(CRLF,mylines.ToLowerCase)

    For x=0 To lines.Length-1
        lineofdata=lines(x)
        If Regex.IsMatch(".*de[ea]l.*",lineofdata) Then
            arrayList1.Add(lineofdata)
        End If

    Next

    Log(arrayList1)

B4J log > (ArrayList) [line1 deal dfsdfdsdf, line2 deel dfsdfdsdf, line3 deal dfsdfdsdf]
B4A log > (ArrayList) [line1 deal dfsdfdsdf, line2 deel dfsdfdsdf, line3 deal dfsdfdsdf]
 

sorex

Expert
Licensed User
this is how I would do it...

B4X:
    Dim m As Matcher
    Dim arrayList1 As List
    Dim mylines As String=$"line1 deal dfsdfdsdf
line2 deel dfsdfdsdf
line3 deal dfsdfdsdf"$
    arrayList1.Initialize
    m=Regex.Matcher(".*de[ea]l.*",mylines)
    Do While m.Find
        arrayList1.Add(m.Group(0))
    Loop
    Log(arrayList1)
 

agraham

Expert
Licensed User
sorex said:
just wondering... why didn't you use .contains("deel") or .contains("deal") ?

Two reasons. First I can't hard code it as the search patterns will be user entered at runtime and second it's a simple test to see if it works - not a real world requirement.

Thanks for the suggestions. I'll play again tomorrow, but I suspect a better way might be to use a matcher, once I understand it!, and avoid the whole line matching necessity. The application has other simpler search modes which look for text fragments that start a line or text fragments contained within a line so I'd like to keep the matching to line substrings and not have to enter a whole line search pattern to stay consistent with these other modes.
 

sorex

Expert
Licensed User
in a lot of cases some pre fixing with .replace makes regex a lot easier especially when your data is spread over multiple lines or has a lot of single/double quotes or brackets.
 

agraham

Expert
Licensed User
sorex said:
in a lot of cases some pre fixing with .replace makes regex a lot easier especially when your data is spread over multiple lines or has a lot of single/double quotes or brackets

Doesn't apply here, it's all single lines and alphanumeric comma separated data.
 

drgottjr

Well-Known Member
Licensed User
To respond to part of your post,

Regex.isMatch() in B4A matches a string, not an expression. The expression is supplied in the guise of a string following certain "rules" (hence, regular), but it is evaluated differently than a string of characters. In the same way an array of bytes is evaluated differently depending on what's expected as the payload.


By default, Regex works on lines of text. What it understands by "lines" is key. Basically, it's what we think of as a line of text, but there's a lot more going on (which you will learn).

You'll want Regex.Matcher() to de[ea]l with expressions. Then you test for success and, optionally, perform substitutions. If you try to go right to success (eg. if Regex.Matcher("pattern","string").Find), you potentially lose the ability to carry out some substitution. But you would be able to report success, if nothing else.

The Regex.Matcher("pattern","string") method captures and stores whatever matches there may have been. In your example, matching "deel" or "deal" are 2 different matches, which could result in 2 different substitutions. To do that, you have to use Regex.Matcher() and loop your way through the string, performing the appropriate substitution as necessary, based on each so-called "group" matched. The Matcher keeps track of these groups.
 

drgottjr

Well-Known Member
Licensed User
unless you will be undertaking regex missionary work, the bible may be overkill.

in general, the user does not enter a pattern, she enters a string which is then matched against a pattern the app is looking to match. if the input string is the actual pattern, then you don't need regex. you simply pass your pattern to the database with a "like" clause. sqlite would be perfect. comes with its own sort of regex that requires minimal processing and handling.

if your database is one giant flat file, you need to comb through it line by line each time there is a search for something. regex.matcher(snippet, line_of_text) is what you use in a do/while loop. the matcher keeps track of the hits, and you simply add them to a list of hits as you step through. this is a brute force method, which is why the database suggestion is better.

if looping through the database line by line is too primitive, you could load the entire flat file into memory, but the search then works differently, given the way regex thinks of lines and how it decides when it has found a match. if you don't do things correctly, you could end up returning a large part of your database as a "hit" for a simple search. regex quantifiers such as .* are what is known as "greedy": they grab as much text as they can. you have to formulate your pattern to take that into account. you don't need the bible, but you do need to know how to do it. technically, a line by line search can have a similar problem, but the damage is staunched at the end of each line since that's all the regex engine is looking at.
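
The greedy behaviour described above can be reproduced in plain Java. This is only a sketch: it assumes B4A's Regex behaves like java.util.regex underneath, and the class name, helper name and sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical three-line "database" held in a single string.
        String data = "deal one\nplain two\ndeel three";

        // Default flags: '.' does not cross a line break, so the hit stays on one line.
        System.out.println(firstMatch("de[ea]l.*", 0, data));

        // DOTALL lets '.' cross line breaks, so the greedy .* runs to the end of the data.
        System.out.println(firstMatch("de[ea]l.*", Pattern.DOTALL, data));

        // A reluctant quantifier (.*?) stops at the first place the rest of the pattern fits.
        System.out.println(firstMatch("de[ea]l.*?e", Pattern.DOTALL, data));
    }
}
```

Searching one line at a time sidesteps the second case, because the engine never sees more than a single line.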
 

drgottjr

Well-Known Member
Licensed User
actually, you don't need regex at all.
if your database is a big flat file, you can use B4A's string.contains(). you could read in your database either line by line or at once into an array or list.
then:
pseudo
B4X:
do while not eof
   if this_line.contains( user_input_snippet ) then
      hitlist.add( this_line )
   end if
loop
OR
B4X:
for i = 0 to array.length - 1
   if array(i).contains( snippet ) then
      hitlist.add( array(i) )
   end if
next

if the search always involved a street name, then sorting the database by street name would avoid having to read the file beginning to end each time. (at some point your search is not alphabetically possible.) of course, a keyed database does all that for you, plus still allows searching based on a handful of characters.
 

Erel

Administrator
Staff member
Licensed User
B4X:
Dim s As String = $"line1 deal dfsdfdsdf
line2 deel dfsdfdsdf
line222 jkfwelfjkwelf
line 43434 jewrklfejrgkl
line3 deal dfsdfdsdf"$
Dim m As Matcher = Regex.Matcher2("^.*de[ea]l.*$", Bit.Or(Regex.MULTILINE, Regex.CASE_INSENSITIVE), s)
Do While m.Find
   Log(m.Match)
Loop

1. IsMatch is only useful if you want to test whether the complete text matches a specific pattern.

2. The MULTILINE flag is required because we want the start and end anchors to match each line instead of the complete text.
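
Both points can be checked in plain Java. The sketch below assumes that B4A's Regex methods are thin wrappers over java.util.regex, with IsMatch behaving like Matcher.matches() and Matcher.Find like Matcher.find(). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeVersusPartialMatch {
    // True only if the pattern consumes the entire text.
    static boolean wholeMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // True if the pattern occurs anywhere inside the text.
    static boolean partialMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Collects every line of a multi-line text that contains the fragment.
    // MULTILINE makes ^ and $ anchor at each line instead of the whole text.
    static List<String> matchingLines(String fragment, String text) {
        Pattern p = Pattern.compile("^.*" + fragment + ".*$",
                Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String line = "upper deal,,kent.tr365515,village/settlement";
        System.out.println(wholeMatch("de[ea]l", line));      // pattern does not cover the whole line
        System.out.println(partialMatch("de[ea]l", line));    // pattern occurs inside the line
        System.out.println(wholeMatch(".*de[ea]l.*", line));  // padded pattern covers the whole line
        System.out.println(matchingLines("de[ea]l", "line1 deal\nline2 other\nline3 DEEL"));
    }
}
```

If the assumption holds, this explains why the bare pattern fails with IsMatch while the padded ".*de[ea]l.*" succeeds.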
 

agraham

Expert
Licensed User
Thanks Sorex and Erel. While I don't yet understand what is happening, the following seems to do what I want:
B4X:
Dim exp As String = "de[ea]l" ' will be edtSearch.Text
Dim MatchCount As Int = 0
For i = 0 To Data.Size - 1          
    Dim Line As String = Data.Get(i)
    Dim m As Matcher = Regex.Matcher2(exp, Regex.CASE_INSENSITIVE, Line)
    If m.Find Then
        MatchCount = MatchCount + 1
        ResultList.Add(Line)
        If MatchCount >= 100 Then
            Msgbox2Async("More than 100 matches found. Terminating search early.", "Search", "OK", "", "", Null, True)
            Exit
        End if
    End If
Next
 

drgottjr

Well-Known Member
Licensed User
agraham said:
I now regret asking this question in the first place. Don't tell me I don't need regex. I know what I want to achieve. With my history on this site do you really think that I cannot write string comparison code? Words fail me!

i sincerely hope this is not another pout; we are all aware of and grateful for your contributions, and we missed you. i apologi[sz]e if my use of "you" was misinterpreted. i wasn't speaking in the royal you. part of the thread, which part has mysteriously been deleted, used "123 smi" as a search. i was suggesting that where no regular expression is used, no regex is needed. perhaps that's why it was deleted, leaving my comments about something which was no longer there. it's also odd that others receive a like for telling you something i explained to you previously. still not too late.
 

emexes

Expert
Licensed User
drgottjr said:
which part has mysteriously been deleted

Yeah, that was me. I misunderstood "not a real world requirement" to mean that discussion was not limited to regex approaches to the search problem, and veered off in a direction that has worked for me over the past 30 years. Figured I'd better unclutter the thread. I am still kinda itching to discuss the cost-benefit of regex for this use case, but... I think I have muddied the waters enough already ;-)
 

OliverA

Expert
Licensed User
Erel said:
IsMatch is only useful if you want to test whether the complete text matches a specific pattern.

Yes, and ".*de[ea]l.*" does just that for this case.

Erel said:
The MULTILINE flag is required because we want the start and end anchors to match each line instead of the complete text.

To me that did not seem to be a requirement (original code looked like it was processing one line at a time).
B4X:
Dim exp As String = "de[ea]l" ' will be edtSearch.Text
'Previous posting mentioned something about beginning fragments, so here is a fictional checkbox
If ckAtBeginning.Checked then
   exp = $"^${exp}.*"$ ' Regular expression to find fragment at beginning
Else If ckAtEnd.Checked Then ' Just to show if one wants to find fragment at end 
   exp = $".*${exp}$"$  ' find it at the end
Else
   exp = $".*${exp}.*"$ ' or anywhere
End If
'Note to the above. Not modifying exp would allow for searching for an exact match for a given regular expression. Looks like
'in this case that may not make sense, since each line is very long, but may in other cases (where the line content is short).
Dim MatchCount As Int = 0
For i = 0 To Data.Size - 1          
    Dim Line As String = Data.Get(i)
    If Regex.IsMatch2(exp, Regex.CASE_INSENSITIVE, Line) Then
        MatchCount = MatchCount + 1
        ResultList.Add(Line)
        If MatchCount >= 100 Then
            Msgbox2Async("More than 100 matches found. Terminating search early.", "Search", "OK", "", "", Null, True)
            Exit
        End if
    End If
Next
Please note that code will not run as posted since it contains fictional check boxes. Come to think of it, probably should be a radio group or a drop down box.
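
One caveat when padding a user-entered expression with ".*": if the expression contains a top-level alternation such as "deal|deel", plain concatenation changes its meaning, because the padding then belongs to only one side of the "|". Wrapping the user's part in a non-capturing group avoids that. The Java sketch below illustrates the difference; it assumes B4A's IsMatch2 behaves like Matcher.matches() in java.util.regex, and the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class WrapUserPattern {
    // Naive wrapping: plain concatenation around the user's expression.
    static String wrapNaive(String userRegex) {
        return ".*" + userRegex + ".*";
    }

    // Grouped wrapping: (?: ... ) keeps any top-level '|' inside the user's part.
    static String wrapGrouped(String userRegex) {
        return ".*(?:" + userRegex + ").*";
    }

    // Whole-line, case-insensitive match.
    static boolean wholeMatch(String regex, String line) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "Upper Deel,,Kent"; // made-up sample line
        String user = "deal|deel";

        // ".*deal|deel.*" means: ends with "deal" OR starts with "deel".
        System.out.println(wholeMatch(wrapNaive(user), line));

        // ".*(?:deal|deel).*" means: contains "deal" or "deel" anywhere.
        System.out.println(wholeMatch(wrapGrouped(user), line));
    }
}
```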
 

agraham

Expert
Licensed User
I am border-line high-functional autistic and while very good at procedural logic my mind goes blank when I try to visualise the outcome of declarative style things like HTML, XML, XAML, Regex etc. With the help of Friedl I am starting to be able to parse Regexes procedurally and so see what is happening which I couldn't before.

I am searching for the regex as entered by the user, if they want start or end matches they can explicitly enter them in the search term. The point of all this is to search tens of thousands of lines of UK place, road and postcode data to find, say, all the hills in Cheshire. A single match to the search term is enough to add the data line to the required results.
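
A rough Java sketch of that requirement follows. It takes the user's expression as entered, compiles it once, and keeps each line in which it is found at least once, stopping at a cap. It assumes B4A's Matcher.Find corresponds to Matcher.find() in java.util.regex. The class name, the choice to return an empty list for an invalid expression, and the sample lines are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LineSearch {
    // Returns up to maxHits lines that contain at least one match of userRegex.
    // An invalid expression yields an empty list (one possible policy among several).
    static List<String> search(List<String> lines, String userRegex, int maxHits) {
        List<String> hits = new ArrayList<>();
        Pattern p;
        try {
            // Compile once, outside the loop, since the same pattern is reused for every line.
            p = Pattern.compile(userRegex, Pattern.CASE_INSENSITIVE);
        } catch (PatternSyntaxException e) {
            return hits;
        }
        for (String line : lines) {
            if (p.matcher(line).find()) { // a single match anywhere in the line is enough
                hits.add(line);
                if (hits.size() >= maxHits) {
                    break; // terminate the search early
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the comma-separated style described in the thread.
        List<String> data = List.of(
                "Sample Hill,,Cheshire,XX000001,Hill/Mountain",
                "Sample Town,,Cheshire,XX000002,Town",
                "Other Hill,,Kent,XX000003,Hill/Mountain");
        System.out.println(search(data, "cheshire.*hill", 100)); // hills in Cheshire
        System.out.println(search(data, "^sample", 100));        // user-supplied start anchor
        System.out.println(search(data, "de[ea", 100));          // invalid expression
    }
}
```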
 

drgottjr

Well-Known Member
Licensed User
emexes said:
Yeah, that was me. I misunderstood "not a real world requirement" to mean that discussion was not limited to regex approaches to the search problem, and veered off in a direction that has worked for me over the past 30 years. Figured I'd better unclutter the thread. I am still kinda itching to discuss the cost-benefit of regex for this use case, but... I think I have muddied the waters enough already ;-)

you and me both. our friend isn't the only Asperger's roaming the forum; no harm, no foul
 