B4J Question RegEx pattern help needed

notedop

Member
Licensed User
Longtime User
Hi,

I need to retrieve specific values from an HTML page. I tried the jTidy approach of converting the HTML to XML and parsing that, but the page contains errors and I'm not in a position to fix them. That leaves me with string functions or RegEx.
I've written a string function as a workaround for now, but I really want to understand RegEx better, as I believe it will be handy in the future as well. I've tried to go through the documentation, but to be honest I can't really get my head around it.

All my attempts on the https://b4x.com:51041/regex_ws/index.html page have failed; I'm only getting either single-character values or nothing at all.

I'm trying to get the Name, ID and Value.

B4X:
  <input type="hidden" name="ctl00$hdnUIDs" id="hdnUIDs" value="SAo7MZE2AeF92slcapwhmEqC2dTCiSTQKb9zgaHl6RzK01yVin0YVpchedG4L7Txt816xTaKLf2/CQ45Qk+1sQ==" />
  <input type="hidden" name="ctl00$hdnSECs" id="hdnSECs" value="i/sXDoYis+NARUc6I9W61w==" />
  <input type="hidden" name="ctl00$hdnNS" id="hdnNS" />

I've created the following string function to get the value based on the name. It works for now but could easily break if the website's formatting changes, and I'd like to understand regex better.
Note that I'm not taking the ID into account in the logic below.

Can someone help me build the pattern for extracting the name, id and value, and explain each piece of the pattern so I can learn from it?

call with:

B4X:
GetElementValue(html,"ctl00$hs")

B4X:
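Sub GetElementValue(s As String, name As String) As String
    ' Jump to the first occurrence of the element name.
    Dim nameStart As Int = s.IndexOf(name)
    If nameStart = -1 Then Return ""
    s = s.SubString(nameStart)
    ' Take the text between value=" and the first closing " />.
    Dim marker As String = "value=" & QUOTE
    Dim valueStart As Int = s.IndexOf(marker)
    Dim valueEnd As Int = s.IndexOf(QUOTE & " />")
    ' No value attribute before the end of this tag: return an empty string.
    If valueStart = -1 Or valueEnd < valueStart + marker.Length Then Return ""
    Return s.SubString2(valueStart + marker.Length, valueEnd)
End Sub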
 

EnriqueGonzalez

Well-Known Member
Licensed User
Longtime User
Hi

First of all, I am very sorry that I can't help you with regex.

I think your best bet is the jsoup library.
https://www.b4x.com/android/forum/threads/jsoup-html-parser.49152/
(It works with B4J.)

According to their web page:
https://jsoup.org/

jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree.

The B4A library is not fully complete yet, but I worked with jsoup some weeks ago through the full Java object and it was a piece of cake.
 

notedop

Member
Licensed User
Longtime User
Thanks Enrique, I've seen that page as well but didn't want to use it, as it seems to have been unsupported for quite some time.
I might give it a go if I need to do some additional parsing, given your 'piece of cake' comment :p

Any help with the regex pattern is still appreciated :)
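
For reference, here is a minimal sketch of one way to pull the name, id and value out of the sample lines with the B4X Regex object. It is not the only way to do it. It uses two small patterns instead of one big one: the first grabs everything inside each <input ...> tag, the second walks over the attribute="value" pairs inside that tag, so the order of the attributes does not matter and a missing value simply comes back as an empty string. The sub name LogInputAttributes is made up for illustration, and the sketch assumes that attribute values are always wrapped in double quotes and never contain a ">" character, as in the sample above.

B4X:
' Hypothetical helper: logs name, id and value of every <input> tag in html.
Sub LogInputAttributes(html As String)
    ' Pattern 1, piece by piece:
    '   <input     the literal start of the tag
    '   \b         a word boundary, so "<inputs" would not match
    '   ([^>]*)    group 1: any characters except ">", i.e. the attributes
    '   >          the end of the tag
    Dim tags As Matcher = Regex.Matcher2("<input\b([^>]*)>", Regex.CASE_INSENSITIVE, html)
    Do While tags.Find
        Dim attrs As Map
        attrs.Initialize
        ' Pattern 2, piece by piece (written as a smart string so it can contain quotes):
        '   ([\w-]+)   group 1: the attribute name (letters, digits, _ or -)
        '   ="         a literal equals sign and the opening quote
        '   ([^"]*)    group 2: any characters except a quote, i.e. the attribute value
        '   "          the closing quote
        Dim attr As Matcher = Regex.Matcher($"([\w-]+)="([^"]*)""$, tags.Group(1))
        Do While attr.Find
            attrs.Put(attr.Group(1).ToLowerCase, attr.Group(2))
        Loop
        ' GetDefault returns "" when the tag has no such attribute (e.g. no value).
        Log("name=" & attrs.GetDefault("name", "") & ", id=" & attrs.GetDefault("id", "") & ", value=" & attrs.GetDefault("value", ""))
    Loop
End Sub

Called as LogInputAttributes(html), this should log one line per <input> tag; for the third sample line, which has no value attribute, the value part stays empty. The key idea in both patterns is the negated character class: [^>]* and [^"]* mean "keep going until you hit this character", which is usually easier to get right than .* because it cannot run past the end of the tag or the closing quote.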
 