B4J Question RegEx pattern help needed

notedop · Jun 17, 2017

Hi,

I need to retrieve specific values from an HTML page. I've tried the jTidy solution, converting the HTML to XML and parsing it from there, but the HTML page contains errors and I'm not in a position to fix them. That leaves me with string functions or RegEx.
I've created a string function to work around it for now, but I really want to understand RegEx better, as I believe it will be handy in the future as well. I've tried to go through the documentation, but to be honest I can't really get my head around it.

All my attempts on the https://b4x.com:51041/regex_ws/index.html page have failed; I'm only getting either single-character values or nothing at all.

I'm trying to get the Name, ID and Value.

B4X:

  <input type="hidden" name="ctl00$hdnUIDs" id="hdnUIDs" value="SAo7MZE2AeF92slcapwhmEqC2dTCiSTQKb9zgaHl6RzK01yVin0YVpchedG4L7Txt816xTaKLf2/CQ45Qk+1sQ==" />
  <input type="hidden" name="ctl00$hdnSECs" id="hdnSECs" value="i/sXDoYis+NARUc6I9W61w==" />
  <input type="hidden" name="ctl00$hdnNS" id="hdnNS" />

I've created the following string function to get the value based on the Name. It works for now, but it could easily break if the website's formatting changes, and I'd like to understand regex better.
Note that I'm not taking the ID into account in the logic below.

Can someone help me build a pattern that captures the name, id and value, and explain each piece of the pattern so I can learn from it?

call with:

B4X:

GetElementValue(html,"ctl00$hs")

B4X:

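Sub GetElementValue(s As String, name As String) As String
    ' Drop everything before the first occurrence of the element name.
    s = s.SubString2(s.IndexOf(name), s.Length)
    ' Keep the text from value=" up to the first " /> that follows the name.
    s = s.SubString2(s.IndexOf("value=" & QUOTE), s.IndexOf(QUOTE & " />"))
    ' Strip the leading value=" so only the value itself remains.
    s = s.Replace("value=" & QUOTE, "")
    Return s
End Sub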

EnriqueGonzalez · Jun 17, 2017

Hi

First of all, I am very sorry that I can't help you with regex.

I think that your best bet is the jsoup library.
https://www.b4x.com/android/forum/threads/jsoup-html-parser.49152/
(It works with B4J.)

According to their webpage
https://jsoup.org/

jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree.

Note that the B4A library is not fully complete yet. I worked with jsoup some weeks ago entirely through JavaObject, and it was a piece of cake.

notedop · Jun 17, 2017

Thanks Enrique, I've seen that page as well but didn't want to use it, as it seems to have been unsupported for quite some time.
I might give it a go if I need to do some additional parsing, following your 'piece of cake' comment.

Help with the regex pattern from anyone else is still appreciated.

EnriqueGonzalez · Jun 17, 2017

Good luck!

Unsupported? Last version was from 5 days ago

https://jsoup.org/news/release-1.10.3

notedop · Jun 17, 2017

Enrique Gonzalez R said:
Unsupported? Last version was from 5 days ago

I meant that it was last wrapped for B4J in 2015. I didn't realize that I could download the latest JAR, update the library's XML to point to it, and get it going.
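
On the regex question that is still open in this thread, here is a minimal sketch of one possible pattern, written in plain Java because B4J's Regex object is built on java.util.regex, so the same pattern text should carry over to Regex.Matcher with Find and Group(n). The class name HiddenInputs and the method parse are hypothetical names used only for illustration. The pattern assumes double-quoted attributes and that name appears before id, with value (when present) after id, as in the sample lines of the first post. It is not a general HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class HiddenInputs {
    // The pattern is split into pieces so each one can be explained.
    static final Pattern INPUT = Pattern.compile(
          "<input\\b"                        // literal "<input", then a word boundary
        + "[^>]*?\\bname=\"([^\"]*)\""       // lazily skip attributes inside the tag; group 1 = name
        + "[^>]*?\\bid=\"([^\"]*)\""         // same again; group 2 = id
        + "(?:[^>]*?\\bvalue=\"([^\"]*)\")?" // optional part; group 3 = value, null when absent
    );

    // Returns one {name, id, value} array per matching <input> tag.
    static List<String[]> parse(String html) {
        List<String[]> out = new ArrayList<>();
        Matcher m = INPUT.matcher(html);
        while (m.find()) {
            out.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return out;
    }

    public static void main(String[] args) {
        String html =
              "<input type=\"hidden\" name=\"ctl00$hdnSECs\" id=\"hdnSECs\" value=\"i/sXDoYis+NARUc6I9W61w==\" />\n"
            + "<input type=\"hidden\" name=\"ctl00$hdnNS\" id=\"hdnNS\" />";
        for (String[] row : parse(html)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

Two details do most of the work here. [^>]*? means "any characters except >, as few as possible", which keeps each match inside a single tag. [^"]* means "everything up to the next double quote", which is why values containing /, + or = are captured whole. Because the value part is wrapped in an optional group, a tag without a value attribute, such as hdnNS, still matches and simply leaves group 3 empty.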
