How can I get two capturing groups, one with XXXX and one with YYYY?
So far I have had only partial success:
B4X:
Dim Exp As String = $"<span style="font-weight: bold[^>]+>([^<]+)|<td role="gridcell"[^>]+>([^<]+)"$
Dim m As Matcher
m = Regex.Matcher(Exp, Respuesta)
Do While m.Find
	Log("G1=" & m.Group(1))
	Log("G2=" & m.Group(2))
	Log("*********************")
Loop
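The partial success comes from the |. On each Find the regex matches only one alternative, so Group(1) is set for the <span> match and Group(2) for the following <td> match, never both at once. Below is a minimal sketch that keeps the original pattern and pairs consecutive matches. PairMatches is a hypothetical helper name, and the sketch assumes every bold label span is followed by its value cell, as in the sample HTML further down.

```b4x
'Hypothetical helper: pairs each bold label with the value cell that follows it.
'Returns a List of "label|value" strings.
Sub PairMatches(Html As String) As List
	Dim Exp As String = $"<span style="font-weight: bold[^>]+>([^<]+)|<td role="gridcell"[^>]+>([^<]+)"$
	Dim pairs As List
	pairs.Initialize
	Dim label As String = ""
	Dim m As Matcher = Regex.Matcher(Exp, Html)
	Do While m.Find
		If m.Match.StartsWith("<span") Then
			label = m.Group(1) 'the <span> alternative matched, so only Group(1) is set
		Else
			pairs.Add(label & "|" & m.Group(2)) 'the <td> alternative matched, so only Group(2) is set
		End If
	Loop
	Return pairs
End Sub
```

Usage, assuming Respuesta holds the HTML as in the question:

```b4x
For Each p As String In PairMatches(Respuesta)
	Log(p)
Next
```

Checking m.Match instead of comparing a group with Null avoids depending on how each platform represents a group that did not participate in the match.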
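First remove the line breaks, because . does not match them by default:
B4X:
'CRLF in B4X is Chr(10); Chr(13) is removed too in case the HTML uses Windows line endings
Dim deflatedString As String = htmlString.Replace(CRLF, "").Replace(Chr(13), "")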
Then try this pattern:
B4X:
Dim pattern As String = "<span.*?>(.*?)</span>.*?<td.*?>(.*?)</td>"
Dim mchr As Matcher = Regex.Matcher(pattern, deflatedString)
Do While mchr.Find
	Log("G1=" & mchr.Group(1))
	Log("G2=" & mchr.Group(2))
	Log("*********************")
Loop
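Instead of deflating the string, you can turn on dot-all mode so that . also matches line breaks. This is a sketch assuming the Java-based Regex of B4A/B4J, where the inline (?s) flag has the same effect as Regex.Matcher2(pattern, Regex.DOTALL, Html). ExtractPairs is a hypothetical name, and I have not verified that B4i's regex engine accepts (?s).

```b4x
'Hypothetical helper: returns a List of String arrays, each holding (label, value).
Sub ExtractPairs(Html As String) As List
	'(?s) = dot-all mode: "." also matches line breaks, so Html can keep its CR/LF characters
	Dim pattern As String = "(?s)<span.*?>(.*?)</span>.*?<td.*?>(.*?)</td>"
	Dim pairs As List
	pairs.Initialize
	Dim mchr As Matcher = Regex.Matcher(pattern, Html)
	Do While mchr.Find
		pairs.Add(Array As String(mchr.Group(1), mchr.Group(2)))
	Loop
	Return pairs
End Sub
```

One caveat applies to both versions: the lazy .*? between </span> and <td can cross into the next row if a row has no <span>, so this relies on every row following the label/value layout of the sample.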
I have only tested this pattern in JavaScript, not B4A, but I don't expect the differences between the regex engines to break it. The trick in this pattern is the question marks: a ? placed after * makes that quantifier lazy (non-greedy), so it matches as few characters as possible.
For example, .*?> stops at the first > it reaches, even though that > could also have been consumed by .*. Without the question mark, .* is greedy: it runs as far as it can and backs off only to the last > that still lets the rest of the pattern match.
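A small sketch of the difference on a made-up two-cell string (sample is a hypothetical variable):

```b4x
Dim sample As String = $"<td a="1">X</td><td b="2">Y</td>"$
'Greedy: .* runs to the end of the string, then backs off to the last ">"
Dim greedy As Matcher = Regex.Matcher("<td.*>", sample)
If greedy.Find Then Log(greedy.Match) 'logs <td a="1">X</td><td b="2">Y</td>
'Lazy: .*? stops at the first ">"
Dim lazy As Matcher = Regex.Matcher("<td.*?>", sample)
If lazy.Find Then Log(lazy.Match) 'logs <td a="1">
```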
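Another option is to convert the HTML to XML with jTidy and then parse the output with Xml2Map:
B4X:
Dim tidy As Tidy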
tidy.Initialize
Dim s As String = $"<tr data-ri="0" class="ui-widget-content ui-datatable-even" role="row">
<td role="gridcell"><span style="font-weight: bold;">XXXX:</span></td>
<td role="gridcell" style="text-align:left;">YYYY</td></tr>
<tr data-ri="1" class="ui-widget-content ui-datatable-odd" role="row">
<td role="gridcell"><span style="font-weight: bold;">XXXX:</span></td>
<td role="gridcell" style="text-align:left;">YYYY</td></tr>
<tr data-ri="2" class="ui-widget-content ui-datatable-even" role="row">
<td role="gridcell"><span style="font-weight: bold;">XXXX:</span></td>
<td role="gridcell" style="text-align:left;">YYYY</td></tr>"$
Dim in As InputStream
Dim b() As Byte = s.GetBytes("utf8")
in.InitializeFromBytesArray(b, 0, b.Length)
tidy.Parse(in, File.DirInternal, "1.xml")
Dim xm As Xml2Map
xm.Initialize
Dim m As Map = xm.Parse(File.ReadString(File.DirInternal, "1.xml"))
' Dim jg As JSONGenerator 'uncomment these three lines to log the Map as indented JSON and see its structure
' jg.Initialize(m)
' Log(jg.ToPrettyString(4))
I found your post "[B4X] Text, Strings and Parsers" and discarded jTidy because I'm also porting this to B4i, and, as that post says, jTidy is not available for B4i. Is there a similar HTML parser available now?