Android Question MiniHtmlParser replace <br>

angel

Hello

I am using MiniHtmlParser, and one of the cells looks like <td> text1 <br> text2 </td>.

My code only retrieves text1 and ignores text2.

How can I replace the <br> tag with a space so that I get "text1 text2"?

<table class="table" style="text-align:left;">
<tr>
<td class="calendarSeparator" style="text-align: left; border-bottom: 2px solid #000;font-size: 15px;" colspan="6">
Ronda 1, Jornada 1 </td>
</tr>
<tr>
<th>a1</th>
<th>a2</th>
<th>a3</th>
<th>a4</th>
<th>a5</th>
<th>a6</th>
</tr>
<tr class="showFullMatch">
<td style="text-align:left;">
<a href="https://www.google.es/maps/dir//42.0766,1.82096/@42.0766,1.82096,16z?hl=ca" class="btn btn btn-primary" target="new">
<i class="fa fa-map-marker"></i>
</a>
Avià
</td>
<td> UEN <br> BCN </td>
<td class="result" style="text-align:center;">0 - 0</td>

B4X:
Sub Parsetest
    Dim j As HttpJob
    j.Initialize("", Me)
    j.Download(myurl) 'start the download; JobDone is only raised after this

    Wait For (j) JobDone(j As HttpJob)
    If j.Success Then
        Dim parsermini As MiniHtmlParser
        parsermini.Initialize
        Dim Root2 As HtmlNode = parsermini.Parse(j.GetString)
        Dim table As HtmlNode = parsermini.FindNode(Root2, "table", parsermini.CreateHtmlAttribute("class", "table"))
        Dim tbody As HtmlNode = parsermini.FindNode(table, "table", Null)

        For Each tr As HtmlNode In parsermini.FindDirectNodes(tbody, "tr", Null)
            Dim counter As Int
            Dim counter2 As Int
            For Each td As HtmlNode In parsermini.FindDirectNodes(tr, "td", Null)
                counter = counter + 1
                If counter = 1 Then
                    Dim a As HtmlNode = td.Children.Get(0)
                End If
                Log(parsermini.GetTextFromNode(td, 0))
            Next
        Next
    End If
    j.Release
End Sub
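A likely reason only text1 comes back, assuming MiniHtmlParser turns <td> UEN <br> BCN </td> into three children (a text node, a br node, another text node): GetTextFromNode(td, 0) reads only child 0. Below is a minimal sketch, in Java (the language B4A compiles to), of joining every text child of the cell instead. The Node record and the joinText name are hypothetical stand-ins for illustration, not the library's HtmlNode type; in B4X the same loop would run over td.Children.

```java
import java.util.List;

class CellText {
    // Hypothetical stand-in for MiniHtmlParser's HtmlNode (an assumption, not the
    // library's type): text runs are children named "text" holding their content in
    // `value`; tags such as br carry no value.
    record Node(String name, String value, List<Node> children) {
        static Node text(String v) { return new Node("text", v, List.of()); }
        static Node tag(String name, Node... kids) { return new Node(name, null, List.of(kids)); }
    }

    // Join the trimmed text of every direct "text" child with single spaces,
    // skipping <br> and any other element children.
    static String joinText(Node cell) {
        StringBuilder sb = new StringBuilder();
        for (Node child : cell.children()) {
            if (!child.name().equals("text")) continue; // not a text run, e.g. <br>
            String t = child.value().trim();
            if (t.isEmpty()) continue;                   // whitespace-only run
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // <td> UEN <br> BCN </td> as three children: text, br, text
        Node td = Node.tag("td", Node.text(" UEN "), Node.tag("br"), Node.text(" BCN "));
        System.out.println(joinText(td)); // UEN BCN
    }
}
```

Joining only the text children means any number of <br> tags in a cell collapse to single spaces, without touching the HTML before it is parsed.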

Thanks
 

DonManfred


emexes

This is starting to look like a job for regex.

B4X:
Dim StringWithHtmlBreaks As String = "one <br> two <br/> three <bR /> four <br > five"

Dim re As String = "\<[Bb][Rr](?: *\/)?\>"
    'matches <
    'followed by B or b
    'followed by R or r
    'followed by this sequence 0 or 1 times:
    '    0 or more spaces
    '    followed by /
    'followed by >
    
Dim StringWithSpaceBreaks As String = Regex.Replace(re, StringWithHtmlBreaks, " ")
'doesn't match <br > by design, although it'd be easy enough to do

Log(StringWithHtmlBreaks)
Log(StringWithSpaceBreaks)

'Log(Regex.Replace(       re       , StringWithHtmlBreaks, " " ))    'leaves spaces intact (and adds a new one to replace the <BR>)
'Log(Regex.Replace(" *" & re & " *", StringWithHtmlBreaks, " " ))    'deletes spaces before and after ie reduces to single space
'
'Log(Regex.Replace(       re       , StringWithHtmlBreaks, CRLF))    'leaves spaces intact
'Log(Regex.Replace(" *" & re       , StringWithHtmlBreaks, CRLF))    'deletes spaces at end of line
'Log(Regex.Replace(" *" & re & " *", StringWithHtmlBreaks, CRLF))    'deletes spaces at end of line and start of following line
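For trying the pattern outside B4X: B4A's Regex object wraps Java's java.util.regex, so the same pattern should behave the same way in plain Java once each backslash is doubled for Java source. A minimal sketch; the class and method names are made up for illustration.

```java
class BrToSpace {
    // The B4X pattern \<[Bb][Rr](?: *\/)?\> with each backslash doubled for Java source.
    // Matches <br>, <br/>, <br />, <BR> etc., but not <br > (space without a slash).
    static final String BR = "\\<[Bb][Rr](?: *\\/)?\\>";

    // Replace each break with one space; surrounding spaces are left as they are.
    static String keepSpaces(String html) {
        return html.replaceAll(BR, " ");
    }

    // Also swallow spaces on either side, so "a <br> b" becomes "a b".
    static String collapseSpaces(String html) {
        return html.replaceAll(" *" + BR + " *", " ");
    }

    public static void main(String[] args) {
        String s = "one <br> two <br/> three <bR /> four <br > five";
        System.out.println(keepSpaces(s));     // one   two   three   four <br > five
        System.out.println(collapseSpaces(s)); // one two three four <br > five
    }
}
```

Applied to the downloaded page string before parsing, the collapsing version turns <td> UEN <br> BCN </td> into <td> UEN BCN </td>, so the cell has a single text child.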
 

emexes


I knew you'd be thrilled.

I'm busy using regex to scrape data out of HTML myself right now (what with the Melbourne Cup coming up soon), and I thought: while I'm here, I may as well whip up a demo and post it in case it helps.

But most people's eyes glaze over at the sight of regex, which I cheerfully agree does look a bit like a cat has walked across the keyboard.

It makes sense if you go through it step by step, though. I usually lay the code out to make it easier to read; for example, here's my regex for pulling the race track and date out of a web page:

B4X:
Dim re As String = "results\/"                 & _
                       "[^\/\""]+"             & _
                   "\/("                       & _
                       "[^\/\""]+"             & _
                   "\/"                        & _
                       "[^\/\""]+\-[Rr]ace\-"  & _
                       "1?\d"                  & _
                   ")\"""

I should probably document that a bit better so that I can still understand it in the morning.
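Taking up the "document it better" point, here is the same pattern as a Java sketch with a comment on each piece, run against a made-up link. The real page's URL layout isn't shown, so the example.com address is hypothetical, as are the class and method names. The backslashes before / and - in the B4X version are optional in Java regex, so they are dropped here; the pattern matches the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RaceLink {
    static final Pattern RACE = Pattern.compile(
            "results/"              // literal "results/"
            + "[^/\"]+"             // one path segment (no slash or quote), not captured
            + "/("                  // slash, then open capture group 1
            + "[^/\"]+"             //   track segment
            + "/"                   //   slash
            + "[^/\"]+-[Rr]ace-"    //   date part, then "-race-" or "-Race-"
            + "1?\\d"               //   race number 0 to 19
            + ")\"");               // close group 1; the href's closing quote must follow

    // Returns group 1 (track/date-race-N) from the first matching link, or null if none.
    static String extract(String html) {
        Matcher m = RACE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical link: the real site's URL layout is not shown in the post.
        String html = "<a href=\"https://example.com/results/vic/flemington/2023-11-07-race-7\">";
        System.out.println(extract(html)); // flemington/2023-11-07-race-7
    }
}
```

The greedy [^/"]+ before -race- backs off until "-race-" fits, which is why the date segment can itself contain hyphens.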
 

emexes

Is that for next Halloween day?

Lol, here's some Halloween horror code for you:

B4X:
For I = 0 To Shuffler.Length - 1
    Dim J As Int = Rnd(0, Shuffler.Length)
    If I <> J Then
        'swap Ints Shuffler(I) and Shuffler(J) without a temporary variable
        Shuffler(I) = Shuffler(I) + Shuffler(J)
        Shuffler(J) = Shuffler(I) - Shuffler(J)
        Shuffler(I) = Shuffler(I) - Shuffler(J)
    End If
Next
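How the add/subtract swap holds up, sketched in Java: B4A and B4J Ints are Java ints, whose arithmetic wraps on overflow, so the swap still comes out right even when the sum overflows. The If I <> J guard is not optional: swapping an element with itself this way zeroes it. The class and method names are just for illustration.

```java
class AddSubSwap {
    // Swap a[i] and a[j] without a temporary, as in the B4X loop.
    // int arithmetic wraps on overflow, so the result is correct even if a[i] + a[j] overflows.
    static void swap(int[] a, int i, int j) {
        if (i == j) return; // without this guard a self-swap zeroes the element
        a[i] = a[i] + a[j];
        a[j] = a[i] - a[j];
        a[i] = a[i] - a[j];
    }

    // Unguarded version, only to show what the If I <> J check prevents.
    static void swapUnguarded(int[] a, int i, int j) {
        a[i] = a[i] + a[j];
        a[j] = a[i] - a[j];
        a[i] = a[i] - a[j];
    }

    public static void main(String[] args) {
        int[] a = {Integer.MAX_VALUE, 5};
        swap(a, 0, 1);
        System.out.println(a[0] + " " + a[1]); // 5 2147483647
        int[] b = {42};
        swapUnguarded(b, 0, 0);
        System.out.println(b[0]); // 0
    }
}
```

Separately, drawing J from the whole array on every pass (rather than from I to the end, as the Fisher-Yates shuffle does) makes some orderings more likely than others.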
 