iOS Question Parse HTML

Sandman

Expert
Licensed User
Longtime User

MarcoRome

Expert
Licensed User
Longtime User

I don't entirely agree. In several cases where there was no alternative, RegEx proved very useful. As in this example, even with varied and changing HTML, if you look carefully you can always find a pattern to hook into and get the expected results.

Example: this is a small piece of the page (there are several pages), and each block often differs in some detail.


And this is the result:



And this is some of the code:

B4X:
....
'For Pictures
Dim rx As RegexBuilder
                rx.Initialize.AppendEscaped($"pficon-imageclick"$).Append(rx.CharAny).AppendAtLeastOne.AppendEscaped($"style="cursor:pointer">"$)
                Dim mat As Matcher = Regex.Matcher(rx.Pattern, res)
                Dim i As Int = 0
                Do While mat.Find
                    'Log(mat.Match)
                    Dim foto As String = mat.match
                    foto = foto.Replace($"pficon-imageclick" data-pf-link=""$,"")
                    foto = foto.Replace($"" style="cursor:pointer">"$,"")
                    Starter.mappa.Put("foto" & i, foto)
                    Log(foto)
                    i = i + 1
                Loop
            
                totale_record_trovati = Starter.mappa.Size
            
                'For Name
                '<li class="pflist-itemtitle"><a href="https://www.xxxxxxx.it/medici-a-domicilio/listing/dott-umberto-vorre/">Dott. Umberto Vorre</a></li>
            
                Dim rx1 As RegexBuilder
                rx1.Initialize.AppendEscaped($"pflist-itemtitle"$).Append(rx1.CharAny).AppendAtLeastOne.AppendEscaped($"</a></li>"$)
                Dim mat1 As Matcher = Regex.Matcher(rx1.Pattern, res)
                Dim i As Int = 0
                Do While mat1.Find
                    'Log(mat1.Match)
                    Dim nome As String = mat1.match
                    nome = nome.Replace($"pflist-itemtitle">"$,"")
                    Dim preleva_link_da_cancellare As String
                    Dim lunghezza As Int
                    lunghezza =  nome.LastIndexOf($"">"$)
                    preleva_link_da_cancellare = nome.SubString2(0, lunghezza)
                    'Log(preleva_link_da_cancellare)
                    nome = nome.Replace(preleva_link_da_cancellare, "")
                    nome = nome.Replace($"">"$,"")
                    nome = nome.Replace($"</a></li>"$,"")
                    nome = HtmlDecoder(nome)
                    Starter.mappa.Put("nome" & i, nome)
                    Log(nome)
                    i = i + 1
                Loop
                ...

From the parsing we extracted the photo, name, address, rating, etc. It is only a matter of observing the markup and spending some time on it. Even in difficult cases a solution can always be found.
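
As a possible shortcut for the name extraction above, a capture group can do the trimming that the chain of Replace calls does. This is only a sketch: ExtractNames is a hypothetical helper (not part of the code above), and it assumes every entry looks like the commented sample line, i.e. a single <a href="..."> directly inside the pflist-itemtitle item. If the markup varies more than that, the pattern needs adjusting.

B4X:
'Sketch only: ExtractNames is a hypothetical helper name.
'Assumes entries shaped like:
'<li class="pflist-itemtitle"><a href="...">Name</a></li>
Sub ExtractNames (res As String) As List
    Dim names As List
    names.Initialize
    'Group 1 captures the text between the opening <a ...> tag and </a></li>
    Dim pattern As String = $"pflist-itemtitle"><a href="[^"]*">([^<]+)</a></li>"$
    Dim m As Matcher = Regex.Matcher(pattern, res)
    Do While m.Find
        'The captured text may still contain HTML entities and need decoding
        names.Add(m.Group(1))
    Loop
    Return names
End Sub

Because [^"]* and [^<]+ cannot run past the closing quote or the next tag, each match stays inside one entry even if several entries end up on the same line.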
 

Brandsum

Well-Known Member
Licensed User
I use this code to show some basic HTML content:
B4X:
#IF OBJC
- (NSAttributedString*) SetHTML:(NSString*) htmlString {
    NSAttributedString *attributedString = [[NSAttributedString alloc]
              initWithData: [htmlString dataUsingEncoding:NSUnicodeStringEncoding]
                   options: @{ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType }
        documentAttributes: nil
                     error: nil
    ];
    return attributedString;
}
#End If

'usage
Dim html As String = "<p><del>Deleted</del><p>List<ul><li>Coffee</li><li>Tea</li></ul><br><a href='URL'>Link </a>"
    
Dim NObj As NativeObject = Me
TextView_HTML.AttributedText = NObj.RunMethod("SetHTML:", Array(html))
 