iOS Question Parse HTML

MarcoRome

Expert
Licensed User
I will admit that I have done this several times, especially for quick-and-dirty solutions, and especially if it's for a limited use case with well-known HTML.

But I do feel it makes sense to link to this epic answer on the topic, from Stack Overflow:
https://stackoverflow.com/questions...ept-xhtml-self-contained-tags/1732454#1732454
I don't entirely agree. In several cases where there was no alternative, RegEx proved to be very useful indeed. As in this example, even with varied HTML, if you look carefully you can always find a loophole and get the expected results.

Example: this is a small piece of the page (there are several pages), and each block often differs in some detail.

upload_2019-4-20_9-59-31.png

and this is the result:

upload_2019-4-20_10-0-16.png


and this is some of the code:

B4X:
....
'For Pictures
Dim rx As RegexBuilder
rx.Initialize.AppendEscaped($"pficon-imageclick"$).Append(rx.CharAny).AppendAtLeastOne.AppendEscaped($"style="cursor:pointer">"$)
Dim mat As Matcher = Regex.Matcher(rx.Pattern, res)
Dim i As Int = 0
Do While mat.Find
    'Log(mat.Match)
    Dim foto As String = mat.Match
    'Strip the markup on both sides so only the data-pf-link value remains
    foto = foto.Replace($"pficon-imageclick" data-pf-link=""$, "")
    foto = foto.Replace($"" style="cursor:pointer">"$, "")
    Starter.mappa.Put("foto" & i, foto)
    Log(foto)
    i = i + 1
Loop

totale_record_trovati = Starter.mappa.Size

'For Name
'<li class="pflist-itemtitle"><a href="https://www.xxxxxxx.it/medici-a-domicilio/listing/dott-umberto-vorre/">Dott. Umberto Vorre</a></li>

Dim rx1 As RegexBuilder
rx1.Initialize.AppendEscaped($"pflist-itemtitle"$).Append(rx1.CharAny).AppendAtLeastOne.AppendEscaped($"</a></li>"$)
Dim mat1 As Matcher = Regex.Matcher(rx1.Pattern, res)
i = 0 'Reuse the counter for the names
Do While mat1.Find
    'Log(mat1.Match)
    Dim nome As String = mat1.Match
    nome = nome.Replace($"pflist-itemtitle">"$, "")
    'Everything before the last "> is the opening <a href=...> tag: remove it
    Dim lunghezza As Int = nome.LastIndexOf($"">"$)
    Dim preleva_link_da_cancellare As String = nome.SubString2(0, lunghezza)
    'Log(preleva_link_da_cancellare)
    nome = nome.Replace(preleva_link_da_cancellare, "")
    nome = nome.Replace($"">"$, "")
    nome = nome.Replace($"</a></li>"$, "")
    nome = HtmlDecoder(nome)
    Starter.mappa.Put("nome" & i, nome)
    Log(nome)
    i = i + 1
Loop
...
From the parsing we extracted the photo, name, address, rating, etc. It is only a matter of observation and patience, but even in difficult cases a solution is always there.
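
As a possible shortcut for the Replace calls in the picture loop, a capture group can return the link directly. This is only a sketch: it assumes the attributes always appear in the order seen in the Replace strings (class, then data-pf-link, then style), and ExtractPhotoLinks is a hypothetical helper name, not part of the original code.

```b4x
'Sketch only: returns every data-pf-link value found in Html.
'Assumes the markup looks like:
'  pficon-imageclick" data-pf-link="<url>" style="cursor:pointer">
Sub ExtractPhotoLinks (Html As String) As List
    Dim result As List
    result.Initialize
    'Group 1 captures the link; [^"]+ stops at the closing quote of the attribute
    Dim pattern As String = $"pficon-imageclick" data-pf-link="([^"]+)" style="cursor:pointer">"$
    Dim m As Matcher = Regex.Matcher(pattern, Html)
    Do While m.Find
        result.Add(m.Group(1))
    Loop
    Return result
End Sub
```

It could be called as ExtractPhotoLinks(res) and the returned List copied into Starter.mappa as before. Because [^"]+ cannot run past a quote, each match stays inside a single attribute even when several blocks sit on the same line.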
 

Brandsum

Well-Known Member
Licensed User
I use this code to show some basic HTML content:
B4X:
#IF OBJC
- (NSAttributedString*) SetHTML:(NSString*) htmlString {
    NSAttributedString *attributedString = [[NSAttributedString alloc]
              initWithData: [htmlString dataUsingEncoding:NSUnicodeStringEncoding]
                   options: @{ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType }
        documentAttributes: nil
                     error: nil
    ];
    return attributedString;
}
#End If

'usage
Dim html As String = "<p><del>Deleted</del><p>List<ul><li>Coffee</li><li>Tea</li></ul><br><a href='URL'>Link </a>"
    
Dim NObj As NativeObject = Me
TextView_HTML.AttributedText = NObj.RunMethod("SetHTML:", Array(html))
 