iOS Question Parse HTML

Discussion in 'iOS Questions' started by narek adonts, Apr 14, 2019.

  1. narek adonts

    narek adonts Well-Known Member Licensed User

    Hi, I am trying to parse HTML to get the Open Graph meta elements. I searched the forum for HTML parsing but without success. Any help?

    Narek
     
  2. Erel

    Erel Administrator Staff Member Licensed User

  3. MarcoRome

    MarcoRome Expert Licensed User

  4. Sandman

    Sandman Well-Known Member Licensed User

    I will admit that I have done this several times, especially for quick-and-dirty solutions, and especially if it's for a limited use-case with well-known html.

    But I do feel it makes sense to link to this epic answer on the topic, from Stack Overflow:
    https://stackoverflow.com/questions...ept-xhtml-self-contained-tags/1732454#1732454
     
  5. MarcoRome

    MarcoRome Expert Licensed User

    I don't entirely agree. In several cases where there was no alternative, RegEx proved to be very useful indeed. As in this example, even on different and varied HTML, if you look carefully you can always find a pattern to hook into and get the expected results.

    For example, this is a small piece of the page (there are several pages), and each block often differs in some detail:

    upload_2019-4-20_9-59-31.png
    and this is the result:

    upload_2019-4-20_10-0-16.png

    and this is part of the code:

    Code:
    ....
    'For pictures: match from the class name up to the end of the opening tag
    Dim rx As RegexBuilder
    rx.Initialize.AppendEscaped($"pficon-imageclick"$).Append(rx.CharAny).AppendAtLeastOne.AppendEscaped($"style="cursor:pointer">"$)
    Dim mat As Matcher = Regex.Matcher(rx.Pattern, res)
    Dim i As Int = 0
    Do While mat.Find
        'Log(mat.Match)
        Dim foto As String = mat.Match
        'Strip the markup on both sides so that only the image link remains
        foto = foto.Replace($"pficon-imageclick" data-pf-link=""$, "")
        foto = foto.Replace($"" style="cursor:pointer">"$, "")
        Starter.mappa.Put("foto" & i, foto)
        Log(foto)
        i = i + 1
    Loop

    totale_record_trovati = Starter.mappa.Size

    'For name
    '<li class="pflist-itemtitle"><a href="https://www.xxxxxxx.it/medici-a-domicilio/listing/dott-umberto-vorre/">Dott. Umberto Vorre</a></li>
    Dim rx1 As RegexBuilder
    rx1.Initialize.AppendEscaped($"pflist-itemtitle"$).Append(rx1.CharAny).AppendAtLeastOne.AppendEscaped($"</a></li>"$)
    Dim mat1 As Matcher = Regex.Matcher(rx1.Pattern, res)
    i = 0 'reuse the counter declared above
    Do While mat1.Find
        'Log(mat1.Match)
        Dim nome As String = mat1.Match
        nome = nome.Replace($"pflist-itemtitle">"$, "")
        'Everything before the last "> is the <a href=...> link, which is not needed
        Dim preleva_link_da_cancellare As String
        Dim lunghezza As Int
        lunghezza = nome.LastIndexOf($"">"$)
        preleva_link_da_cancellare = nome.SubString2(0, lunghezza)
        'Log(preleva_link_da_cancellare)
        nome = nome.Replace(preleva_link_da_cancellare, "")
        nome = nome.Replace($"">"$, "")
        nome = nome.Replace($"</a></li>"$, "")
        nome = HtmlDecoder(nome)
        Starter.mappa.Put("nome" & i, nome)
        Log(nome)
        i = i + 1
    Loop
    ...
    From the parsing we extracted the photo, name, address, rating, etc. It is only a matter of observation and patience, but even in difficult cases there is always a solution.
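
    For the Open Graph tags asked about in the first post, the same regex approach could look roughly like the sketch below. It is not code from anyone in this thread. The sub name ParseOpenGraph is hypothetical, and the pattern assumes every tag is written as <meta property="og:..." content="..."> with double-quoted attributes and property before content. Pages that use single quotes or a different attribute order would need a different pattern.

    ```b4x
    'Sketch: collect Open Graph meta tags from an HTML string into a Map.
    'Keys are the part after "og:" (title, image, ...), values are the content attributes.
    'Assumption: double-quoted attributes, with property appearing before content.
    Sub ParseOpenGraph (Html As String) As Map
        Dim result As Map
        result.Initialize
        Dim pattern As String = $"<meta[^>]*property="og:([^"]+)"[^>]*content="([^"]*)""$
        Dim m As Matcher = Regex.Matcher2(pattern, Regex.CASE_INSENSITIVE, Html)
        Do While m.Find
            'Group 1 = property name without the og: prefix, group 2 = content value
            result.Put(m.Group(1), m.Group(2))
        Loop
        Return result
    End Sub

    'usage, with res holding the downloaded page as in the code above
    'Dim og As Map = ParseOpenGraph(res)
    'Log(og.Get("title"))
    ```

    The content values are returned as they appear in the page, so HTML entities such as &amp; would still need decoding afterwards.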
     
  6. Brandsum

    Brandsum Well-Known Member Licensed User

    I use this code to show some basic HTML content:
    Code:
    #IF OBJC
    - (NSAttributedString*) SetHTML:(NSString*) htmlString {
        NSAttributedString *attributedString = [[NSAttributedString alloc]
                  initWithData: [htmlString dataUsingEncoding:NSUnicodeStringEncoding]
                       options: @{ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType }
            documentAttributes: nil
                         error: nil
        ];
        return attributedString;
    }
    #End If

    'usage
    Dim html As String = "<p><del>Deleted</del><p>List<ul><li>Coffee</li><li>Tea</li></ul><br><a href='URL'>Link </a>"
    Dim NObj As NativeObject = Me
    TextView_HTML.AttributedText = NObj.RunMethod("SetHTML:", Array(html))
     