B4A Library [B4X] MiniHtmlParser - simple html parser implemented with B4X

Erel · Jun 3, 2020

- 0.91 - Fixes an issue with text after the last element.

Erel · Jun 29, 2020

- 0.92 - Unescapes more entities, including entities written with the Unicode code point, e.g. &#8501; (ℵ)

Erel · Aug 11, 2020

- 0.93 - Fixes an issue with whitespace characters being removed too aggressively.

Erel · Oct 20, 2020

- 0.94 - New FindDirectNodes method: returns a list of the direct child nodes that match the tag name and, optionally, the attribute.
New IsNodeMatches method: tests whether the given node matches the tag name and, optionally, the attribute.
The example was updated. It was broken by the change in v0.93. It is now built with FindDirectNodes and is more robust than the previous implementation.
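
A rough sketch of how the two new methods might be combined. The method names come from the release note above; the parameter order (node, tag name, attribute), the use of Null to mean "no attribute filter", and calling CreateHtmlAttribute from outside the class are assumptions, and LogDirectCells and the "price" class are hypothetical names used only for illustration.

```b4x
'Sketch only. Assumed signatures (not confirmed in this thread):
'  FindDirectNodes (Parent As HtmlNode, TagName As String, Attribute As HtmlAttribute) As List
'  IsNodeMatches (Node As HtmlNode, TagName As String, Attribute As HtmlAttribute) As Boolean
'Passing Null as the attribute is assumed to mean "match on the tag name only".
Private Sub LogDirectCells (Parser As MiniHtmlParser, Row As HtmlNode)
    'Guard: only handle <tr> nodes.
    If Parser.IsNodeMatches(Row, "tr", Null) = False Then Return
    'Only direct <td> children are returned, so cells of a nested table are not included.
    Dim cells As List = Parser.FindDirectNodes(Row, "td", Null)
    Log($"Direct td children: ${cells.Size}"$)
    'Optional attribute filter: direct td children with class = price (hypothetical class name).
    Dim priceCells As List = Parser.FindDirectNodes(Row, "td", Parser.CreateHtmlAttribute("class", "price"))
    Log($"Direct td children with class price: ${priceCells.Size}"$)
End Sub
```

Limiting the search to direct children is presumably what makes the updated example more robust: a row's cells are collected without picking up nodes from deeper levels of the tree.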

MathiasM · Mar 19, 2021

Hello Erel

I like this library very much. But a small question: what was the design intention behind HTMLParser.GetTextFromNode() instead of HTMLNode.Text? The same goes for FindNode() and other similar calls.
The second form looks more 'B4X-like' than the method used now.

Thanks a lot for this library!

Erel · Mar 21, 2021

Technically it is designed like this because HtmlNode is a user type and not a class by itself so it cannot have methods. It is a bit faster like this compared to a full class, though it could have been designed differently.
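
To illustrate the point, here is a minimal sketch of a class module that declares a user type and exposes a helper sub for it. DemoNode and ChildCount are hypothetical names, not the library's actual declarations.

```b4x
'Hypothetical class module, for illustration only.
Sub Class_Globals
    'A user type is a plain record: it has fields but cannot have subs of its own.
    Type DemoNode (Name As String, Children As List)
End Sub

Public Sub Initialize
End Sub

'Behaviour lives in a sub of the owning class that takes the type as a parameter,
'so callers write Helper.ChildCount(node) rather than node.ChildCount.
Public Sub ChildCount (Node As DemoNode) As Int
    If Node.Children.IsInitialized = False Then Return 0
    Return Node.Children.Size
End Sub
```

Turning the node into a full class would allow the node.Method style, at the cost of creating a class instance for every node in the tree.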

aeric · Feb 5, 2025

Suggested update:

B4X:

Private Sub ParseAttributes (Parent As HtmlNode)
    Dim start As Int = Index
    ReadUntil(">")
    Dim s As String = mHtml.SubString2(start, Index - 1)
    'Run once for single-quoted values and once for double-quoted values.
    For Each EscapeChar As String In Array("'", $"""$)
        'Allow attribute names to contain dashes (-), e.g. data-value or aria-label.
        'Previous pattern (\w does not match dashes):
        'Dim m As Matcher = Regex.Matcher($"(\w+)\s*=\s*\${EscapeChar}([^${EscapeChar}]+)\${EscapeChar}"$, s)
        Dim m As Matcher = Regex.Matcher($"([a-zA-Z0-9-]+)\s*=\s*\${EscapeChar}([^${EscapeChar}]+)\${EscapeChar}"$, s)
        Do While m.Find
            Parent.Attributes.Add(CreateHtmlAttribute(m.Group(1), m.Group(2)))
        Loop
    Next
End Sub
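
A small sketch of the difference between the two key patterns, using only the core Regex and Log keywords. CompareKeyPatterns and the sample attribute string are made up for illustration, and only the single-quote case is shown.

```b4x
'Sketch: runs the old and the new key pattern against the same sample text.
Sub CompareKeyPatterns
    Dim s As String = "data-value='42' aria-label='Close'"
    Dim keyPatterns() As String = Array As String("\w+", "[a-zA-Z0-9-]+")
    For Each keyPattern As String In keyPatterns
        Dim m As Matcher = Regex.Matcher($"(${keyPattern})\s*=\s*'([^']+)'"$, s)
        Do While m.Find
            'With \w+ the keys come out as "value" and "label";
            'with [a-zA-Z0-9-]+ they come out as "data-value" and "aria-label".
            Log($"${keyPattern} -> key=${m.Group(1)}, value=${m.Group(2)}"$)
        Loop
    Next
End Sub
```

Note that the old pattern still matches. It silently truncates the key to the part after the last dash, which is easy to miss.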

Erel · Feb 6, 2025

- 0.95 - Fixes an issue with attribute keys containing dashes. Thank you @aeric for the fix!
