B4J Question MiniHTMLParser Error

MathiasM

Active Member
Licensed User
Hello

I am trying to get the text inside the <a> tags on a webpage.
However, I get an error:
Waiting for debugger to connect...
Program started.
Error occurred on line: 276 (MiniHtmlParser)
java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
...
Program terminated (StartMessageLoop was not called).

The tags look like this:
HTML:
<div class="breadcrumbs">
         <a href="../../home.html">Home</a> &gt; <a href="../../commands.html">Commands</a> &gt; <a href="../Image.htm">Image</a> &gt; CreateRenderImage
</div>

So I am trying to get "Home", "Commands" and "Image".

This is my code:
B4X:
Private HtmlParser As MiniHtmlParser
HtmlParser.Initialize
Dim root As HtmlNode = HtmlParser.Parse(File.ReadString(File.DirAssets, "TestHTML.txt"))
Dim breadcrumbs As HtmlNode = HtmlParser.FindNode(root, "div", HtmlParser.CreateHtmlAttribute("class", "breadcrumbs"))
For Each n As HtmlNode In breadcrumbs.Children
    Log(HtmlParser.GetTextFromNode(n, 0))
Next

A minimal project is attached to this post.

Thanks a lot.
 

Attachments

  • MiniHTMLError.zip
    2.8 KB

OliverA

Expert
Licensed User
Longtime User
Log(HtmlParser.GetTextFromNode(n, 0))
You're assuming that every node has children, which is not always the case.
B4X:
If n.Children.Size > 0 Then Log(HtmlParser.GetTextFromNode(n, 0))
 

MathiasM

Active Member
Licensed User
You're assuming that every node has children, which is not always the case.
Thanks for your answer, OliverA. I understand what your code does, but I don't see why it is needed.

In the HTML
HTML:
<div class="breadcrumbs">
         <a href="../../home.html">Home</a> &gt; <a href="../../commands.html">Commands</a> &gt; <a href="../Image.htm">Image</a> &gt; CreateRenderImage
</div>

I read the structure of this HTML as follows:
The <div> breadcrumbs has 3 children, the 3 <a> tags, and each of them has a text value. Why is it necessary to check whether an <a> has children before getting its text value?
And if the text value counts as a child, why would there be an out-of-bounds exception, given that they all have a text value?

I seem to be missing something fundamental about these HTML tags.

Thanks for any input!
 

OliverA

Expert
Licensed User
Longtime User
<div> breadcrumbs has 3 children, the 3 <a>
But that is not what this library sees. Log the size of the breadcrumbs children list and each n (use one of the HtmlParser methods to inspect the content of each n) to see what the parser actually produces.
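The logging suggested above could look roughly like the sketch below. It reuses the HtmlParser and breadcrumbs variables from the first post. The Name field on HtmlNode and the "a" tag-name comparison are assumptions about the library's node type, not something confirmed in this thread. A likely outcome is that the list holds more than 3 entries, because the text between the links (whitespace and the &gt; separators) probably shows up as child nodes of its own, and those have no children to read text from.

```b4x
' Sketch only: inspect what the parser returned for the breadcrumbs <div>.
' Assumes HtmlParser and breadcrumbs are set up as in the first post.
Log("breadcrumbs children: " & breadcrumbs.Children.Size)
For Each n As HtmlNode In breadcrumbs.Children
    ' n.Name is assumed to hold the tag name of the node.
    Log(n.Name & " has " & n.Children.Size & " child node(s)")
    ' Only read text from <a> nodes that actually have children.
    If n.Name = "a" And n.Children.Size > 0 Then
        Log(HtmlParser.GetTextFromNode(n, 0))
    End If
Next
```

If the log shows entries that are not <a> nodes, filtering on the tag name as above (or on Children.Size alone, as in the earlier reply) should avoid the IndexOutOfBoundsException.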
 