B4A Library jSoup HTML Parser

Martin Larsen · Jun 15, 2015

First, thanks TheJinJ for making this library available! It works fine so far, although the syntax is very different from that of the original jsoup library.

Jaames said:
It would be great if it can be done to use this lib in this way:

I second that! I have used jsoup for several Java-based Android projects and I really like the chainable, jQuery-like syntax.

I would very much like to have something like that implemented for this wrapper library.

Erel, if you read this: Is it possible to use method chaining in B4A?

DonManfred · Jun 15, 2015

With a correctly written Java library you can use chaining, but not with plain B4A, as far as I know.
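For illustration, here is a minimal Java sketch of what a chaining-friendly class can look like: every configuring method returns `this`, so calls can be strung together. The class `SelectorBuilder` and its methods are hypothetical names made up for this example; they are not part of jsoup or of the B4A wrapper discussed in this thread.

```java
// Hypothetical fluent builder, for illustration only.
// It is NOT part of jsoup or of the B4A jSoup wrapper.
final class SelectorBuilder {
    private final StringBuilder css = new StringBuilder();

    // Each configuring method returns `this`, which is what allows chaining.
    SelectorBuilder tag(String name) {
        css.append(name);
        return this;
    }

    SelectorBuilder withClass(String cls) {
        css.append('.').append(cls);
        return this;
    }

    SelectorBuilder withAttr(String attr) {
        css.append('[').append(attr).append(']');
        return this;
    }

    // Terminal method: ends the chain and returns the assembled CSS selector.
    String build() {
        return css.toString();
    }

    public static void main(String[] args) {
        String selector = new SelectorBuilder()
                .tag("a")
                .withClass("external")
                .withAttr("href")
                .build();
        System.out.println(selector); // prints a.external[href]
    }
}
```

Whether a given wrapper exposes its methods this way depends on how the wrapper itself was written; this sketch only shows the general pattern on the Java side.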

Martin Larsen · Jun 16, 2015

Do you mean that it is possible to make a library wrapper in B4A that uses chaining if the Java library is written correctly?

BowTieNeck · Aug 8, 2015

The latest version of JSoup is 1.8.3. I'm getting an error because the current code is expecting version 1.8.1. I couldn't see anywhere that I could download the older version of JSoup. Would it be possible for you to get the code to just pick up whatever version is in the libraries folder?
Thanks,
Chris

Edit:
I've changed your XML file so it now depends on jsoup-1.8.3, and that works OK. However, it's not really a long-term solution.

mr23 · Aug 29, 2015

Update: a reboot of the PC and now it works, go figure.

I pulled down 1.8.1 from the first post, along with the B4A example, and placed the jSoup .jar and .xml files and jsoup-1.8.1.jar into an additional library folder.
Using B4A v4.3, just trying to compile the project fails on line 56 with missing parameter(s).

56 Log(js.connectXtra(url, "Mozilla", 0))

'intellisense' shows a number of additional required parameters.

With that line commented out, compilation gets to

66 DOM1 = js.getElementsByTag(local_html, "a", "")

with 'intellisense' showing only 2 parameters in getElementsByTag.
Have I made a mistake, or is the B4A sample out of date with the supplied library files?

I was looking to try this because JTidy doesn't seem to have any tolerance for unrecognized tags or malformed HTML (I haven't dug in yet). For example, JTidy works neither with 'http://google.com' nor with 'https://www.b4x.com/android/forum/forums/share-your-creations.33/page-1?order=view_count'.

Update: I found this enhancement that may help, but I need to wrap it before I can test it. https://github.com/nanndoj/jtidy

-Chris

TheJinJ · Aug 29, 2015

The attached sample works with jsoup 1.8.1; I haven't looked at this for a while. Not sure where your error comes from. I'll test it out when I'm back at a PC.

Martin Larsen · Aug 29, 2017

How do you work with a js doc read from a file like in your example:

B4X:

js.parse_InputStream(File.OpenInput(File.DirAssets, "test.html"), "UTF-8", url)

How do you, e.g., select an element:

B4X:

js.getElementByID(local_html, "name")

These methods work on a local HTML string, as in the snippet above. What if you needed to select the element from the file just read?

PS. I know you can of course read the local HTML with File.ReadString, but since the parse_InputStream method (and likewise the connect() method) exists, there surely must be a way to work with them.

Rusty · Apr 4, 2018

I could not get your sample code to compile.
It looks like many parameters are missing when using the latest jsoup.jar.
Is there any updated sample anywhere?
Thanks
Rusty

Erel · Apr 5, 2018

Note that you can use jTidy as an alternative.
