Android Question jSoup and inline Java

William Hunter

Active Member
Licensed User
Longtime User
I have been looking at jsoup, the Java library for parsing HTML, as well as TheJinJ's jSoup HTML Parser library for B4A. I could use the B4A library in a project, but would prefer, if possible, to use the latest version of the jsoup.org library with inline Java.

My needs are fairly modest. I have a raw multipart email message in a string. I would like to extract the HTML part, then validate it against a list of known safe tags. If validation fails, I would like to fall back to extracting the text portion instead. Has anyone used this library in this fashion, and would you be willing to share your experience? I have no knowledge of Java, and would appreciate any and all help given.

Best regards :)
 

William Hunter

Erel said:
Note that you can also use JTidy: https://www.b4x.com/android/forum/threads/jtidy-library-convert-html-pages-to-xml.27038/#content
It converts the HTML to a valid XML file.
Thank you, Erel. jTidy could be of some use to me if it could clean up HTML held in a string rather than in an HTML file. I am primarily interested in the extraction capabilities of jSoup. The code below is an excerpt from the B4A jSoup library. It performs some very nice extraction feats. I would like to accomplish the same thing, and perhaps access other features of the jsoup.org library, using inline Java. This is probably not too difficult a chore for a capable Java fan. Unfortunately, Java and I do not dance well together. :(
B4X:
Dim js As jSoup
Dim html As String
Dim Extract1 As List
Dim Extract2 As List
Dim Extract3 As List
Dim Extract4 As List
Dim Test As String ' For my test

Extract1.Initialize
Extract2.Initialize
Extract3.Initialize
Extract4.Initialize

' Extract Attributes, text & HTML
html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>"
Extract1 = js.selectorElementText(html, "a")
Log(Extract1.Get(0))
Extract2 = js.selectorElementAttr(html, "a", "href")
Log(Extract2.Get(0))
Extract3 = js.selectorElementAttr(html, "a", "innerhtml")
Log(Extract3.Get(0))
Extract4 = js.selectorElementAttr(html, "a", "outerhtml")
Log(Extract4.Get(0))
' My test here
Test = Extract4.Get(0)
Log("Test = " & Test)
If anyone who has had success using inline Java with jsoup has some insight to offer, the help would be greatly appreciated.

Best regards :)
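
For the multipart workflow described in the first post (take the HTML part, check it against known safe tags, otherwise fall back to the text part), here is a minimal plain-Java sketch that uses only the standard library, not jsoup. Everything in it is an assumption made for illustration: the class name `MailHtmlSketch`, the method names, and the `SAFE_TAGS` list are hypothetical, the message is assumed to be a single-level multipart with parts that are not base64 or quoted-printable encoded, and the tag check is a crude regex scan that ignores attributes. With jsoup on the classpath, `Jsoup.isValid(html, Safelist.basic())` would probably be a more robust replacement for `usesOnlySafeTags`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailHtmlSketch {

    // Hypothetical allow-list of "known safe" tags; adjust as needed.
    static final Set<String> SAFE_TAGS = new HashSet<>(Arrays.asList(
            "html", "head", "title", "body", "p", "br", "b", "i", "u",
            "em", "strong", "a", "ul", "ol", "li", "div", "span"));

    // Finds the boundary parameter of the multipart Content-Type header.
    static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";\\r\\n]+)\"?", Pattern.CASE_INSENSITIVE);

    // Captures the name of every opening or closing tag.
    static final Pattern TAG = Pattern.compile("</?\\s*([a-zA-Z][a-zA-Z0-9]*)");

    /** Returns the body of the first part with the given MIME type, or null. */
    static String extractPart(String rawMessage, String mimeType) {
        Matcher m = BOUNDARY.matcher(rawMessage);
        if (!m.find()) {
            return null; // not a multipart message
        }
        String delimiter = "--" + m.group(1).trim();
        Pattern wanted = Pattern.compile(
                "content-type:\\s*" + Pattern.quote(mimeType), Pattern.CASE_INSENSITIVE);
        for (String part : rawMessage.split(Pattern.quote(delimiter))) {
            // Headers and body of each part are separated by one blank line.
            String[] headersAndBody = part.split("\\r?\\n\\r?\\n", 2);
            if (headersAndBody.length == 2 && wanted.matcher(headersAndBody[0]).find()) {
                return headersAndBody[1].trim();
            }
        }
        return null;
    }

    /** True when every tag name in the HTML is on the allow-list (attributes are not checked). */
    static boolean usesOnlySafeTags(String html) {
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (!SAFE_TAGS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    /** The HTML part if it passes the check, otherwise the text/plain part (may be null). */
    static String safeBody(String rawMessage) {
        String html = extractPart(rawMessage, "text/html");
        if (html != null && usesOnlySafeTags(html)) {
            return html;
        }
        return extractPart(rawMessage, "text/plain");
    }

    public static void main(String[] args) {
        String raw = "MIME-Version: 1.0\r\n"
                + "Content-Type: multipart/alternative; boundary=\"b1\"\r\n"
                + "\r\n"
                + "--b1\r\n"
                + "Content-Type: text/plain; charset=utf-8\r\n"
                + "\r\n"
                + "An example link.\r\n"
                + "--b1\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                + "\r\n"
                + "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>\r\n"
                + "--b1--\r\n";
        // Prints the HTML part, because p, a and b are all on the allow-list.
        System.out.println(safeBody(raw));
    }
}
```

In a B4A project the three static methods could be pasted between `#If JAVA` and `#End If` and called through a JavaObject. The per-element extraction shown in the B4X excerpt would still need jsoup itself, for example `Jsoup.parse(html).select("a")` followed by `text()`, `attr("href")`, `html()` or `outerHtml()` on the matched elements.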
 