Android Question jSoup and inline Java

William Hunter

I have been looking at jsoup, the Java library for parsing HTML, as well as TheJinJ’s jSoup HTML Parser library for B4A. I could use the B4A library in a project, but would prefer, if possible, to use the latest version of the jsoup library with inline Java.

My needs are fairly modest. I have a raw multipart email message in a string. I would like to extract the HTML part, then validate that it contains only known safe tags. If validation fails, I would then like to extract the plain-text part instead. Has anyone used this library in this fashion, and would you be willing to share your experience? I have no knowledge of Java, and would appreciate any and all help given.

Best regards :)
Note that you can also use JTidy:
It converts the HTML to a valid XML file.
Thank you, Erel. jTidy could be of some use to me if it can clean up HTML contained in a string rather than in an HTML file. I am primarily interested in the extraction capabilities of jSoup. The code below is an excerpt from the B4A jSoup library, and it performs some very nice extraction feats. I would like to accomplish the same thing, and perhaps access other features of the jSoup library, using inline Java. This is likely not too difficult a chore for a capable Java fan. Unfortunately, Java and I do not dance well together. :(
Dim js As jSoup
Dim html As String
Dim Extract1 As List
Dim Extract2 As List
Dim Extract3 As List
Dim Extract4 As List
Dim Test As String ' For my test


' Extract Attributes, text & HTML
html = "<p>An <a href=''><b>example</b></a> link.</p>"
Extract1 = js.selectorElementText(html, "a")
Extract2 = js.selectorElementAttr(html, "a", "href")
Extract3 = js.selectorElementAttr(html, "a", "innerhtml")
Extract4 = js.selectorElementAttr(html, "a", "outerhtml")
' My test here
Test = Extract4.Get(0)
Log("Test = " & Test)
If anyone who has had success using inline Java with jSoup has some insight to offer, their help would be greatly appreciated.

Best regards :)
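
For what it is worth, the overall flow described in the first post (find the HTML part, check its tags against a list of safe ones, fall back to the text part) can be sketched in plain Java without any library. This is only a rough illustration and not jSoup code: the class name MailBodyPicker, its method names and the SAFE_TAGS list are all made up for the example. It assumes a single-level multipart message, ignores Content-Transfer-Encoding (quoted-printable or base64 bodies are returned undecoded), and only looks at tag names, not attributes, so it should not be treated as a real sanitizer. Recent jsoup versions offer Jsoup.isValid with a Safelist, which is likely a more robust choice for the validation step.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks the HTML part of a multipart message if it only
// uses allowlisted tags, otherwise falls back to the text/plain part.
class MailBodyPicker {
    // Assumed allowlist of "safe" tag names; adjust to your own needs.
    static final Set<String> SAFE_TAGS = new HashSet<>(Arrays.asList(
            "p", "br", "b", "i", "em", "strong", "a", "ul", "ol", "li", "div", "span"));

    // Captures the boundary parameter of the top-level Content-Type header.
    static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";\\r\\n]+)\"?", Pattern.CASE_INSENSITIVE);

    // Captures the name of every opening or closing tag.
    static final Pattern TAG = Pattern.compile("<\\s*/?\\s*([A-Za-z][A-Za-z0-9]*)");

    // Returns the body of the first part whose headers declare the given
    // content type, or null if there is no such part.
    static String findPart(String rawMessage, String boundary, String contentType) {
        for (String part : rawMessage.split(Pattern.quote("--" + boundary))) {
            // Headers and body are separated by the first blank line.
            String[] headBody = part.split("\\r?\\n\\r?\\n", 2);
            if (headBody.length < 2) {
                continue;
            }
            String headers = headBody[0].toLowerCase(Locale.ROOT);
            if (headers.contains("content-type: " + contentType)) {
                return headBody[1].trim();
            }
        }
        return null;
    }

    // Naive check: true only if every tag name found is in SAFE_TAGS.
    // Attributes (for example onclick or javascript: links) are NOT checked.
    static boolean usesOnlySafeTags(String html) {
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (!SAFE_TAGS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    // Returns the HTML part if it passes the tag check, else the plain-text
    // part, else null (no boundary found or no matching part).
    static String pickBody(String rawMessage) {
        Matcher b = BOUNDARY.matcher(rawMessage);
        if (!b.find()) {
            return null;
        }
        String boundary = b.group(1);
        String html = findPart(rawMessage, boundary, "text/html");
        if (html != null && usesOnlySafeTags(html)) {
            return html;
        }
        return findPart(rawMessage, boundary, "text/plain");
    }
}
```

Calling MailBodyPicker.pickBody(rawMessage) with the raw message string returns either the HTML body or the text body. The methods are static and use only java.util classes, so they could probably be adapted for an inline Java block, although that has not been tried here.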
