Android Question: Remove all HTML tags from a string

luke2012

Hi all,
Is there a quick way to "clean" a string by removing all HTML tags, so that I get plain text without any tags (see example)?

B4X:
<p class="text-align-justify">Multibrand giovane con i migliori marchi come....</p>

For this specific string, the fastest solution is:

B4X:
'str is the string above; Replace returns a new string, so assign the result back
str = str.Replace($"<p class="text-align-justify">"$, "").Replace($"</p>"$, "")

But this only works for this specific string. If the tags change, there is no guarantee that the result is plain text without HTML.

So which is the best solution in this case?

1) A regex (is there a pattern that removes all HTML tags)?
2) An HTML parser (parsing the tags to extract the text)?
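For option 1, the usual starting point is a pattern that deletes everything from `<` to the next `>`. Below is a minimal sketch in plain Java (not B4X; the class and method names are hypothetical), assuming simple, well-formed markup. It strips tags and then collapses leftover whitespace. It does not decode entities such as `&amp;`, it keeps the contents of `<script>` and `<style>` elements, and a `>` inside an attribute value or comment will end a match early. The same pattern should carry over to B4X's `Regex.Replace`.

```java
import java.util.regex.Pattern;

public class RegexTagStripper {
    // Matches "<", then any run of characters that are not ">", then ">".
    // Naive: a ">" inside an attribute value or comment ends the match early.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        String withoutTags = TAG.matcher(html).replaceAll("");
        // Collapse newlines, tabs and repeated spaces left behind by removed tags.
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String s = "<p class=\"text-align-justify\">Multibrand giovane con i migliori marchi come....</p>";
        System.out.println(stripTags(s)); // Multibrand giovane con i migliori marchi come....
    }
}
```

Note that adjacent block elements with no whitespace between them (`<p>a</p><p>b</p>`) run together as `ab`. Replacing tags with a space instead avoids that, at the cost of extra spaces around inline tags like `<b>`.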
 

DonManfred

Maybe this library can help?
At least, I found a reference to jsoup via Google, and this library is based on jsoup.
Use an HTML parser. Here's a Jsoup example:
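import org.jsoup.Jsoup;
// Jsoup.parse() builds a DOM; text() returns all of its text, whitespace-normalized and trimmed.
String input = "<font size=\"5\"><p>some text</p>\n<p>another text</p></font>";
String stripped = Jsoup.parse(input).text();
System.out.println(stripped);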

Result:
some text another text


I don't know whether this library supports parse(input).text(), though.
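If adding jsoup isn't an option and the code runs on desktop Java (for example behind B4J), the JDK ships its own HTML parser in `javax.swing.text.html.parser`. This is a sketch under stated assumptions: `javax.swing` is not part of the Android API, so it will not work in B4A; the class and method names are hypothetical; the Swing parser follows an old HTML 3.2 DTD, so modern markup may be handled imperfectly; and joining text runs with a space is a simplification that may add stray spaces around inline tags such as `<b>`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingHtmlTextExtractor {
    // Desktop JDK only: javax.swing is not available on Android.
    static String extractText(String html) {
        StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags; separate runs with a space.
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(data);
            }
        };
        try {
            // true: ignore charset directives in the markup instead of throwing
            // ChangedCharSetException (the input is already a Java String).
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Normalize whitespace, roughly like jsoup's text().
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String s = "<p class=\"text-align-justify\">Multibrand giovane con i migliori marchi come....</p>";
        System.out.println(extractText(s));
    }
}
```

Unlike the regex approach, a real parser knows where tags begin and end, so a `>` inside a quoted attribute value does not cut the text short.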
 

tchart

Here you go. I'm not sure whether it works on B4A, as I only use it on B4J.

 