Android Question: Need some help parsing an HTML string

qsrtech

Active Member
Licensed User
Longtime User
I need some help extracting/removing some tags within HTML. The specific one I need help with (I can probably take it from there for any others) is removing the anchor tag with its "href" attribute and keeping only the content. For example:
<a href="somelink">content</a>

So I only want 'content' left within the string, and this has to work for multiple occurrences within the HTML string.

Thanks :)

EDIT: I kinda solved it with this but if you have a better way please feel free to share it

B4X:
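        iPos=newHTML.IndexOf("<a href=" & QUOTE)
        Do While iPos<>-1
            'find the position of the '">' that ends the opening tag
            endPos=newHTML.IndexOf2(QUOTE & ">",iPos)
            If endPos=-1 Then Exit 'malformed tag: stop instead of failing in SubString2
            sSub=newHTML.SubString2(iPos,endPos+2)
            'remove the opening tag (every identical occurrence of it)
            newHTML=newHTML.Replace(sSub,"")
            iPos=newHTML.IndexOf("<a href=" & QUOTE)
        Loop
        'the loop only strips the opening tags, so strip the closing tags too
        newHTML=newHTML.Replace("</a>","")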
 

sorex

Expert
Licensed User
Longtime User
You could use

B4X:
links=Regex.Matcher("<a href=""(.*?)"">(.*?)</a>",myHTML)

to grab and loop through all hyperlinks and pick only the title (the link text, which is the second capture group).
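A rough sketch of how that could look in practice, assuming B4A's Regex.Matcher and Regex.Replace (the latter with a Java-style "$2" template for the second group). The sub names StripLinks and LogLinkTitles are made up for illustration. Note that "." does not match line breaks by default, so a link whose content spans several lines would be left untouched by this pattern.

```b4x
'Hypothetical helper: returns myHTML with every <a href="...">content</a>
'replaced by just its content.
Sub StripLinks(myHTML As String) As String
    'Group 1 is the URL, group 2 is the link text; "$2" keeps only the text.
    Return Regex.Replace("<a href=""(.*?)"">(.*?)</a>", myHTML, "$2")
End Sub

'Hypothetical helper: loops through all hyperlinks and logs only the titles.
Sub LogLinkTitles(myHTML As String)
    Dim links As Matcher = Regex.Matcher("<a href=""(.*?)"">(.*?)</a>", myHTML)
    Do While links.Find
        Log(links.Group(2)) 'the text between <a ...> and </a>
    Loop
End Sub
```

With these, something like newHTML = StripLinks(newHTML) should handle multiple occurrences in one pass, without a manual IndexOf loop. Anchors that carry other attributes before href would need a looser pattern.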
 

qsrtech

Ok thanks :)
I will check it out later. Regular expressions are powerful, but I don't use them enough to justify spending a lot of time learning the ins and outs. Maybe one day...
 