Android Question Need some help parsing html string

qsrtech · May 30, 2015

I need some help extracting/removing some tags within HTML. The specific one I need help with (I can probably take it from there for any others) is removing the anchor tag with its "href" attribute and keeping only the content, for example:
<a href="somelink">content</a>

So I only want 'content' left within the string, and this has to work for multiple occurrences within the "HTML" string.

Thanks

EDIT: I kinda solved it with this, but if you have a better way, please feel free to share it:

B4X:

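        'strip every opening <a href="..."> tag
        iPos=newHTML.IndexOf("<a href=" & QUOTE)
        Do While iPos<>-1
            'find the pos of '">' that ends the opening tag
            endPos=newHTML.IndexOf2(QUOTE & ">",iPos)
            If endPos=-1 Then Exit 'no closing '">' found: stop rather than cut a bad range
            sSub=newHTML.SubString2(iPos,endPos+2)
            newHTML=newHTML.Replace(sSub,"")
            iPos=newHTML.IndexOf("<a href=" & QUOTE)
        Loop
        'the loop only removes the opening tags, so drop the closing ones too
        newHTML=newHTML.Replace("</a>","")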

NJDude · May 30, 2015

You can try THIS or THIS

qsrtech · Jun 2, 2015

NJDude said:
You can try THIS or THIS

Thanks. I already looked at the first post but couldn't figure out how to translate it to my situation. The second post works great for removing all tags.

sorex · Jun 2, 2015

You could use

B4X:

links=Regex.Matcher("<a href=""(.*?)"">(.*?)</a>",myHTML)

to grab and loop through all hyperlinks and pick only the title.
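
For anyone landing here later, this is a rough sketch of how that Matcher could be used, assuming every link has exactly the form <a href="...">text</a> (no extra attributes, no line breaks inside the link). The sub names LinkTitles and StripLinks are made up for illustration; Regex.Matcher, Regex.Replace, Find and Group are the standard B4X regex calls.

```b4x
'Hypothetical helper: collects the text of every <a href="...">text</a> link.
Sub LinkTitles(myHTML As String) As List
    Dim titles As List
    titles.Initialize
    Dim links As Matcher = Regex.Matcher("<a href=""(.*?)"">(.*?)</a>", myHTML)
    Do While links.Find
        'Group(1) is the URL, Group(2) is the text between the tags
        titles.Add(links.Group(2))
    Loop
    Return titles
End Sub

'Hypothetical helper: returns the HTML with each link replaced by its text.
Sub StripLinks(myHTML As String) As String
    '$2 in the template refers to the second capture group (the link text)
    Return Regex.Replace("<a href=""(.*?)"">(.*?)</a>", myHTML, "$2")
End Sub
```

StripLinks does what the original question asks in a single call, for example newHTML = StripLinks(newHTML). Links written differently (single quotes, other attributes before href, text spanning several lines) will not match this pattern and would need a looser expression or a real HTML parser.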

qsrtech · Jun 2, 2015

Ok thanks

I will check it out later. Regular expressions are powerful, but I don't use them enough to justify spending a lot of time learning the ins and outs. Maybe one day...

inakigarm · Jun 2, 2015

Check the Jsoup HTML parser; it's very easy! https://www.b4x.com/android/forum/threads/jsoup-html-parser.49152/
