Android Code Snippet [B4X] Code to extract the paths of the src attribute in all <img> tags of an HTML document

    Code to extract the paths of the src attribute in all <img> tags of an HTML document.

    I wrote the routine because I needed a list of all images used in a document.

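    Beware: this small solution is based on Regex. Parsing HTML documents with Regex is generally a bad idea, and the task is better solved via XML transformation and analysis. For example, the pattern used here only matches src values in double quotes; single-quoted or unquoted values are not found.

    However, if you need a small, manageable routine for HTML documents whose structure you know reasonably well, this can serve as a starting point.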

    Sub GetHtmlImagesList(HtmlString As String) As List
        Dim ReturnList As List
        ReturnList.Initialize  ' a List must be initialized before it can be used or returned
        ' Quick exit if there is no <img> tag at all (IndexOf returns -1 when nothing is found)
        If HtmlString.IndexOf("<img ") < 0 Then
            Return ReturnList
        End If
        Dim MatchWholeImgTag As Matcher
        Dim MatchFilename As Matcher
        Dim FoundFilenameString As String = ""
        Dim ImageTagString As String = ""
        MatchWholeImgTag = Regex.Matcher("<img[^>]* src=[^>]*>", HtmlString)  ' Find WHOLE IMAGE TAG:    <img src="...">
        Do While MatchWholeImgTag.Find
            ImageTagString = MatchWholeImgTag.Match  ' <img src="img1.png" width="96" height="96" >
            Dim RXOptions As Int = Regex.MULTILINE
            MatchFilename = Regex.Matcher2($"<img.*?src="([^"]+)".*?>"$, RXOptions, ImageTagString)  ' Find the FILENAME in src
            If MatchFilename.Find Then
                FoundFilenameString = MatchFilename.Group(1)  ' the only capture group: the src value
                ReturnList.Add(FoundFilenameString)
            End If
        Loop
        Return ReturnList
    End Sub

    Note: The code is deliberately a bit verbose for the sake of traceability. Performance can certainly be improved by optimizing it.
    A small test project is attached.
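
    As a quick way to try the routine, here is a minimal usage sketch. The sub name TestGetHtmlImagesList and the sample HTML string are made up for illustration and are not taken from the attached project; the sketch assumes GetHtmlImagesList returns an initialized List with one entry per src path found.

```b4x
' Hypothetical test sub: calls GetHtmlImagesList on a small sample document
' and logs every src path it returns.
Sub TestGetHtmlImagesList
    ' Smart string literal, so the double quotes inside the HTML need no escaping
    Dim Html As String = $"<html><body><img src="img1.png" width="96" height="96" ><p>Text</p><img alt="x" src="pics/img2.jpg"></body></html>"$
    Dim Images As List = GetHtmlImagesList(Html)
    Log("Images found: " & Images.Size)
    For Each Path As String In Images
        Log(Path)
    Next
End Sub
```

    For this sample string the list should contain img1.png and pics/img2.jpg. Replacing one src value with a single-quoted one is an easy way to see the limits of the Regex approach mentioned above.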

