Get all links from an html file using regex

MM2forever

Active Member
Licensed User
Hi guys,
I am trying to get links from an html file. My code (important parts of it) looks like this:
regex.new1("href=(.*?)[\s>]")

match.value=regex.match(htmtemp)
Do While match.success=True
list.add(SubString(htmtemp,match.index,match.length))
match.value=match.nextmatch
Loop

I'm not getting any results. What's wrong? Is it the regular expression itself?

Thank you for your help
Christian
[MM2forever]
 

Erel

Administrator
Staff member
Licensed User
The pattern below is taken from this site: http://sastools.com/b2/post/79393902
You should add a Regex object and a Match object (named Regex and Match in the code below).
B4X:
Sub Globals
    'Declare the global variables here.

End Sub

Sub App_Start
    Form1.Show
    If OpenDialog1.Show = cCancel Then AppClose
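    'A single double quote is enough inside a character class
    q = Chr(34)
    'Match "href" in any case, then optional whitespace and "="
    r = "(?:[hH][rR][eE][fF]\s*=)"
    'Skip whitespace, quotes or an opening parenthesis before the link
    r = r & "(?:[\s"&q&"(']*)"
    'Negative lookahead meant to exclude anchors, mailto, scripts and css
    r = r & "(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)"
    'Capture the link lazily up to the first whitespace, ">", ")" or quote
    r = r & "(.*?)(?:[\s>)"&q&"'])"
    Regex.New2(r,true,true)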
    FileOpen(c1,OpenDialog1.File,cRead)
    s = FileReadToEnd(c1)
    FileClose(c1)
    Match.New1
    Match.Value = regex.Match(s)
    Do While Match.Success
        lstLinks.Add(Match.GetGroup(1))
        Match.Value = Match.NextMatch
    Loop
End Sub
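
To see what the pattern above captures, here is a minimal sketch in Python (not Basic4ppc), which makes it easy to try the expression on a small input. It assumes Python's re engine handles this pattern the same way; that was not checked against Basic4ppc. The sample HTML and the names raw_captures and extract_links are made up for illustration, and re.IGNORECASE is only an assumed stand-in for the flags passed to Regex.New2.

```python
import re

# Same pattern as above, with a single double quote in each character class.
# re.IGNORECASE stands in for the flags passed to Regex.New2 (an assumption).
HREF_PATTERN = re.compile(
    r"(?:[hH][rR][eE][fF]\s*=)"
    r"(?:[\s\"(']*)"
    r"(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)"
    r"(.*?)(?:[\s>)\"'])",
    re.IGNORECASE,
)

# Hypothetical sample page, invented for this example.
SAMPLE_HTML = """<a href="http://example.com/page.html">Page</a>
<a href='docs/index.htm'>Docs</a>
<a href=plain.html>Plain</a>
<a href="#top">Top</a>
<link href="style.css" rel="stylesheet">
"""


def raw_captures(html):
    """Return group 1 of every match, including empty captures."""
    return [m.group(1) for m in HREF_PATTERN.finditer(html)]


def extract_links(html):
    """Return only the non-empty captures."""
    return [link for link in raw_captures(html) if link]


if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

In this sketch the stylesheet link is skipped entirely, but the quoted "#top" anchor still matches with an empty capture: the optional run of quotes can backtrack to zero characters, and the lookahead then sees the quote instead of "#". If the same happens in Basic4ppc, it may be worth skipping empty results before adding them to the list.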
 