Get all links from an HTML file using regex

Discussion in 'Questions (Windows Mobile)' started by MM2forever, Oct 12, 2007.

  1. MM2forever

    Hi guys,
    I am trying to get the links from an HTML file. The important parts of my code look like this:
    regex.new1("href=(.*?)[\s>]")

    match.value = regex.match(htmtemp)
    Do While match.success = True
        list.add(SubString(htmtemp, match.index, match.length))
        match.value = match.nextmatch
    Loop

    I'm not getting any results. What's wrong? Is it the regular expression itself?

    Thank you for your help
    Christian
    [MM2forever]
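One way to see what the pattern in the question actually matches is to try it outside the device. The sketch below uses Python's `re` module as a stand-in for the Regex object, which presumably wraps the .NET regex engine. The lazy `.*?` and `\s` behave the same way in both. The sample HTML is made up. Apart from the regex itself, the sketch points to two things worth checking. First, the whole match, which is presumably what `SubString(htmtemp, match.index, match.length)` returns, includes `href=` and the terminating character. Capture group 1 holds only the link, which still starts with its opening quote. Second, an upper-case `HREF` is found only when matching is case-insensitive.

```python
import re

# Made-up sample input; real pages will differ.
html = '<a href="a.htm">A</a> <A HREF="b.htm">B</A> <a href=c.htm>C</a>'

# Pattern from the question: "href=", then a lazy capture up to whitespace or ">".
pattern = re.compile(r"href=(.*?)[\s>]")


def whole_matches(text, rx):
    """Full matched text, i.e. what SubString(text, index, length) would return."""
    return [m.group(0) for m in rx.finditer(text)]


def captured_links(text, rx):
    """Only capture group 1, the part matched by (.*?)."""
    return [m.group(1) for m in rx.finditer(text)]


print(whole_matches(html, pattern))   # ['href="a.htm">', 'href=c.htm>']
print(captured_links(html, pattern))  # ['"a.htm"', 'c.htm']
print(captured_links(html, re.compile(pattern.pattern, re.IGNORECASE)))
# ['"a.htm"', '"b.htm"', 'c.htm']
```

The pattern does find matches here, so an empty result on the device may have another cause, for example a Match object that was never created. The reply below creates one with `Match.New1` before use.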
     
  2. Erel

    The pattern below is taken from this site: http://sastools.com/b2/post/79393902
    You need to add a Regex object (named Regex) and a Match object (named Match) to the program.
    Code:
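    Sub Globals
        'Declare the global variables here.
    End Sub

    Sub App_Start
        Form1.Show
        If OpenDialog1.Show = cCancel Then AppClose
        'q holds double-quote characters for use inside the character classes below.
        q = Chr(34) & Chr(34)
        'Match the attribute name "href" (any case), optional whitespace and "=".
        r = "(?:[hH][rR][eE][fF]\s*=)"
        'Skip any whitespace, quotes or "(" before the link.
        r = r & "(?:[\s" & q & "(']*)"
        'Negative lookahead: do not start the link at #, mailto, location or javascript, or where "css" or "this." occurs later on the line.
        r = r & "(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)"
        'Capture the link (group 1) lazily, up to the first whitespace, ">", ")" or quote.
        r = r & "(.*?)(?:[\s>)" & q & "'])"
        Regex.New2(r, true, true)
        'Read the whole selected file into s.
        FileOpen(c1, OpenDialog1.File, cRead)
        s = FileReadToEnd(c1)
        FileClose(c1)
        Match.New1
        Match.Value = Regex.Match(s)
        Do While Match.Success
            'Add capture group 1 (the link itself) to the list.
            lstLinks.Add(Match.GetGroup(1))
            Match.Value = Match.NextMatch
        Loop
    End Sub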
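To test the pattern on a desktop before running it on the device, the sketch below rebuilds the same regular expression in Python. It collects capture group 1 in the same way as the `Do While Match.Success` loop. It assumes that the two `true` arguments of `Regex.New2` turn on case-insensitive and multiline matching. The sample HTML is invented.

```python
import re

# A double quote. The post uses Chr(34) & Chr(34); a repeated character inside [...] changes nothing.
DQ = '"'

# Same pieces as the Basic4PPC code, concatenated in the same order.
r = r"(?:[hH][rR][eE][fF]\s*=)"
r += r"(?:[\s" + DQ + r"(']*)"
r += r"(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)"
r += r"(.*?)(?:[\s>)" + DQ + r"'])"
# Assumption: Regex.New2(r, true, true) means ignore case + multiline.
LINK_RE = re.compile(r, re.IGNORECASE | re.MULTILINE)

SAMPLE = (
    '<a href="index.htm">Home</a>\n'
    "<a href='page2.htm'>Next</a>\n"
    '<a href="#top">Top</a>\n'
    '<a href="mailto:me@example.com">Mail</a>\n'
    '<link rel="stylesheet" href="style.css">\n'
    '<a HREF="http://example.com/x.htm">X</a>\n'
)


def extract_links(html):
    """Capture group 1 of every match, in document order (like lstLinks.Add)."""
    return [m.group(1) for m in LINK_RE.finditer(html)]


links = extract_links(SAMPLE)
print(links)
# ['index.htm', 'page2.htm', '', '', 'http://example.com/x.htm']
print([link for link in links if link])
# ['index.htm', 'page2.htm', 'http://example.com/x.htm']
```

In this sketch the `#top` and `mailto:` links come back as empty strings and are not skipped. After the opening quote the lookahead fails, so the greedy `[\s"'(]*` gives the quote back. The lookahead then sees `"` instead of `#` or `mailto` and passes, and `(.*?)` matches nothing before the quote. The .NET engine backtracks the same way, so it presumably produces the same blanks, and checking for an empty group before `lstLinks.Add` would avoid them. The stylesheet link is skipped entirely because "css" appears later on its line.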
     
  3. MM2forever

    Thank you, the regex works great. However, I took out the "bracket exception" (or whatever it should be called), because it gave me trouble with links like "gnfgn (1)".
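The post above does not say exactly which part of the pattern was removed. Suppose the trouble is the `)` in the terminating class `[\s>)"']`, which ends the capture at the first closing parenthesis. The effect then shows on an assumed link such as `gnfgn%20(1).htm`. The sketch below compares the pattern from the answer with a different, quote-aware pattern that is not from this thread. The alternative reads a quoted value up to its matching quote and an unquoted value up to whitespace, `>` or a quote. It then skips `#`, `mailto:` and `javascript:` links in code rather than with a lookahead.

```python
import re

DQ = '"'

# Pattern from the answer above.
ORIGINAL = re.compile(
    r"(?:[hH][rR][eE][fF]\s*=)"
    + r"(?:[\s" + DQ + r"(']*)"
    + r"(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)"
    + r"(.*?)(?:[\s>)" + DQ + r"'])",
    re.IGNORECASE,
)

# Alternative (an assumption, not from the thread): group 2 = quoted value, group 3 = unquoted value.
QUOTE_AWARE = re.compile(r"""href\s*=\s*(?:(["'])(.*?)\1|([^\s>"']+))""", re.IGNORECASE)

SAMPLE = '<a href="gnfgn%20(1).htm">1</a> <a href=plain.htm>2</a> <a href="#top">3</a>'


def original_links(html):
    return [m.group(1) for m in ORIGINAL.finditer(html)]


def quote_aware_links(html):
    links = []
    for m in QUOTE_AWARE.finditer(html):
        link = m.group(2) if m.group(1) else m.group(3)
        # Skip in-page anchors and mailto:/javascript: links.
        if not link.lower().startswith(("#", "mailto:", "javascript:")):
            links.append(link)
    return links


print(original_links(SAMPLE))     # ['gnfgn%20(1', 'plain.htm', '']
print(quote_aware_links(SAMPLE))  # ['gnfgn%20(1).htm', 'plain.htm']
```

The quote-aware version also keeps spaces inside quoted values. A link written as `href="gnfgn (1).htm"` would therefore come back whole, whereas the answer's pattern stops at the space.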
     