Help with Regex

walterf25

Hello all, I need to parse a few lines from a website; basically, I need to extract an image URL. I'm using the HTTPUtils library and so far everything is working great, but I can't figure out how to do this part.

These are the lines I need to parse:
<div class="torpicture">
<img src="http://image.bayimg.com/3cd148d9955431b62ab956f1631de3d87fb1636c.jpg" title="picture" alt="picture" />
</div>

What I need to extract is the value of the img src= attribute. I've tried this:
B4X:
   class = "<div class=" & QUOTE & "torpicture" & QUOTE & ">"
   pattern = "img source=\q([^q]+)\q".Replace("q", QUOTE)
   Dim line As String
   Dim line2 As String
   line = TextReader1.ReadLine
   line2 = TextReader2.ReadLine
   Do While line <> Null
      If line.IndexOf(class) > -1 Then
         Dim link As String
         Dim m As Matcher
         m = Regex.Matcher(pattern, line)
         If m.Find Then
            'Dim ImageJob As HttpJob
            'ImageJob.Initialize("ImageJob", Me)
            Log("group: " & m.Group(0))
            Log("group: " & m.Group(1))   
         End If

But I can't get it to extract the HTTP address of the image.
Any help on how to do this would be appreciated.

Thanks all in advance!
Cheers!
 

Ohanian

Hi,

Try this code:

B4X:
Dim sHTML As String   : sHTML = ""   
   Dim Matcher_ As Matcher
   
   sHTML = sHTML & "<div class='torpicture'>"
   sHTML = sHTML & "<img src='http://image.bayimg.com/3cd148d9955431b62ab956f1631de3d87fb1636c.jpg' title='picture' alt='picture' />"
   sHTML = sHTML & "</div>"
   sHTML = sHTML & "<div class='torpicture'>"
   sHTML = sHTML & "<img src='http://image.bayimg.com/3cd148d9955431b62ab956f1631de3d87fb1636c.jpg' title='picture' alt='picture' />"
   sHTML = sHTML & "</div>"
   sHTML = sHTML & "<div class='torpicture'>"
   sHTML = sHTML & "<img src='http://image.bayimg.com/3cd148d9955431b62ab956f1631de3d87fb1636c.jpg' title='picture' alt='picture' />"
   sHTML = sHTML & "</div>"
   
   Matcher_ = Regex.Matcher("<img[^>]*>", sHTML)
   Dim sImg As String
   Dim strFileName As String
   
   Try
      Do While Matcher_.Find
         ' Matcher_.Match holds the whole <img ...> tag
         sImg = Matcher_.Match
         
         ' Cut from the start of the URL up to the title attribute
         If sImg.IndexOf("https://") <> -1 Then
            sImg = sImg.SubString2(sImg.IndexOf("https://"), sImg.IndexOf("title="))
         Else
            sImg = sImg.SubString2(sImg.IndexOf("http://"), sImg.IndexOf("title="))
         End If
         
         ' Strip the closing quote and the space left in front of title=
         sImg = sImg.Replace(QUOTE, "")
         sImg = sImg.Replace("'", "").Trim
         
         ' The file name is everything after the last slash
         strFileName = sImg.SubString(sImg.LastIndexOf("/") + 1)
         
         ToastMessageShow(sImg & CRLF & strFileName, True)
      Loop
   Catch
      ' SubString2 throws if http:// or title= is missing from the tag
      Log(LastException.Message)
   End Try
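
A note on why the pattern in the first post finds nothing: it searches for `img source=`, while the attribute in the HTML is `src`. Also, if the page really has the `<img>` tag on its own line, as pasted, the line that contains the `torpicture` div is not the line that contains the image, so the match is attempted on the wrong line. As an alternative to cutting the tag apart with IndexOf, the URL could probably be captured directly with one group. This is only a minimal sketch: `ExtractImageUrls` is a made-up helper name, and it assumes the whole page is already in one string and that `src` is quoted with either `"` or `'`.

```b4x
' Sketch only: ExtractImageUrls is a hypothetical helper, not a library call.
' Returns the src value of every <img> tag found in Html.
Sub ExtractImageUrls(Html As String) As List
   Dim urls As List
   urls.Initialize
   ' q is a placeholder for the double quote, as in the first post.
   ' Group 1 captures everything between the quotes that follow src=
   Dim pattern As String
   pattern = "<img[^>]*\ssrc=[q']([^q']+)[q']".Replace("q", QUOTE)
   Dim m As Matcher
   m = Regex.Matcher2(pattern, Regex.CASE_INSENSITIVE, Html)
   Do While m.Find
      urls.Add(m.Group(1))
   Loop
   Return urls
End Sub
```

Calling `ExtractImageUrls(sHTML)` on the test string above should give a list with the bayimg URL three times, with no quotes or trailing space to clean up. It does not check for the `torpicture` div, so on a real page it would return every image, not just that one.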
 