Parsing "Complex?" HTML Websites via regex?

Ize

Member
Licensed User
Longtime User
Hello,

I'm trying to parse all tracks from this webpage: The Worlds Leading Radio Station in the Harder Styles of Music - Fear.FM into a ListView, and I used the FlickrViewer example as a base.

I thought it would be an easy task, just switching some words and the regex, but I was wrong.

It downloads the website, and I can see that it finds the correct lines in the loop, or at least about the right number of lines.

However the regex doesn't seem to match anything and therefore the ListView isn't populated :(

I guess my regex and/or class is set up wrong, and probably more, because I'm new to regex and all the parsing stuff.

Can anyone tell me what I did wrong or forgot to change?

I basically just want to parse the Trackname to the ListView for now.

The Page also contains the Time the Track was played and on some Tracks even a corresponding image.

It would be ideal if I could parse everything at once and then populate the ListView with: Time, Trackname & Image

I've been playing around with this for hours, trying every combination of regex I could figure out, but I'm stuck :(

I have attached the modified (hacked together :p) Example.
 

walterf25

Expert
Licensed User
Longtime User
Parsing Problem

Hi there, I took a look at your code. In Sub HandleMainPage, the line
B4X:
If m.Find Then
never finds a match, which is why the ListView never gets populated.

B4X:
Sub HandleMainPage
   If HttpUtils.IsSuccess(MainUrl) = False Then
      ToastMessageShow("Error downloading main page.", True)
      Return
   End If
   ResetImagesBackground
   start = DateTime.Now
   Dim TextReader1 As TextReader
   TextReader1.Initialize(HttpUtils.GetInputStream(MainUrl))
   Dim pattern, class As String
   '<div class="column_50 top40-info">
   class = "<div class=" & QUOTE & "row underline hover top40" & QUOTE & ">"
   pattern = "<strong>.*</strong>"
   Dim links As List
   links.Initialize
   Dim line As String
   line = TextReader1.ReadLine
   Log(line)
   Do While line <> Null
      If line.IndexOf(class) > -1 Then        
         Dim link As String
         Dim m As Matcher
         m = Regex.Matcher(pattern, line)
         If m.Find Then   'this is the problem it never finds a match
            ListView1.AddSingleLine(m.Group(1))
            'ListView1.AddSingleLine("does this do anything?!")
         End If
      End If
      line = TextReader1.ReadLine
   Loop
   TextReader1.Close
   Log("done parsing main page: " & (DateTime.Now - start))
   HttpUtils.CallbackUrlDoneSub = "ImageUrlDone"
   HttpUtils.DownloadList("Images", links)
   btnConnect.Enabled = False
   ProgressDialogHide
End Sub

cheers,
Walter
 

admac231

Active Member
Licensed User
Longtime User
On this line (line 62):
B4X:
      If line.IndexOf(class) > -1 Then
You aren't doing anything unless the line contains:
B4X:
"<div class=" & QUOTE & "row underline hover top40" & QUOTE & ">"
However the expression you are trying to match is on a line such as:
B4X:
<strong>Proppy &amp; Heady - Summer Of Bonkerz</strong><br />

So, you are looking for "<strong>.*</strong>" on <span class="top40-info-track">. Basically, you are searching for something that isn't there.

So remove the If statement from line 62, et voilà:
B4X:
   pattern = "<strong>.*</strong>"
   Dim links As List
   links.Initialize
   Dim line As String
   line = TextReader1.ReadLine
   Do While line <> Null
      Dim link As String
      Dim m As Matcher
      m = Regex.Matcher(pattern, line)
      If m.Find Then
         i = i + 1
         ListView1.AddSingleLine(i & ". " & m.Match.Replace("<strong>", "").Replace("</strong>", ""))
      End If
      line = TextReader1.ReadLine
   Loop
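
As a possible alternative to stripping the tags with Replace, the pattern can carry a capture group so that m.Group(1) returns only the text between the tags. The original call ListView1.AddSingleLine(m.Group(1)) expects such a group, but "<strong>.*</strong>" has no parentheses, so there is no group 1 to read. A minimal sketch, assuming each track name sits inside a single <strong>...</strong> pair on one line; ExtractTrackName is a hypothetical helper name, not part of the FlickrViewer example:

```b4x
' Hypothetical helper: returns the text between <strong> and </strong>,
' or an empty string if the line contains no such pair.
Sub ExtractTrackName(line As String) As String
   Dim m As Matcher
   ' The parentheses define capture group 1.
   ' .*? is non-greedy, so it stops at the first </strong> on the line.
   m = Regex.Matcher("<strong>(.*?)</strong>", line)
   If m.Find Then
      ' Decode the &amp; entity that appears in the sample line above.
      Return m.Group(1).Replace("&amp;", "&")
   End If
   Return ""
End Sub
```

Inside the loop this could replace the m.Match.Replace(...) chain, for example: Dim name As String = ExtractTrackName(line), then add it to the ListView only if name.Length > 0.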
 