Android Question RegEx Matcher with HTML Code

Discussion in 'Android Questions' started by hasexxl1988, Nov 12, 2017.

  1. hasexxl1988

    hasexxl1988 Active Member Licensed User

    I have the following problem:

    I have HTML code from a website:

    <span itemprop='name'>
                    Ferrari TestCar
    </span>
    I need the name of the car.

    I tried the following code:

    If Job.JobName = "PageJob" Then
        Dim mAutoName As Matcher = Regex.Matcher("<span itemprop='name'>""([^""]+)""</span>", Job.GetString)
        Do While mAutoName.Find
            Log(mAutoName.Group(1))
        Loop
    End If
    Result is only: []

    The download and Job functions work perfectly with my ImageDownloader.

    Extracting image URLs with this code works:
    Dim m As Matcher = Regex.Matcher("src=\""https://mywebsite/mmo([^""]+)""", Job.GetString)
    I have found the RegEx pattern list:

    Unfortunately, I do not know how to build a pattern that returns the value with the HTML code removed.
    Last edited: Nov 12, 2017
  2. sorex

    sorex Expert Licensed User

    Replace the line feeds, tabs and double spaces first (it makes things a lot easier) and then try:

    Regex.Matcher("<span itemprop='name'>(.*?)</span>", Job.GetString)
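A minimal sketch of the two steps suggested above, assuming the page source is passed in as a string. The Sub name ExtractName and the variable flat are hypothetical and not from this thread; Regex.Replace and Regex.Matcher are the standard B4A keywords.

```b4x
' Hypothetical helper (name is an assumption): flatten whitespace, then match.
Sub ExtractName(Html As String) As String
    ' Collapse every run of line feeds, tabs and spaces into a single space
    Dim flat As String = Regex.Replace("\s+", Html, " ")
    ' Lazy group: stop at the first closing </span> after the opening tag
    Dim m As Matcher = Regex.Matcher("<span itemprop='name'>(.*?)</span>", flat)
    ' Trim removes the single space left on each side of the name
    If m.Find Then Return m.Group(1).Trim
    Return ""
End Sub
```

It could be called from JobDone as Log(ExtractName(Job.GetString)). Flattening matters because "." does not match line breaks by default, so the pattern alone cannot span the line break between the tag and the name.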
    MikeH likes this.
  3. hasexxl1988

    hasexxl1988 Active Member Licensed User

    Not working :/

    I tried:
    If Job.JobName = "PageJob" Then
        Dim xtemp As String
        xtemp = Job.GetString
        Log("IndexOf: " & xtemp.IndexOf("<span itemprop='name'>"))
        Dim m As Matcher = Regex.Matcher("<span itemprop='name'>(.*?)</span>", Job.GetString)
        Do While m.Find
            Log(m.Group(1))
        Loop
    End If
    Log result: IndexOf: 112851

    With IndexOf I can find <span itemprop='name'> in the string. The Matcher finds nothing.
  4. inakigarm

    inakigarm Well-Known Member Licensed User

    Erel likes this.
  5. udg

    udg Expert Licensed User

    I tried the following in an online regex tool and it works, although I don't think it is an elegant solution; it simply works with the data from post #1.
    <span itemprop='name'>\s*(.*)\s*<\/span>
    In Group 1 you read Ferrari TestCar.
    Essentially, it matches any amount of whitespace after "'name'>", then the group containing the car model, then again any amount of whitespace, and finally </span>.
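A rough B4A sketch of the same idea, for use without flattening the HTML first. The Sub name ExtractNameMultiline is hypothetical. One deliberate change from the pattern above: the group is lazy, (.*?) instead of (.*), so that the match stops at the first </span> if several sit on one line. The escape in <\/span> is dropped because B4A patterns have no "/" delimiter.

```b4x
' Hypothetical helper (name is an assumption): match across the line breaks.
Sub ExtractNameMultiline(Html As String) As String
    ' \s* also matches line breaks, so it absorbs the newline and the
    ' indentation on both sides of the name. "." itself does not match
    ' line breaks, so the name has to sit on a single line.
    Dim m As Matcher = Regex.Matcher("<span itemprop='name'>\s*(.*?)\s*</span>", Html)
    If m.Find Then Return m.Group(1)
    Return ""
End Sub
```

Called as Log(ExtractNameMultiline(Job.GetString)), this returns an empty string when nothing matches, which makes a failed lookup easy to spot in the log.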
  6. sorex

    sorex Expert Licensed User

    You didn't do it right. I told you to remove line breaks, tabs and extra spacing. These break regex lookups unless you extend the pattern to account for them.
  7. Erel

    Erel Administrator Staff Member Licensed User

    You should use jSoup or jTidy to parse html.