Need help with RegEx and replacing strings in an HTML file

Omegarex

Member
Licensed User
Longtime User
I need to strip some tags from an html file. I have the lines of code below. One line works and the other doesn't. Any help would be greatly appreciated.

html = html.Replace(">", ">" & CRLF) <-- This line works

html = html.Replace("/<head>.*?<\/head>/is", "") <-- This line doesn't
 

melamoud

Active Member
Licensed User
Longtime User
The second line won't work because String.Replace is not a regex replace; it only replaces literal strings.

To use a regex you need something like this sub:
B4X:
' Requires the Reflection library (for Reflector).
Sub RegexReplace(Pattern As String, Text As String, Replacement As String) As String
   Dim m As Matcher
   m = Regex.Matcher(Pattern, Text)
   Dim r As Reflector
   r.Target = m
   ' Calls replaceAll(String) on the underlying Java matcher
   Return r.RunMethod2("replaceAll", Replacement, "java.lang.String")
End Sub

' Example of how to use it (assumes RegexReplace is in a code module named Utilities)
Sub parser
   Dim s As String = Utilities.RegexReplace("<head>.*?<\/head>", "jlasdkj <head> yes </head>more!", "")
   Log("---" & s)
End Sub
 

Omegarex

Member
Licensed User
Longtime User
RegexReplace code doesn't work

Melamoud,

Thank you for your response.

The problem I am having now is that the <head> and </head> tags are not on the same line. So the regex code does not find a match and replace the tags. Do you have any other suggestions to fix the code?
 

Omegarex

Member
Licensed User
Longtime User
stupid question

melamoud,

I know this is a stupid question but could you post the code to remove all the line feeds...
 

melamoud

Active Member
Licensed User
Longtime User
B4X:
Dim s As String
s = "jlasdkj <head> yes " & CRLF & "</head>more!"
s = s.Replace(CRLF, "")   ' remove the line feeds so <head> and </head> end up on one line
s = RegexReplace("<head>.*?<\/head>", s, "")
Log("---" & s)
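
If you would rather keep the line breaks in the rest of the file, another option is to put inline flags at the start of the pattern. In Java regex (which is what runs underneath), (?s) lets the dot match line breaks and (?i) ignores case, which is roughly what the /is at the end of the original pattern was meant to do. This is only a sketch: it reuses the RegexReplace sub from earlier in the thread, and the sub name StripHeadDemo and the test string are made up for illustration.

```b4x
' Same helper as earlier in the thread; needs the Reflection library.
Sub RegexReplace(Pattern As String, Text As String, Replacement As String) As String
   Dim m As Matcher
   m = Regex.Matcher(Pattern, Text)
   Dim r As Reflector
   r.Target = m
   Return r.RunMethod2("replaceAll", Replacement, "java.lang.String")
End Sub

' Hypothetical demo sub; the name is for illustration only.
Sub StripHeadDemo
   Dim s As String
   s = "jlasdkj <HEAD> yes " & CRLF & "</head>more!"
   ' (?i) = ignore case, (?s) = let . match line breaks as well
   s = RegexReplace("(?is)<head>.*?</head>", s, "")
   Log("---" & s)   ' should log ---jlasdkj more!
End Sub
```

One more thing to watch for: CRLF in B4X is only Chr(10). If the HTML file has Windows line endings, s.Replace(CRLF, "") leaves the Chr(13) characters behind, and without (?s) the dot will not match those either, so the pattern can still fail. With (?s) that does not matter.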
 