Need Help with RegEx and replacing strings in an HTML file

Omegarex · Apr 11, 2013

I need to strip some tags from an HTML file. I have the two lines of code below: the first one works and the second one doesn't. Any help would be greatly appreciated.

html = html.Replace(">", ">" & CRLF)   ' This line works

html = html.Replace("/<head>.*?<\/head>/is", "")   ' This line doesn't

melamoud · Apr 12, 2013

The second line won't work because String.Replace is not a regex replace. It only replaces literal text, so it looks for the exact characters "/<head>.*?<\/head>/is", which never appear in the file.
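To make the difference concrete, here is a small plain-Java sketch (the class and method names are illustrative, not from the thread). B4X's String.Replace behaves like Java's literal String.replace, while String.replaceAll compiles its first argument as a regex. It also shows that the /.../is wrapper from the question is not regex syntax in Java: the slashes and the letters i and s are simply more characters that must appear in the text.

```java
// Contrasts a literal replace (what String.Replace does) with a regex replace.
final class LiteralVsRegex {

    // Literal replacement: target is searched for character by character.
    static String literalReplace(String text, String target) {
        return text.replace(target, "");
    }

    // Regex replacement: pattern is compiled as a java.util.regex pattern.
    static String regexReplace(String text, String pattern) {
        return text.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title></head><body>b</body></html>";
        String pattern = "<head>.*?</head>";

        // The literal text "<head>.*?</head>" never occurs in html, so nothing changes.
        System.out.println(literalReplace(html, pattern));
        // As a regex, the lazy .*? spans the head contents and the block is removed:
        // prints <html><body>b</body></html>
        System.out.println(regexReplace(html, pattern));
        // The Perl/JavaScript-style "/.../is" wrapper is treated as literal characters,
        // so this pattern needs a '/' before <head> and "/is" after it, and finds nothing.
        System.out.println(regexReplace(html, "/<head>.*?<\\/head>/is"));
    }
}
```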

To do a regex replace you need something like this sub (it uses the Reflection library):

B4X:

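' Replaces every match of Pattern in Text with Replacement.
' Requires the Reflection library (for the Reflector type).
' Note: "$" and "\" in Replacement are treated specially by replaceAll.
Sub RegexReplace(Pattern As String, Text As String, Replacement As String) As String
   Dim m As Matcher
   m = Regex.Matcher(Pattern, Text)
   ' Call the underlying java.util.regex.Matcher.replaceAll(String) method
   Dim r As Reflector
   r.Target = m
   Return r.RunMethod2("replaceAll", Replacement, "java.lang.String")
End Sub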

' Example usage (assumes RegexReplace is in a code module named Utilities)
Sub Parser
   Dim s As String = Utilities.RegexReplace("<head>.*?</head>", "jlasdkj <head> yes </head>more!", "")
   Log("---" & s) ' logs ---jlasdkj more!
End Sub
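For comparison, the RegexReplace sub boils down to a Pattern/Matcher call in Java. The hypothetical RegexReplaceSketch class below mirrors it in plain Java and shows a detail that should apply to the B4X version too, since it calls the same replaceAll method: "$" and "\" in the replacement are special ("$1" inserts capture group 1). Matcher.quoteReplacement escapes them when the replacement must be inserted verbatim.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of what the RegexReplace sub does through Reflector.
final class RegexReplaceSketch {

    // Same steps as the sub: build a Matcher, then call replaceAll on it.
    static String regexReplace(String pattern, String text, String replacement) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        return m.replaceAll(replacement);
    }

    // Variant that inserts the replacement literally, even if it contains '$' or '\'.
    static String regexReplaceLiteral(String pattern, String text, String replacement) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        return m.replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "jlasdkj <head> yes </head>more!";
        // prints jlasdkj more!
        System.out.println(regexReplace("<head>.*?</head>", text, ""));
        // "$1" is replaced by group 1 (" yes "): prints jlasdkj [ yes ]more!
        System.out.println(regexReplace("<head>(.*?)</head>", text, "[$1]"));
        // With quoteReplacement, "$1" stays literal: prints jlasdkj [$1]more!
        System.out.println(regexReplaceLiteral("<head>(.*?)</head>", text, "[$1]"));
    }
}
```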

Omegarex · Apr 12, 2013

RegexReplace code doesn't work

Melamoud,

Thank you for your response.

The problem I am having now is that the <head> and </head> tags are not on the same line, so the regex finds no match and nothing gets replaced. Do you have any other suggestions to fix the code?

melamoud · Apr 12, 2013

Just start by replacing all the line breaks with "" so the whole file is on one line. Then the pattern can match, right?
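Why removing the line breaks helps: in Java regex (which the RegexReplace sub ends up calling), "." matches any character except a line terminator unless the DOTALL flag is on, so ".*?" cannot run from <head> on one line to </head> on another. This small Java sketch (the class name is hypothetical) compares the default behaviour, the flatten-first approach, and the inline "(?s)" flag that enables DOTALL.

```java
import java.util.regex.Pattern;

// Shows why <head>.*?</head> fails when the tags are on different lines.
final class DotAndNewlines {

    static boolean found(String pattern, String text) {
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        String html = "<head>\n<title>t</title>\n</head>";
        // By default '.' does not match '\n', so the lazy .*? cannot cross the line break.
        System.out.println(found("<head>.*?</head>", html));                    // false
        // Option 1 (the suggestion above): delete the line breaks first.
        System.out.println(found("<head>.*?</head>", html.replace("\n", "")));  // true
        // Option 2: the inline (?s) flag turns on DOTALL, so '.' also matches '\n'.
        System.out.println(found("(?s)<head>.*?</head>", html));                // true
    }
}
```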

Omegarex · Apr 12, 2013

stupid question

melamoud,

I know this is a stupid question, but could you post the code to remove all the line feeds?

melamoud · Apr 12, 2013

B4X:

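Dim s As String
s = "jlasdkj <head> yes " & CRLF & "</head>more!"
' In B4X, CRLF is a single line feed (Chr(10)), so also remove any
' carriage returns (Chr(13)) left over from Windows line endings
s = s.Replace(CRLF, "")
s = s.Replace(Chr(13), "")
s = RegexReplace("<head>.*?</head>", s, "")
Log("---" & s) ' logs ---jlasdkj more!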
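Flattening the file works, but it also removes every other line break in the document. A sketch of an alternative, assuming you would rather keep the line breaks: compile the pattern with CASE_INSENSITIVE and DOTALL, which are Java's counterparts of the /is flags in the original attempt. The StripHead class and the \b guard (so a <header> tag is not mistaken for <head>) are illustrative additions, not from the thread. In B4X, passing a pattern with inline flags such as "(?is)<head\b.*?</head>" to RegexReplace should have the same effect, since the pattern is compiled by java.util.regex, but that has not been tested here.

```java
import java.util.regex.Pattern;

// Removes the <head> ... </head> block without first flattening the document.
final class StripHead {

    // CASE_INSENSITIVE | DOTALL plays the role of the /is flags from the question;
    // \b requires a word boundary after "head", so "<header>" does not match.
    private static final Pattern HEAD = Pattern.compile(
            "<head\\b.*?</head>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String stripHead(String html) {
        return HEAD.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html>\r\n<HEAD>\r\n<title>t</title>\r\n</HEAD>\r\n<body>b</body>\r\n</html>";
        // Line breaks outside the head block are left untouched.
        System.out.println(stripHead(html));
    }
}
```

Keeping the compiled Pattern in a static field avoids recompiling it for every file processed.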

Omegarex · Apr 13, 2013

Thank you

Thank you very much for the code.
