Android Question Trying to parse youtube links

dimp

New Member
Licensed User
Longtime User
Hi,

I'm fetching data from a remote server that returns markdown. I'm using showdownjs in the webview to convert the markdown to HTML. In the b4a code I'm trying to parse youtube links that are not inside an <a>, <iframe> or any other tag. Example markdown return code:

B4X:
## Welcome!

This is a B4A Question. Here is a youtube video with info

https://www.youtube.com/watch?v=5871y4BAGqE

Using Substring, IndexOf, IndexOf2 and Contains string functions, I can get the youtube link. However the server might return mixed markdown with HTML code, or just HTML code (depends on what the user typed):

B4X:
## Welcome!

This is a B4A Question. Here is a youtube video with info

<iframe src="https://www.youtube.com/watch?v=5871y4BAGqE" width="50%" frameborder="0" allowfullscreen=""></iframe>

Or the user might just link the video without embedding it:

B4X:
## Welcome!

This is a B4A Question. <a href="https://www.youtube.com/watch?v=5871y4BAGqE">Here is a youtube video with info</a>

or, final case here, plain markdown:

B4X:
## Welcome!

This is a B4A Question. ![Here is a youtube video with info](https://www.youtube.com/watch?v=5871y4BAGqE)

What I want to do is check in B4A, while parsing and before editing the URL, whether it is inside or outside an HTML tag. So far I have not managed to do so. Any help?

Here is my B4A code:

B4X:
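Sub LinkParser(PostBody As String)
    Dim url As String
    Dim startIdx As Int = PostBody.IndexOf("https://")
    Do While startIdx > -1
        'The URL ends at the next space, or at the end of the text if no space follows
        Dim endIdx As Int = PostBody.IndexOf2(" ", startIdx)
        If endIdx = -1 Then endIdx = PostBody.Length
        url = PostBody.SubString2(startIdx, endIdx)
        'This is where I want to check if "url" is inside a tag (HTML or markdown).
        'If it is, do nothing, it will be parsed by markdown parser.
        'If it's not, then proceed with the following commands I have truncated to save space
        'Continue the search after this link so the loop always terminates
        startIdx = PostBody.IndexOf2("https://", endIdx)
    Loop

    Log(url)
End Sub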

Thank you very much for your time
 

Ohanian

Active Member
Licensed User
Longtime User
Hi,

With this regex you'll get the video ID:

B4X:
    Dim s1, s2 As String
   
    s1 = "<iframe src='https://www.youtube.com/watch?v=5871y4BAGqE' width='50%' frameborder='0' allowfullscreen=''></iframe>"
    s2 = "This is a B4A Question. <a href='https://www.youtube.com/watch?v=5871y4BAGqE'>Here is a youtube video with info</a>"
       
    Dim m As Matcher = Regex.Matcher("http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-\_]*)(&(amp;)?[\w\?=]*)?", s1)
    If m.Find Then
        Log(m.Group(1))
    End If
 

dimp

New Member
Licensed User
Longtime User
Thank you very much for your reply, but this is not what I need. Perhaps I phrased it wrong and confused you.

The only case where I want to get the link and replace it is the first example, where it is just the URL with no tags or anything else around it. My problem is the check:

If it is just the link, it's a matter of .replace (or something similar) with the embed code. However if it is inside <a href="..."></a> or any other tag (HTML or markdown) I don't want to touch it.

So how can I check if it is inside a tag or just the link? Check the characters before and after? Or is this a "hack" that will perform badly?
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
dimp said: Or is this a "hack" that will perform badly?
It depends on the possible inputs. If you can find the good links based on the character before the link then this is a proper solution.
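As a rough illustration of checking the character before the link, here is a minimal sketch. IsBareLink is a hypothetical helper name, and the list of "marker" characters is an assumption based on the examples in this thread (quotes and = for HTML attributes, ( and [ for markdown, < and > for tags and autolinks); adjust it to the inputs your server can actually return.

```b4x
'Hypothetical helper: returns True when the link starting at Index looks "bare",
'i.e. the character right before it does not suggest an HTML attribute,
'an HTML tag or a markdown link/image.
Sub IsBareLink(Body As String, Index As Int) As Boolean
    'A link at the very start of the text has nothing in front of it
    If Index <= 0 Then Return True
    'Assumed marker characters: ' " = ( [ < >
    Dim markers As String = "'=([<>" & QUOTE
    Dim prev As String = Body.SubString2(Index - 1, Index)
    Return markers.Contains(prev) = False
End Sub
```

Inside the loop of LinkParser this could be called with the index returned by IndexOf. A link preceded by a space or a line break is treated as bare, while a link preceded by `"` (as in href="https://...") or by `(` (as in a markdown image) is left for the markdown parser.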

If the text comes from an HTML page then you can use jTidy to convert it to XML and then parse it with Xml2Map.
 

MarkusR

Well-Known Member
Licensed User
Longtime User
I would parse from the opening < to the closing >. I think your markdown sits outside the HTML, in the text area only, so you need to find the start and end of the http link and replace it with your HTML syntax.
If you only need to find the markdown links, you can first replace ="https:// with ="xxxxx:// so that links inside attributes no longer match your search/replace criterion, and restore them afterwards, so that in the end you still have an intact web view.

Long ago I wrote an HTML parser; web sites are so different from one another that regex did not work.
For example, your markdown link can end with a space, a line break or some other character.
A space can be a valid character in a URL even when it is not written as %20.
People write HTTP, Http, Htt, Https, HTTPs, http// ... a good parser can correct some of these errors.
It is also typical for web sites that the less-than and greater-than signs do not balance because a character was forgotten.
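The masking idea above could be sketched roughly as follows. EmbedBareLinks is a hypothetical helper name, the iframe markup with the /embed/ URL is only an assumed embed code, and the sketch assumes the placeholder xxxxx:// never appears in the real text. It handles the well-formed https://www.youtube.com/watch?v= form only, not the misspelled schemes mentioned above.

```b4x
'Hypothetical helper: replaces bare YouTube links with embed code and leaves
'links inside HTML attributes or markdown targets untouched.
Sub EmbedBareLinks(PostBody As String) As String
    'Step 1: hide links inside attributes (href="...", src='...') and markdown targets
    Dim masked As String = PostBody.Replace("=" & QUOTE & "https://", "=" & QUOTE & "xxxxx://")
    masked = masked.Replace("='https://", "='xxxxx://")
    masked = masked.Replace("(https://", "(xxxxx://")
    'Step 2: every https:// YouTube link still visible is bare, so swap in the embed code
    Dim sb As StringBuilder
    sb.Initialize
    Dim last As Int = 0
    Dim m As Matcher = Regex.Matcher("https://www\.youtube\.com/watch\?v=([\w\-]+)", masked)
    Do While m.Find
        sb.Append(masked.SubString2(last, m.GetStart(0)))
        sb.Append("<iframe src='https://www.youtube.com/embed/").Append(m.Group(1))
        sb.Append("' frameborder='0' allowfullscreen></iframe>")
        last = m.GetEnd(0)
    Loop
    sb.Append(masked.SubString(last))
    'Step 3: restore the hidden links so the rest of the HTML/markdown stays intact
    Return sb.ToString.Replace("xxxxx://", "https://")
End Sub
```

The result can then be handed to the WebView as before. Because the masking runs on the whole text first, the regex in step 2 never sees the links in the iframe, <a> or markdown image examples.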
 

dimp

New Member
Licensed User
Longtime User
Thank you all for your answers. I finally managed to do it with the markdown library I am using, and it happens at runtime on the webview (showdownjs is the library)
 