Android Question [RegEx] Bug ?

Waldemar Lima

Well-Known Member
Licensed User
Guys, I'm trying to find the .m3u8 URL of live streams on YouTube, so I wrote a regex that should match only links ending in .m3u8.
The regex finds something, but the result is completely wrong.
The code snippet and what it finds are below.

Code>
B4X:
Dim j As HttpJob
j.Initialize("", Me)
j.Download("https://www.youtube.com/watch?v=floFotcxRIo")
Wait For (j) JobDone(j As HttpJob)
If j.Success Then
    'Log(j.GetString)
    Dim text, pattern As String
    text = j.GetString
    pattern = "https?.*?\.m3u8" 'lazy match from "http"/"https" up to the first ".m3u8"
    Dim Matcher1 As Matcher
    Matcher1 = Regex.Matcher2(pattern, Regex.CASE_INSENSITIVE, text)
    If Matcher1.Find Then
        Log("Found: " & Matcher1.Match)
    End If
End If
j.Release

Console>
B4X:
*** Service (starter) Create ***
creating...
** Service (starter) Start **
** Activity (main) Create (first time) **
isrun = 0
Portrait
Call B4XPages.GetManager.LogEvents = True to enable logging B4XPages events.
** Activity (main) Resume **
*** Receiver (httputils2service) Receive (first time) ***
ID DO VIDEO = b_7Lp7-oN9s
Found: https:\/\/rr3---sn-n2xxqoxxucg-btoe.googlevideo.com\/generate_204');ytimg.preload('https:\/\/rr3---sn-n2xxqoxxucg-btoe.googlevideo.com\/generate_204?conn2');</script><link rel="canonical" href="https://www.youtube.com/watch?v=floFotcxRIo"><link rel="alternate" media="handheld" href="https://m.youtube.com/watch?v=floFotcxRIo"><link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.youtube.com/watch?v=floFotcxRIo"><title>BARROSO CANTA &quot;EVIDÊNCIAS&quot; APÓS POSSE NA PRESIDÊNCIA DO STF - MORNING SHOW - 29/09/2023 - YouTube</title><meta name="title" content="BARROSO CANTA &quot;EVIDÊNCIAS&quot; APÓS POSSE NA PRESIDÊNCIA DO STF - MORNING SHOW - 29/09/2023"><meta name="description" content="Baixe o app Panflix: https://www.panflix.com.br/Baixe o AppNews Jovem Pan na Google Playhttps://bit.ly/2KRm8OJ Baixe o AppNews Jovem Pan na App Storehttps://..."><meta name="keywords" content="Felipeh Campos, Jovem Pan, Morning Show, Paulo Mathias, integramorningshow, Mano Ferreira, entretenimento, antonia fontenelle, morning show jovem pan, morning show jp, morning jp, morning jovem pan, Antônia Fontenelle, Guido Palomba, Entrevista Guido Palomba, Entrevista, Ovnis, Ets, Extraterrestres, Bolsonaro, Jair Bolsonaro, Lula, Luiz Inácio Lula da Silva"><link rel="shortlinkUrl" href="https://youtu.be/floFotcxRIo"><link rel="alternate" href="android-app://com.google.android.youtube/http/www.youtube.com/watch?v=floFotcxRIo"><link rel="alternate" href="ios-app://544007664/vnd.youtube/www.youtube.com/watch?v=floFotcxRIo"><link rel="alternate" type="application/json+oembed" href="https://www.youtube.com/oembed?format=json&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DfloFotcxRIo" title="BARROSO CANTA &quot;EVIDÊNCIAS&quot; APÓS POSSE NA PRESIDÊNCIA DO STF - MORNING SHOW - 29/09/2023"><link rel="alternate" type="text/xml+oembed" href="https://www.youtube.com/oembed?format=xml&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DfloFotcxRIo" title="BARROSO 
CANTA &quot;EVIDÊNCIAS&quot; APÓS POSSE NA PRESIDÊNCIA DO STF - MORNING SHOW - 29/09/2023"><link rel="image_src" href="https://i.ytimg.com/vi/floFotcxRIo/hqdefault.jpg"><meta property="og:site_name" content="YouTube"><meta property="og:url" content="https://www.youtube.com/watch?v=floFotcxRIo"><meta property="og:title" content="BARROSO CANTA &quot;EVIDÊNCIAS&quot; APÓS POSSE NA PRESIDÊNCIA DO STF - MORNING SHOW - 29/09/2023"><meta property="og:image" content="https://i.ytimg.com/vi/floFotcxRIo/hqdefault.jpg"><meta property="og:image:width" content="480"><meta property="og:image:height" content="360"><meta property="og:description" content="Baixe o app Panflix: https://www.panflix.com.br/Baixe o AppNews Jovem Pan na Google Playhttps://bit.ly/2KRm8OJ Baixe o AppNews Jovem Pan na App Storehttps://..."><meta property="al:ios:app_store_id" content="544007664"><meta property="al:ios:app_name" content="YouTube"><meta property="al:ios:url" content="vnd.youtube://www.youtube.com/watch?v=floFotcxRIo&amp;feature=applinks"><meta property="al:android:url" content="vnd.youtube://www.youtube.com/watch?v=floFotcxRIo&amp;feature=applinks"><meta property="al:web:url" content="http://www.youtube.com/watch?v=floFotcxRIo&amp;feature=applinks"><meta property="og:type" content="video.other"><meta property="og:video:url" content="https://www.youtube.com/embed/floFotcxRIo"><meta property="og:video:secure_url" content="https://www.youtube.com/embed/floFotcxRIo"><meta property="og:video:type" content="text/html"><meta property="og:video:width" content="1280"><meta property="og:video:height" content="720"><meta property="al:android:app_name" content="YouTube"><meta property="al:android:package" content="com.google.android.youtube"><meta property="og:video:tag" content="Felipeh Campos"><meta property="og:video:tag" content="Jovem Pan"><meta property="og:video:tag" content="Morning Show"><meta property="og:video:tag" content="Paulo Mathias"><meta property="og:video:tag" 
content="integramorningshow"><meta property="og:video:tag" content="Ma
Message longer than Log limit (4000). Message was truncated.


what is going on?? xD
 

teddybear

Well-Known Member
Licensed User
It's not a bug.
Did you see this message in the logs? "Message longer than Log limit (4000). Message was truncated."

This is the real Matcher1.Match :https:..../sig/AOq0QJ8wRAIgLsl3m88hlm9-BjcSbfZDZT1bWJ2ZVJf6rX6-JfOtb2ACIGRyP7Fs0HgH717uixmER63ND1xzwL0Zlk0As4Eg0rgb/file/index.m3u8
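
One possible way to confirm this without running into the log limit is to log the length of the match together with its first and last characters. This is only a sketch: `LogMatchSummary` is a hypothetical helper name, and the 200-character window is an arbitrary choice.

```b4x
' Sketch: inspect a long match without hitting the Log limit.
' LogMatchSummary is a hypothetical helper, not part of any library.
Sub LogMatchSummary (FullMatch As String)
    Dim n As Int = FullMatch.Length
    Log("Match length: " & n)
    ' First 200 characters (or fewer if the match is shorter).
    Log("Start: " & FullMatch.SubString2(0, Min(200, n)))
    ' Last 200 characters, where the ".m3u8" should appear.
    Log("End: " & FullMatch.SubString(Max(0, n - 200)))
End Sub
```

It could be called as `LogMatchSummary(Matcher1.Match)` inside the `If Matcher1.Find Then` block.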
 

Waldemar Lima

Well-Known Member
Licensed User
Regarding the suggestion to remove the second question mark from https?.*?\.m3u8:
It didn't work...


Look at the code below: I log only the first 1,500 characters of the match and removed the second "?", and yet the result is the same.

B4X:
Dim j As HttpJob
j.Initialize("", Me)
j.Download("https://www.youtube.com/watch?v=1KIucmaYkmU")  '"&videoId)
Wait For (j) JobDone(j As HttpJob)
If j.Success Then
    'Log(j.GetString)
    Dim text, pattern As String
    text = j.GetString
    pattern = "https?.*\.m3u8" 'greedy: runs to the last ".m3u8" on the same line
    Dim Matcher1 As Matcher
    Matcher1 = Regex.Matcher2(pattern, Regex.CASE_INSENSITIVE, text)
    If Matcher1.Find Then
        Dim livem3u8 As String = Matcher1.Match
        'start at index 0 and never read past the end of the match
        Log("Found: " & livem3u8.SubString2(0, Min(1500, livem3u8.Length)))
    End If
End If
j.Release
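
A small experiment may make the behaviour of the two patterns clearer. The sketch below runs them, plus a third variant that excludes quotes and whitespace, against a made-up sample string. The hostnames are placeholders, `CompareLazyAndGreedy` is a hypothetical name, and the third pattern is an assumption about what a tighter match could look like, not something taken from this thread.

```b4x
' Sketch on a made-up sample string (not real YouTube data).
Sub CompareLazyAndGreedy
    Dim sample As String = $"<a href="https://a.example/x"> "u":"https://b.example/live/index.m3u8" "v":"https://c.example/other.m3u8""$
    ' 1: lazy, 2: greedy, 3: lazy but not allowed to cross a quote or whitespace
    Dim patterns() As String = Array As String("https?.*?\.m3u8", "https?.*\.m3u8", $"https?://[^"\s]+?\.m3u8"$)
    For Each p As String In patterns
        Dim m As Matcher = Regex.Matcher(p, sample)
        If m.Find Then Log(p & " -> " & m.Match)
    Next
End Sub
```

Patterns 1 and 2 both begin at the first "https" in the text, because the engine reports the leftmost match; the lazy one stops at the first ".m3u8" and the greedy one at the last. Pattern 3 cannot cross the closing quote of the first link, so it should report only the b.example URL. Note that a page which escapes slashes as `\/` would need a pattern adjusted for that.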
 

epiCode

Active Member
Licensed User
pattern = "m3u8$" should suffice, if
1. you need to find all strings that end in m3u8
2. you are sure that no strings that end in m3u8 can be something that you do not want to match
3. all strings are http or https (so we do not have to match it since we know it will only be a url)
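
One caveat worth checking: without the MULTILINE option, `$` anchors at the end of the whole text being searched, so this pattern seems to fit best when each candidate URL is tested on its own rather than the full page. A minimal sketch, where `EndsWithM3u8` is a hypothetical helper name:

```b4x
' Sketch: test a single candidate string, not the whole HTML page.
' EndsWithM3u8 is a hypothetical helper, not part of any library.
Sub EndsWithM3u8 (Candidate As String) As Boolean
    Dim m As Matcher = Regex.Matcher2("m3u8$", Regex.CASE_INSENSITIVE, Candidate)
    ' Find succeeds only when "m3u8" sits at the very end of Candidate.
    Return m.Find
End Sub
```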
 

drgottjr

Expert
Licensed User
Longtime User
for this url: j.Download("")
and this pattern: pattern = $""(https://manifest.+?\.m3u8)""$
i get this: Found: https://manifest.googlevideo.com/api/manifest/dash/expire/1696040113/ei/UTAXZcalJIOH6QLI0Klg/ip/2607:fb90:3c11:7733:c8f6:c15c:aa6d:fe54/id/1KIucmaYkmU.12/source/yt_live_broadcast/requiressl/yes/tx/51018867/txs/51018864,51018865,51018866,51018867,51018868,51018869,51018870/as/fmp4_audio_clear,webm_audio_clear,webm2_audio_clear,fmp4_sd_hd_clear,webm2_sd_hd_clear/spc/UWF9f5CgrgeQMWOwU7f5auC3oGeFyHeR6AmC9GVJUQ/vprv/1/pacing/0/keepalive/yes/fexp/24007246/beids/24350017/itag/0/playlist_type/LIVE/sparams/expire,ei,ip,id,source,requiressl,tx,txs,as,spc,vprv,itag,playlist_type/sig/AOq0QJ8wRQIhAIlyBrcGT6VpzzcGK6LfwXHyVV07Gfo7RVq8QKlPEcbNAiBiqzm_8yp8W9PQBfHju7LxnOCC3QAdBZBlX1Yh49j5wQ==","hlsManifestUrl":"https://manifest.googlevideo.com/api/manifest/hls_variant/expire/1696040113/ei/UTAXZcalJIOH6QLI0Klg/ip/2607:fb90:3c11:7733:c8f6:c15c:aa6d:fe54/id/1KIucmaYkmU.12/source/yt_live_broadcast/requiressl/yes/tx/51018867/txs/51018864,51018865,51018866,51018867,51018868,51018869,51018870/hfr/1/playlist_duration/30/manifest_duration/30/maudio/1/spc/UWF9f5CgrgeQMWOwU7f5auC3oGeFyHeR6AmC9GVJUQ/vprv/1/go/1/pacing/0/nvgoi/1/keepalive/yes/fexp/24007246/beids/24350017/dover/11/itag/0/playlist_type/DVR/sparams/expire,ei,ip,id,source,requiressl,tx,txs,hfr,playlist_duration,manifest_duration,maudio,spc,vprv,go,itag,playlist_type/sig/AOq0QJ8wRQIgL_PVghlh-0iqITajyNiRnqlTexji04-DUW0vnmD8skMCIQDSHOgRTOy4GhF_b7kHWfDHfB6veq64mmR8eIGmEgHk2w==/file/index.m3u8

when i feed that into my exoplayer i see some cooking show with dancing music from brazil:
Dim ytlink As String = _
"https://manifest.googlevideo.com/api/manifest/hls_variant/expire/1696038729/ei/6CoXZdakPN2By_sPotuT0AE/ip/2607:fb90:3c11:7733:c8f6:c15c:aa6d:fe54/id/1KIucmaYkmU.12/source/yt_live_broadcast/requiressl/yes/hfr/1/playlist_duration/30/manifest_duration/30/maudio/1/spc/UWF9f66jO2bzDUv0fdOeU-zeudP-twQFz6uSaKYToA/vprv/1/go/1/pacing/0/nvgoi/1/keepalive/yes/fexp/24007246/beids/24350017/dover/11/itag/0/playlist_type/DVR/sparams/expire,ei,ip,id,source,requiressl,hfr,playlist_duration,manifest_duration,maudio,spc,vprv,go,itag,playlist_type/sig/AOq0QJ8wRgIhAO2NgMwOgr_LjEPe21GXTF5-5z5BhtgPiygRIcQt6XGiAiEAig7fMs13N8GouEd6LbkcsS4fWG0sIsXo4_ypVBQYkRo=/file/index.m3u8"
player1.Prepare(player1.CreateHLSSource(ytlink)) ' NORMAL STREAMING
player1.Play


you understand that you always have to do the download. you cannot reliably save the .m3u8 file to play later. part of the url includes a timestamp.
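
as a rough illustration of the timestamp point, the sketch below reads the number after /expire/ in the url and compares it with the current time. it assumes that number is a unix time in seconds, which the urls in this thread suggest but do not prove; `ManifestExpired` is a hypothetical helper name.

```b4x
' Sketch: decide whether a saved manifest url is probably stale.
' Assumption: the value after /expire/ is a Unix timestamp in seconds.
Sub ManifestExpired (Url As String) As Boolean
    Dim m As Matcher = Regex.Matcher("/expire/(\d+)/", Url)
    ' No timestamp found: treat the url as unusable and download again.
    If m.Find = False Then Return True
    Dim expiresAtSeconds As Long = m.Group(1)
    ' DateTime.Now is in milliseconds, so scale the seconds value.
    Dim expiresAtMs As Long = expiresAtSeconds * 1000
    Return DateTime.Now >= expiresAtMs
End Sub
```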

the .m3u8 file is kept at a url containing the video's manifest. you need to look for that specific url

the result of this search is not > 4000 chars, so no worries about seeing it completely in the log
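
a possible refinement, sketched below, is to anchor on the "hlsManifestUrl" key that appears in the output above and to stop at the next double quote, so that only the hls url is captured. this assumes the page keeps embedding that key in this form, which youtube may change at any time; `ExtractHlsUrl` is a hypothetical helper name.

```b4x
' Sketch: capture only the HLS manifest url from the page source.
' Assumption: the page contains "hlsManifestUrl":"https://...m3u8".
Sub ExtractHlsUrl (Html As String) As String
    ' [^"]+? cannot cross a double quote, so the match stays inside one JSON value.
    Dim pattern As String = $""hlsManifestUrl":"(https://[^"]+?\.m3u8)""$
    Dim m As Matcher = Regex.Matcher(pattern, Html)
    ' Group(1) is the url without the surrounding quotes.
    If m.Find Then Return m.Group(1)
    Return ""
End Sub
```

the returned string could then be passed to player1.CreateHLSSource as in the snippet above; an empty string would mean no hls manifest was found.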
 

Attachments

  • 1.png

drgottjr

Expert
Licensed User
Longtime User
there are a few issues relating to how
google buries the link to a live video's
playlist file (.m3u8). attempts at
clever, micro-regex ops don't actually
work without a little help.

the attached is slightly off-topic since
this thread is about an alleged regex
bug, but it does extend the thread in
that it addresses the op's actual
problem: streaming live youtube
videos with exoplayer. of course, one
can already stream youtube live in a
webview without regex, but after
wrestling with incessant ads and
other distracting material with youtube
in a webview, i can appreciate why
one would prefer using exoplayer.

with that in mind, below please find a
(very) simple app which takes as input
the live video's ID (just the ID is
needed). after finding the video's stream,
the app will launch exoplayer to stream it.

i found a site promoting live videos on youtube.
i tested the app using several of those listed.

there is a help page accessible from the menu.
 

Attachments

  • LiveYouTuber.zip

MicroDrie

Well-Known Member
Licensed User
Have you tried the jSoup HTML parser? HTML allows so much freedom in how a page is built that it is poorly suited to a strict RegEx approach.
 