Android Question: Access denied when downloading a website as text

Filippo

Expert
Licensed User
Longtime User
Hi,

I am trying to download a website with my app, it always worked until recently.
Now this website refuses me access and I get this error message:
ResponseError. Reason: , Response: <HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;c6011002&#46;1702559562&#46;75b01d0
</BODY>
</HTML>

Here is my code:
B4X:
    Dim strUrl As String
    strUrl = "https://www.finanzen.net/suchergebnis.asp?frmAktienSucheTextfeld=DE0005313704"

    getWebSeiteAlsString(strUrl)


Sub getWebSeiteAlsString(sURL As String)
    Dim job As HttpJob
    job.Initialize("WebSeiteAlsString", Me)
    job.Download(sURL)
    ProgressDialogShow2("Bitte warten...", True)
End Sub

Sub JobDone(Job As HttpJob)
    Dim parser As JSONParser
    Dim res As String
    
'    Log("JobName = " & Job.JobName & ", Success = " & Job.Success)
    
    If Job.Success Then
        res = Job.GetString
        parser.Initialize(res)
        
        Select Job.JobName
            Case "WebSeiteAlsString"
                ...               
        End Select
    Else
        MsgboxAsync("Die Charts können nicht angezeigt werden.", "Keine Internetverbindung!")
    End If
    Job.Release
    
    ProgressDialogHide
End Sub

Can this block be lifted? If yes, how?
 

Sandman

It's a simple case of blocking based on some data from the client. Probably user agent, or something like that.

Doesn't work, just as you posted:
Bash:
sandman@mothership:~ curl "https://www.finanzen.net/suchergebnis.asp?frmAktienSucheTextfeld=DE0005313704"
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
 
You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;9f034917&#46;1702562512&#46;ad377c1
</BODY>
</HTML>
sandman@mothership:~

If I get the page in Firefox and copy the actual curl request from within the browser instead, it works just fine:
Bash:
sandman@mothership:~ curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' --compressed -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'DNT: 1' -H 'Sec-GPC: 1' -H 'Connection: keep-alive' -H 'Cookie: at_check=true; mbox=session#30e9195ff95d42549383dfdd023a471c#1702564179; _sp_v1_ss=1:H4sIAAAAAAAAAItWqo5RKimOUbKKxs_IAzEMamN1YpRSQcy80pwcILsErKC6lpoSSrEA-EAOLpYAAAA%3D; _sp_v1_p=505; _sp_v1_data=686534; _sp_su=false; googleanalytics_consent=active=true; fintargeting_consent=active=true; jwplayer_consent=active=true; euconsent-v2=CP2xrcAP2xrcAAGABCENAdEgAP_gAEAAACQgJFBR5DrFDGFBMHBaYJEAKYgWVFgAQEQgAAAAAQABAAGAcAQCw2AiIASABCAAAQAAgAABAAAECAEEAAAAAAAEAAAAAAAAgAAIIABAABEAAgIQAAoAAAAAEAAAAAABAAAAmAAQAALAAAQAQAAQAAAAACAAAAAAAAAAAAAAAAIAAAAAAAAAAAAAAAIAAAAAAQAAAAABBDmA_AAoACwAKgAcABAACKAE4AUAAyABoAEQAJgATwA3gBzAEQAJwAfoBKQC5gGKANwAlYBLQCdgFDgLzAX8AxkBjgDIQG6gQ5ARBAAQF_BIBYAVQA_ACGAEcAPwAigBGgCSgJEAYMBIoKAIAAUACKAE4AUABzAS0Av4BjIDHAgAUADYAPgBCAEcAJ2KAAgEcGAAQCODoDgACwAKgAcABAAEQAJgAVQAxABvAD9AIYAiABOAD8AIoAR0AkoBKQCxAFzAMUAbgBF4CRAE7AKHAXmBDkCRQ4AiABcAGQANAAngCEAEcAP0AhABEQCLAEZAI4ATsBKwDBgGQgN1LQAQBHFgAIBHAwAQAEQBsgENgJaIQCgAFgBMACqAGIAN4AjgCKAEpAMUBIogAFAIyARwAsQBcwGeEoB4ACwAOABEACYAFUAMUAhgCIAEcAPwAuYBigEXgJEAXmBIokAGAAuAIQAjIBHAErAM8KQFwAFgAVAA4ACAAIgATAAqgBiAD9AIYAiAB-AEdAJKASkAuYBuAEXgJEATsAocBeYEOQJFFAB4ACgALgAyABoAE8AQgAjgBOAD9AIsARwAsQBigGeAN1AA.YAAAAAAAAAAA; consentUUID=5af7ba19-e98a-4421-8f38-b8fd3e6fe64a_26; gpt_ppid50=eM3MpclsV8LrAB4NVhn1NJcQTtUICaaHjLMCEp7p0nUeXR53bs' -H 'Upgrade-Insecure-Requests: 1' -H 'Sec-Fetch-Dest: document' -H 'Sec-Fetch-Mode: navigate' -H 'Sec-Fetch-Site: none' -H 'Sec-Fetch-User: ?1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'
...the page html removed here...
sandman@mothership:~

Next step for you would be to start stripping down the curl command to see how much you can remove before getting an error. When you've reached the bare minimum, you know what to impersonate in your B4X code.
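
Once the bare minimum is found, the same headers can be set from B4X. A minimal sketch, assuming the User-Agent header alone turns out to be enough (the sub name is illustrative):
B4X:
Sub DownloadWithUserAgent(sURL As String)
    Dim job As HttpJob
    job.Initialize("WebSeiteAlsString", Me)
    job.Download(sURL)
    'SetHeader must come after Download, which creates the underlying request
    job.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0")
End Sub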
 
Upvote 1

DonManfred

Can this block be lifted?
Contact the website author/admin and ask to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser:

B4X:
Dim j As HttpJob
j.Initialize("job name", Me)
j.Download(<link>) 'it can also be PostString or any of the other methods
j.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
 
Upvote 2

Sandman

Quick follow-up. As I expected, you just need to set a user-agent they accept. Here's the one from my example above; that's all that's needed to get the HTML.
Bash:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'
 
Upvote 0

Filippo

Contact the website author/admin and ask to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser:

B4X:
Dim j As HttpJob
j.Initialize("job name", Me)
j.Download(<link>) 'it can also be PostString or any of the other methods
j.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
Thank you! It works perfectly.

@Sandman
Many thanks!
 
Upvote 0

Filippo

Contact the website author/admin and ask to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser:

B4X:
Dim j As HttpJob
j.Initialize("job name", Me)
j.Download(<link>) 'it can also be PostString or any of the other methods
j.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
Too bad, it worked until a few days ago.

Is there perhaps another possibility or a trick?
 
Upvote 0

DonManfred

Too bad, it worked until a few days ago.
Probably they changed the system behind it, adding a new security layer or something.
How many requests are you doing per minute, hour, or day? Maybe you are doing it too often, causing them to block you.
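
If request volume is the issue, a simple client-side throttle can help. A sketch (the sub and variable names are illustrative, reusing getWebSeiteAlsString from the first post):
B4X:
Private LastRequestTime As Long 'ticks of the previous request

'Hypothetical throttle: allow at most one request per minute
Sub ThrottledDownload(sURL As String)
    Dim elapsed As Long = DateTime.Now - LastRequestTime
    If elapsed < DateTime.TicksPerMinute Then
        Sleep(DateTime.TicksPerMinute - elapsed) 'wait out the remainder
    End If
    LastRequestTime = DateTime.Now
    getWebSeiteAlsString(sURL)
End Sub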

Contact finanzen.net and ask for an API which you can then use.
 
Upvote 0

peacemaker

They likely track your requests: frequent ones, or long-running ones from a fixed IP address. It's a standard defence websites use against scraper apps.
Try changing the user-agent after each batch of requests.
Proxy servers exist for exactly such tasks.
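
Rotating the user-agent between batches can be sketched like this (the strings and names are illustrative examples, not values known to be accepted by the site):
B4X:
Private UserAgents As List
Private UaIndex As Int

Sub InitUserAgents
    UserAgents.Initialize
    UserAgents.Add("Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0")
    UserAgents.Add("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0")
End Sub

'Returns the next user-agent in round-robin order
Sub NextUserAgent As String
    Dim ua As String = UserAgents.Get(UaIndex)
    UaIndex = (UaIndex + 1) Mod UserAgents.Size
    Return ua
End Sub

Set it with job.GetRequest.SetHeader("User-Agent", NextUserAgent) after each Download call.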
 
Upvote 0

Sandman

Is there perhaps another possibility or a trick?
Shouldn't be needed, this continues to work just fine:
Quick follow-up. As I expected, you just need to set a user-agent they accept. Here's the one from my example above; that's all that's needed to get the HTML.
Bash:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'

Either you're not mimicking the curl request well enough, or you're hammering their server so much that they blocked you.
 
Upvote 0

Filippo

Perfect! Now it works again, thank you!

Either you're not mimicking the curl request well enough, or you're hammering their server so much that they blocked you.
The request is only sent by a single app (my private app), maybe 1-2 per week. That should not be the problem.

Contact finanzen.net and ask for an API which you can then use.
I know the site has an API, but it's not free, and for the small number of requests I send, it's not worth it.
 
Upvote 0