Android Question: Access denied when downloading a website as text

Filippo

Expert
Licensed User
Longtime User
Hi,

I am trying to download a website with my app, it always worked until recently.
Now this website refuses me access and I get this error message:
ResponseError. Reason: , Response: <HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;c6011002&#46;1702559562&#46;75b01d0
</BODY>
</HTML>

Here is my code:
B4X:
    Dim strUrl As String
    strUrl = "https://www.finanzen.net/suchergebnis.asp?frmAktienSucheTextfeld=DE0005313704"

    getWebSeiteAlsString(strUrl)


Sub getWebSeiteAlsString(sURL As String)
    Dim job As HttpJob
    job.Initialize("WebSeiteAlsString", Me)
    job.Download(sURL)
    ProgressDialogShow2("Bitte warten...", True)
End Sub

Sub JobDone(Job As HttpJob)
    Dim parser As JSONParser
    Dim res As String
    
'    Log("JobName = " & Job.JobName & ", Success = " & Job.Success)
    
    If Job.Success Then
        res = Job.GetString
        parser.Initialize(res)
        
        Select Job.JobName
            Case "WebSeiteAlsString"
                ...               
        End Select
    Else
        MsgboxAsync("Die Charts können nicht angezeigt werden.", "Keine Internetverbindung!")
    End If
    Job.Release
    
    ProgressDialogHide
End Sub

Can this block be lifted? If yes, how?
 

Sandman

It's a simple case of blocking based on some data from the client. Probably user agent, or something like that.

Doesn't work, just as you posted:
Bash:
sandman@mothership:~ curl "https://www.finanzen.net/suchergebnis.asp?frmAktienSucheTextfeld=DE0005313704"
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
 
You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;9f034917&#46;1702562512&#46;ad377c1
</BODY>
</HTML>
sandman@mothership:~

If I get the page in Firefox and copy the actual curl request from within the browser instead, it works just fine:
Bash:
sandman@mothership:~ curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' --compressed -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'DNT: 1' -H 'Sec-GPC: 1' -H 'Connection: keep-alive' -H 'Cookie: at_check=true; mbox=session#30e9195ff95d42549383dfdd023a471c#1702564179; _sp_v1_ss=1:H4sIAAAAAAAAAItWqo5RKimOUbKKxs_IAzEMamN1YpRSQcy80pwcILsErKC6lpoSSrEA-EAOLpYAAAA%3D; _sp_v1_p=505; _sp_v1_data=686534; _sp_su=false; googleanalytics_consent=active=true; fintargeting_consent=active=true; jwplayer_consent=active=true; euconsent-v2=CP2xrcAP2xrcAAGABCENAdEgAP_gAEAAACQgJFBR5DrFDGFBMHBaYJEAKYgWVFgAQEQgAAAAAQABAAGAcAQCw2AiIASABCAAAQAAgAABAAAECAEEAAAAAAAEAAAAAAAAgAAIIABAABEAAgIQAAoAAAAAEAAAAAABAAAAmAAQAALAAAQAQAAQAAAAACAAAAAAAAAAAAAAAAIAAAAAAAAAAAAAAAIAAAAAAQAAAAABBDmA_AAoACwAKgAcABAACKAE4AUAAyABoAEQAJgATwA3gBzAEQAJwAfoBKQC5gGKANwAlYBLQCdgFDgLzAX8AxkBjgDIQG6gQ5ARBAAQF_BIBYAVQA_ACGAEcAPwAigBGgCSgJEAYMBIoKAIAAUACKAE4AUABzAS0Av4BjIDHAgAUADYAPgBCAEcAJ2KAAgEcGAAQCODoDgACwAKgAcABAAEQAJgAVQAxABvAD9AIYAiABOAD8AIoAR0AkoBKQCxAFzAMUAbgBF4CRAE7AKHAXmBDkCRQ4AiABcAGQANAAngCEAEcAP0AhABEQCLAEZAI4ATsBKwDBgGQgN1LQAQBHFgAIBHAwAQAEQBsgENgJaIQCgAFgBMACqAGIAN4AjgCKAEpAMUBIogAFAIyARwAsQBcwGeEoB4ACwAOABEACYAFUAMUAhgCIAEcAPwAuYBigEXgJEAXmBIokAGAAuAIQAjIBHAErAM8KQFwAFgAVAA4ACAAIgATAAqgBiAD9AIYAiAB-AEdAJKASkAuYBuAEXgJEATsAocBeYEOQJFFAB4ACgALgAyABoAE8AQgAjgBOAD9AIsARwAsQBigGeAN1AA.YAAAAAAAAAAA; consentUUID=5af7ba19-e98a-4421-8f38-b8fd3e6fe64a_26; gpt_ppid50=eM3MpclsV8LrAB4NVhn1NJcQTtUICaaHjLMCEp7p0nUeXR53bs' -H 'Upgrade-Insecure-Requests: 1' -H 'Sec-Fetch-Dest: document' -H 'Sec-Fetch-Mode: navigate' -H 'Sec-Fetch-Site: none' -H 'Sec-Fetch-User: ?1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'
...the page html removed here...
sandman@mothership:~

Next step for you would be to start stripping down the curl command to see how much you can remove before getting an error. When you've reached the bare minimum, you know what to impersonate in your B4X code.
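
Once the bare minimum is found, the same headers can be set from B4X. A minimal sketch, assuming the User-Agent header alone turns out to be enough (the sub name is illustrative):
B4X:
Sub DownloadWithUserAgent(sURL As String)
    Dim job As HttpJob
    job.Initialize("WebSeiteAlsString", Me)
    job.Download(sURL)
    'SetHeader must come after Download, which creates the underlying request
    job.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0")
End Sub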
 
Upvote 1

DonManfred

Can this block be lifted?
Contact the website author/admin and ask to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser:

B4X:
Dim j As HttpJob
j.Initialize("job name", Me)
j.Download(<link>) 'it can also be PostString or any of the other methods
j.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
 
Upvote 2

Sandman

Quick follow-up. As I expected, you just need to set a user-agent they accept. Here's the one from my example above; that's all that's needed to get the HTML.
Bash:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'
 
Upvote 0

Filippo

Contact the website author/admin and ask to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser:

B4X:
Dim j As HttpJob
j.Initialize("job name", Me)
j.Download(<link>) 'it can also be PostString or any of the other methods
j.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
Thank you! It works perfectly.

@Sandman
Many thanks!
 
Upvote 0

Filippo

Contact the website author/admin and ask to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser:

B4X:
Dim j As HttpJob
j.Initialize("job name", Me)
j.Download(<link>) 'it can also be PostString or any of the other methods
j.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
Too bad, it worked until a few days ago.

Is there perhaps another possibility or a trick?
 
Upvote 0

DonManfred

Too bad, it worked until a few days ago.
Probably they changed the system behind it, adding a new security layer or something.
How many requests are you doing per minute, hour, or day? Maybe you are doing it too often, causing them to block you.
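
If request volume is the issue, a simple client-side throttle can help. A sketch (the sub and variable names are illustrative, reusing getWebSeiteAlsString from the first post):
B4X:
Private LastRequestTime As Long 'ticks of the previous request

'Hypothetical throttle: allow at most one request per minute
Sub ThrottledDownload(sURL As String)
    Dim elapsed As Long = DateTime.Now - LastRequestTime
    If elapsed < DateTime.TicksPerMinute Then
        Sleep(DateTime.TicksPerMinute - elapsed) 'wait out the remainder
    End If
    LastRequestTime = DateTime.Now
    getWebSeiteAlsString(sURL)
End Sub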

Contact finanzen.net and ask for an API which you can then use.
 
Upvote 0

peacemaker

They likely track your requests: frequent ones, or long-running ones from a fixed IP address. It's a standard defence websites use against scraper apps.
Try changing the user-agent after each batch of requests.
Proxy servers exist for exactly such tasks.
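
Rotating the user-agent between batches can be sketched like this (the strings and names are illustrative examples, not values known to be accepted by the site):
B4X:
Private UserAgents As List
Private UaIndex As Int

Sub InitUserAgents
    UserAgents.Initialize
    UserAgents.Add("Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0")
    UserAgents.Add("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0")
End Sub

'Returns the next user-agent in round-robin order
Sub NextUserAgent As String
    Dim ua As String = UserAgents.Get(UaIndex)
    UaIndex = (UaIndex + 1) Mod UserAgents.Size
    Return ua
End Sub

Set it with job.GetRequest.SetHeader("User-Agent", NextUserAgent) after each Download call.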
 
Upvote 0

Sandman

Is there perhaps another possibility or a trick?
Shouldn't be needed, this continues to work just fine:
Quick follow-up. As I expected, you just need to set a user-agent they accept. Here's the one from my example above; that's all that's needed to get the HTML.
Bash:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'

Either you're not mimicking the curl request well enough, or you're hammering their server so much that they blocked you.
 
Upvote 0

Filippo

Perfect! Now it works again, thank you!

Either you're not mimicking the curl request well enough, or you're hammering their server so much that they blocked you.
The request is only sent by a single app (my private app), maybe 1-2 per week. That should not be the problem.

Contact finanzen.net and ask for an API which you can then use.
I know the site has an API, but it's not free, and for the small number of requests I send, it's not worth it.
 
Upvote 0