redirects? #3390

Closed
jrmlhermitte opened this issue Nov 15, 2018 · 8 comments
Labels
bug, documentation, outdated

Comments

@jrmlhermitte

jrmlhermitte commented Nov 15, 2018

Long story short

I am trying to get URL resolution after redirects to work with aiohttp.
With requests, it is simple:

import requests

resolved_url = requests.get(url).url

However, when I try the same thing with aiohttp, the URL I get back is not the resolved URL after redirects. Here is the code:

import asyncio
from aiohttp import ClientSession


async def fetch(url, session):
    # Follow redirects and return the URL of the final response.
    async with session.get(url, allow_redirects=True) as response:
        return response.url


async def resolve_urls(urls):
    tasks = []
    async with ClientSession() as session:
        for url in urls:
            task = asyncio.ensure_future(fetch(url, session))
            tasks.append(task)

        resolved = await asyncio.gather(*tasks)
        # resolved now holds the final URL of each response
        print(resolved)
        return resolved


urls = ['https://example.com']  # placeholder; substitute the URLs to resolve
loop = asyncio.get_event_loop()
loop.run_until_complete(resolve_urls(urls))

Is this a bug or intended?

@aio-libs-bot

GitMate.io thinks the contributor most likely able to help you is @asvetlov.

Possibly related issues are #1146 (Redirects for HEAD requests should use HEAD), #3315 (ValueError when redirecting to URL with incorrect schema), #246 (Endless redirects when behind a proxy/firewall), #2009 (RuntimeError raised when a Redirect doesn't have a Location or URI HTTP header), and #2022 ([enhancement] ability to return redirect responses which don't have 'location' header).

@aio-libs-bot added the bug and documentation labels Nov 15, 2018
@asvetlov
Member

Please provide the following info:

  1. Requested url
  2. response.status
  3. response.url
  4. response.history

@jrmlhermitte
Author

jrmlhermitte commented Nov 15, 2018

Running this:

from aiohttp import ClientSession


url = 'https://news.google.com/articles/CBMiW2h0dHBzOi8vd3d3LnZhbml0eWZhaXIuY29tL3N0eWxlLzIwMTgvMTEvZG9uYWxkLXRydW1wLWtpbS1rYXJkYXNoaWFuLWthbnllLXdlc3Qtb3ZhbC1vZmZpY2XSAaoBaHR0cHM6Ly93d3ctdmFuaXR5ZmFpci1jb20uY2RuLmFtcHByb2plY3Qub3JnL3Yvcy93d3cudmFuaXR5ZmFpci5jb20vc3R5bGUvMjAxOC8xMS9kb25hbGQtdHJ1bXAta2ltLWthcmRhc2hpYW4ta2FueWUtd2VzdC1vdmFsLW9mZmljZS9hbXA_YW1wX2pzX3Y9MC4xI3dlYnZpZXc9MSZjYXA9c3dpcGU?hl=en-US&gl=US&ceid=US%3Aen'


# should be:

resolved_url = 'https://www.vanityfair.com/style/2018/11/donald-trump-kim-kardashian-kanye-west-oval-office'

from asgiref.sync import async_to_sync

@async_to_sync
async def fetch(url):
    async with ClientSession() as session:
        async with session.get(url, allow_redirects=True) as response:
            print('url: ', response.url)
            print('status: ', response.status)
            print('history: ', response.history)

print('requested: ', url)
fetch(url)
print('should be: ', resolved_url)

I get this:

requested:  https://news.google.com/articles/CBMiW2h0dHBzOi8vd3d3LnZhbml0eWZhaXIuY29tL3N0eWxlLzIwMTgvMTEvZG9uYWxkLXRydW1wLWtpbS1rYXJkYXNoaWFuLWthbnllLXdlc3Qtb3ZhbC1vZmZpY2XSAaoBaHR0cHM6Ly93d3ctdmFuaXR5ZmFpci1jb20uY2RuLmFtcHByb2plY3Qub3JnL3Yvcy93d3cudmFuaXR5ZmFpci5jb20vc3R5bGUvMjAxOC8xMS9kb25hbGQtdHJ1bXAta2ltLWthcmRhc2hpYW4ta2FueWUtd2VzdC1vdmFsLW9mZmljZS9hbXA_YW1wX2pzX3Y9MC4xI3dlYnZpZXc9MSZjYXA9c3dpcGU?hl=en-US&gl=US&ceid=US%3Aen
url: https://news.google.com/articles/CBMiW2h0dHBzOi8vd3d3LnZhbml0eWZhaXIuY29tL3N0eWxlLzIwMTgvMTEvZG9uYWxkLXRydW1wLWtpbS1rYXJkYXNoaWFuLWthbnllLXdlc3Qtb3ZhbC1vZmZpY2XSAaoBaHR0cHM6Ly93d3ctdmFuaXR5ZmFpci1jb20uY2RuLmFtcHByb2plY3Qub3JnL3Yvcy93d3cudmFuaXR5ZmFpci5jb20vc3R5bGUvMjAxOC8xMS9kb25hbGQtdHJ1bXAta2ltLWthcmRhc2hpYW4ta2FueWUtd2VzdC1vdmFsLW9mZmljZS9hbXA_YW1wX2pzX3Y9MC4xI3dlYnZpZXc9MSZjYXA9c3dpcGU?hl=en-US&gl=US&ceid=US:en
status:  200
history:  ()
should be:  https://www.vanityfair.com/style/2018/11/donald-trump-kim-kardashian-kanye-west-oval-office

@jrmlhermitte
Author

And using the requests module:

In [1]: import requests

In [2]: result = requests.get('https://news.google.com/articles/CBMiW2h0dHBzOi8vd3d3LnZhbml0eWZhaXIuY29tL3N0eWx
   ...: lLzIwMTgvMTEvZG9uYWxkLXRydW1wLWtpbS1rYXJkYXNoaWFuLWthbnllLXdlc3Qtb3ZhbC1vZmZpY2XSAaoBaHR0cHM6Ly93d3ctdm
   ...: FuaXR5ZmFpci1jb20uY2RuLmFtcHByb2plY3Qub3JnL3Yvcy93d3cudmFuaXR5ZmFpci5jb20vc3R5bGUvMjAxOC8xMS9kb25hbGQtd
   ...: HJ1bXAta2ltLWthcmRhc2hpYW4ta2FueWUtd2VzdC1vdmFsLW9mZmljZS9hbXA_YW1wX2pzX3Y9MC4xI3dlYnZpZXc9MSZjYXA9c3dp
   ...: cGU?hl=en-US&gl=US&ceid=US%3Aen')

In [3]: result.url
Out[3]: 'https://www.vanityfair.com/style/2018/11/donald-trump-kim-kardashian-kanye-west-oval-office'

@mnacharov
Contributor

Looks like news.google.com responds with a redirect for User-Agent='python-requests/2.20.0' and with a human-readable page for User-Agent='Python/3.5 aiohttp/3.5.0a0'.

@mnacharov
Contributor

This example works fine:

import asyncio
from aiohttp import ClientSession


url = 'https://news.google.com/articles/CBMiW2h0dHBzOi8vd3d3LnZhbml0eWZhaXIuY29tL3N0eWxlLzIwMTgvMTEvZG9uYWxkLXRydW1wLWtpbS1rYXJkYXNoaWFuLWthbnllLXdlc3Qtb3ZhbC1vZmZpY2XSAaoBaHR0cHM6Ly93d3ctdmFuaXR5ZmFpci1jb20uY2RuLmFtcHByb2plY3Qub3JnL3Yvcy93d3cudmFuaXR5ZmFpci5jb20vc3R5bGUvMjAxOC8xMS9kb25hbGQtdHJ1bXAta2ltLWthcmRhc2hpYW4ta2FueWUtd2VzdC1vdmFsLW9mZmljZS9hbXA_YW1wX2pzX3Y9MC4xI3dlYnZpZXc9MSZjYXA9c3dpcGU?hl=en-US&gl=US&ceid=US%3Aen'

# should be:
resolved_url = 'https://www.vanityfair.com/style/2018/11/donald-trump-kim-kardashian-kanye-west-oval-office'


async def fetch(url):
    async with ClientSession() as session:
        async with session.get(url, allow_redirects=True, headers={'User-Agent': 'python-requests/2.20.0'}) as response:
            print('url: ', response.url)
            print('status: ', response.status)
            print('history: ', response.history)


print('requested: ', url)
asyncio.get_event_loop().run_until_complete(fetch(url))
print('should be: ', resolved_url)

# requested:  https://news.google.com/articles/CBMiW2h0dHBzOi8vd3d3LnZhbml0eWZhaXIuY29tL3N0eWxlLzIwMTgvMTEvZG9uYWxkLXRydW1wLWtpbS1rYXJkYXNoaWFuLWthbnllLXdlc3Qtb3ZhbC1vZmZpY2XSAaoBaHR0cHM6Ly93d3ctdmFuaXR5ZmFpci1jb20uY2RuLmFtcHByb2plY3Qub3JnL3Yvcy93d3cudmFuaXR5ZmFpci5jb20vc3R5bGUvMjAxOC8xMS9kb25hbGQtdHJ1bXAta2ltLWthcmRhc2hpYW4ta2FueWUtd2VzdC1vdmFsLW9mZmljZS9hbXA_YW1wX2pzX3Y9MC4xI3dlYnZpZXc9MSZjYXA9c3dpcGU?hl=en-US&gl=US&ceid=US%3Aen
# url:  https://www.vanityfair.com/style/2018/11/donald-trump-kim-kardashian-kanye-west-oval-office
# status:  200
# history:  (<ClientResponse(https://news.google.com/articles/CBMiW2h0dHBzOi8vd3d3LnZhbml0eWZhaXIuY29tL3N0eWxlLzIwMTgvMTEvZG9uYWxkLXRydW1wLWtpbS1rYXJkYXNoaWFuLWthbnllLXdlc3Qtb3ZhbC1vZmZpY2XSAaoBaHR0cHM6Ly93d3ctdmFuaXR5ZmFpci1jb20uY2RuLmFtcHByb2plY3Qub3JnL3Yvcy93d3cudmFuaXR5ZmFpci5jb20vc3R5bGUvMjAxOC8xMS9kb25hbGQtdHJ1bXAta2ltLWthcmRhc2hpYW4ta2FueWUtd2VzdC1vdmFsLW9mZmljZS9hbXA_YW1wX2pzX3Y9MC4xI3dlYnZpZXc9MSZjYXA9c3dpcGU?hl=en-US&gl=US&ceid=US:en) [301 Moved Permanently]>
#  <CIMultiDictProxy('Content-Type': 'application/binary', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Pragma': 'no-cache', 'Expires': 'Mon, 01 Jan 1990 00:00:00 GMT', 'Date': 'Wed, 21 Nov 2018 12:07:38 GMT', 'Location': 'https://www.vanityfair.com/style/2018/11/donald-trump-kim-kardashian-kanye-west-oval-office', 'P3P': 'CP="This is not a P3P policy! See g.co/p3phelp for more info."', 'Strict-Transport-Security': 'max-age=31536000', 'Content-Security-Policy': "script-src 'nonce-aPJRqh/w2MMAdvMW1hsveZEcDl8' 'unsafe-inline' 'unsafe-eval';object-src 'none';base-uri 'self';report-uri /_/DotsSplashUi/cspreport;worker-src 'self'", 'Server': 'ESF', 'Content-Length': '0', 'X-XSS-Protection': '1; mode=block', 'X-Frame-Options': 'SAMEORIGIN', 'X-Content-Type-Options': 'nosniff', 'Set-Cookie': 'NID=146=PHOs_td7Rwp2S2uSKJRtkVzIttMq_V4MI6n_Vm5bz-KdTuOdIMpVtbbZ2VyW0BHJ_tv8DCQTpOBXCjAPv0v1sfFwiBSVlDXKcv_umhT8eIy8ep5E8MqJ01Pmiik52SxvsKpMaciukumpqLeKICFTH9IB7XA2GQOdoQUfNvMZRbw;Domain=.google.com;Path=/;Expires=Thu, 23-May-2019 12:07:38 GMT;HttpOnly', 'Alt-Svc': 'quic=":443"; ma=2592000; v="44,43,39,35"')>
# ,)
# should be:  https://www.vanityfair.com/style/2018/11/donald-trump-kim-kardashian-kanye-west-oval-office
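
The User-Agent dependence can also be reproduced without relying on news.google.com. Below is a minimal sketch, assuming a hypothetical local aiohttp server: the `/news` and `/article` routes, the `aggregator` handler, and the `resolve` helper are illustrative names that do not come from the report above. The server redirects only clients whose User-Agent starts with `python-requests`, which stands in for the observed behaviour and is not a claim about how Google's server is implemented.

```python
import asyncio
import socket

from aiohttp import ClientSession, web


async def aggregator(request):
    # Hypothetical stand-in for the observed behaviour: redirect only when
    # the client identifies itself as python-requests.
    user_agent = request.headers.get("User-Agent", "")
    if user_agent.startswith("python-requests"):
        raise web.HTTPMovedPermanently(location="/article")
    return web.Response(text="<html>landing page</html>", content_type="text/html")


async def article(request):
    return web.Response(text="article body")


async def resolve(url, user_agent=None):
    # Session-level default headers; None keeps aiohttp's own User-Agent.
    headers = {"User-Agent": user_agent} if user_agent else None
    async with ClientSession(headers=headers) as session:
        async with session.get(url, allow_redirects=True) as response:
            return str(response.url), response.status, len(response.history)


async def demo():
    app = web.Application()
    app.router.add_get("/news", aggregator)
    app.router.add_get("/article", article)

    # Bind to a free local port chosen by the OS.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    port = sock.getsockname()[1]

    runner = web.AppRunner(app)
    await runner.setup()
    await web.SockSite(runner, sock).start()
    try:
        url = f"http://127.0.0.1:{port}/news"
        default_ua = await resolve(url)
        spoofed_ua = await resolve(url, user_agent="python-requests/2.20.0")
    finally:
        await runner.cleanup()
    return default_ua, spoofed_ua


if __name__ == "__main__":
    for result in asyncio.run(demo()):
        print(result)
```

With aiohttp's default User-Agent the request should come back with the original URL and an empty history; with the python-requests value the history should contain the single 301 response and the URL should point at `/article`.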

@asvetlov
Member

@mnach thank you very much for your investigation.
Closing the issue.

@lock

lock bot commented Nov 21, 2019

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there are important points made in this discussion,
please include those excerpts in that new issue.

@lock lock bot added the outdated label Nov 21, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Nov 21, 2019