Python Requests - managing cookies

Question

Asked by Jay Gattuso

I'm trying to get some content automatically from a site using requests (and bs4).

I have a script that gets a cookie:

def getCookies(self):
    username = 'username'
    password = 'password'
    URL = 'logonURL'
    r = requests.get(URL, auth=('username', 'password'))
    cookies = r.cookies

A dump of the cookies looks like:

<<class 'requests.cookies.RequestsCookieJar'>[<Cookie ASP.NET_SessionId=yqokjr55ezarqbijyrwnov45 for URL.com/>, <Cookie BIGipServerPE_Journals.lww.com_80=1440336906.20480.0000 for URL.com/>, <Cookie JournalsLockCookie=id=a5720750-3f20-4207-a500-93ae4389213c&ip=IP address for URL.com/>]>

But when I pass the cookie object to the next URL:

soup = Soup(s.get(URL, cookies=cookies).content)

it's not working out. I can see by dumping the soup that I'm not giving the webserver my credentials properly.

I tried running a requests session:

def getCookies(self):
    self.s = requests.session()
    username = 'username'
    password = 'password'
    URL = 'logURL'
    r = self.s.get(URL, auth=('username', 'password'))

and I get the same result: no joy.

I looked at the headers via liveHttp in Firefox when visiting the second page, and saw a very different form:

Cookie: WT_FPC=id=264b0aa85e0247eb4f11355304127862:lv=1355317068013:ss=1355314918680; UserInfo=Username=username; BIGipServerPE_Journals.lww.com_80=1423559690.20480.0000; PlatformAuthCookie=true; Institution=ReferrerUrl=http://logonURL.com/?wa=wsignin1.0&wtrealm=urn:adis&wctx=http://URL.com/_layouts/Authenticate.aspx?Source=%252fpecnews%252ftoc%252f2012%252f06440&token=method|ExpireAbsolute; counterSessionGuidId=6e2bd57f-b6da-4dd4-bcb0-742428e08b5e; MyListsRefresh=12/13/2012 12:59:04 AM; ASP.NET_SessionId=40a04p45zppozc45wbadah45; JournalsLockCookie=id=85d1f38f-dcbb-476a-bc2e-92f7ac1ae493&ip=10.204.217.84; FedAuth=77u/PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz48U2VjdXJpdHlDb250ZXh0VG9rZW4gcDE6SWQ9Il9mMGU5N2M3Zi1jNzQ5LTQ4ZjktYTUxNS1mODNlYjJiNGNlYzUtNEU1MDQzOEY0RTk5QURCNDFBQTA0Mjc0RDE5QzREMEEiIHhtbG5zOnAxPSJodHRwOi8vZG9jcy5vYXNpcy1vcGVuLm9yZy93c3MvMjAwNC8wMS9vYXNpcy0yMDA0MDEtd3NzLXdzc2VjdXJpdHktdXRpbGl0eS0xLjAueHNkIiB4bWxucz0iaHR0cDovL2RvY3Mub2FzaXMtb3Blbi5vcmcvd3Mtc3gvd3Mtc2VjdXJlY29udmVyc2F0aW9uLzIwMDUxMiI+PElkZW50aWZpZXI+dXJuOnV1aWQ6ZjJmNGY5MGItMmE4Yy00OTdlLTkwNzktY2EwYjM3MTBkN2I1PC9JZGVudGlmaWVyPjxJbnN0YW5jZT51cm46dXVpZDo2NzMxN2U5Ny1lMWQ3LTQ2YzUtOTg2OC05ZGJhYjA3NDkzOWY8L0luc3RhbmNlPjwvU2VjdXJpdHlDb250ZXh0VG9rZW4+

I have redacted the username, password, and URLs from the question for obvious reasons.

Am I missing something obvious? Is there a different or proper way to capture the cookie? The method I'm currently using is not working.

EDIT:

This is a standalone version of the session-based code:

s = requests.session()
username = 'username'
password = 'password'
URL = 'logonURL.aspx'
r = s.get(URL, auth=('username', 'password'))
URL = r"URL.aspx"
soup = Soup(s.get(URL).content)

Reading a dump of the soup, I can see in the HTML that it's telling me I don't have access. This string only appears in a browser when you're not logged in.

Answer 1

Answered by Martijn Pieters

You should be reusing the whole session object, not the associated cookie jar. Use self.s for all requests you make.

If your requests are still failing when reusing the session, they will be failing for a different reason, not because you are not properly returning cookies.
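
A minimal offline sketch of why the session object matters, under some loud assumptions: the cookie name sessionid, its value, and the host example.com are placeholders, not taken from the question, and the cookie is put in the jar by hand to stand in for what a successful login response would store. Nothing is sent over the network; the requests are only prepared so the outgoing headers can be inspected.

```python
import requests

s = requests.Session()
# Assumption: stand-in for a cookie that a login response would have set.
s.cookies.set("sessionid", "abc123", domain="example.com", path="/")

req = requests.Request("GET", "http://example.com/protected")

# Prepared through the session: the jar is consulted and the cookie attached.
via_session = s.prepare_request(req)
# Prepared on its own: there is no jar, so no Cookie header is added.
standalone = req.prepare()

print(via_session.headers.get("Cookie"))
print(standalone.headers.get("Cookie"))
```

Calling s.get() or s.post() goes through the same preparation step, which is why every request made on the session carries the stored cookies without passing cookies= by hand.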

Note that if you need to use auth=('username', 'password'), then the authentication is HTTP Auth-based, not cookie-based. You need to pass in the same authentication for all calls. The requests session can do that for you too:

s = requests.Session()
s.auth = ('username', 'password')
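
In current releases of requests, credentials are attached to a session through its auth attribute rather than a constructor argument. The sketch below assumes placeholder credentials and a placeholder URL, and only prepares a request so the resulting header can be inspected without contacting a server.

```python
import requests

s = requests.Session()
s.auth = ("user", "pass")  # placeholder credentials, not from the question

# Every request prepared (or sent) through the session picks up s.auth.
prepared = s.prepare_request(
    requests.Request("GET", "http://example.com/page")  # placeholder URL
)

# HTTP Basic auth travels in the Authorization header, not in a cookie.
print(prepared.headers["Authorization"])
```

If the server answers such a request with a login page anyway, that suggests the site does not use HTTP authentication at all.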

If, however, the login page is a form with username and password fields, you'll need to call the form target instead. Check whether the form uses POST or GET, and check the field names:

s.post(loginTarget, data={'usernamefield': username, 'passwordfield': password, 'otherfield': othervalue})

and not use HTTP authentication at all.
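
To see what such a form login actually sends, here is a small offline sketch. The form target and the field names username and password are hypothetical; the real ones have to be read from the login page's form element, including any hidden fields. The request is prepared but not sent.

```python
import requests

s = requests.Session()
login = requests.Request(
    "POST",
    "http://example.com/login",  # hypothetical form target (the form's action)
    data={"username": "alice", "password": "secret"},  # hypothetical field names
)
prepared = s.prepare_request(login)

# A dict passed as data= is form-encoded, like a browser form submission.
print(prepared.headers["Content-Type"])
print(prepared.body)
```

Sending it with s.send(prepared), or simply calling s.post(url, data=...), stores any cookies from the response in s.cookies, so later calls on the same session stay logged in.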

Answer 2

Answered by arhuaco

I had a similar problem and found help in this question. The session jar was empty, and to actually get the cookie I needed to use a session.

import requests

# user and password are placeholders for your own credentials
session = requests.session()
p = session.post("http://example.com", {'user': user, 'password': password})
print('headers', p.headers)
print('cookies', requests.utils.dict_from_cookiejar(session.cookies))
print('html', p.text)
