Python - urllib2 & cookielib
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to the translator).
Original source: http://stackoverflow.com/questions/4582964/
Asked by Adrian
I am trying to open the following website, retrieve the initial cookie, and reuse it for the second request. However, the code below prints two different cookies. How do I use the initial cookie for the second request?
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
The output shows two different cookies on every run:
<cookielib.CookieJar[<Cookie JSESSIONID=0DEEE8331DE7D0DFDC22E860E065085F for www.idcourts.us/repository>]>
<cookielib.CookieJar[<Cookie JSESSIONID=E01C2BE8323632A32DA467F8A9B22A51 for www.idcourts.us/repository>]>
Accepted answer by albertov
This is not a problem with urllib2; the site does some unusual session handling. You need to request a couple of its stylesheets before it will validate your session ID:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# default User-Agent ('Python-urllib/2.6') will *not* work
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11'),
]
stylesheets = [
    'https://www.idcourts.us/repository/css/id_style.css',
    'https://www.idcourts.us/repository/css/id_print.css',
]
home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
sessid = cj._cookies['www.idcourts.us']['/repository']['JSESSIONID'].value
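# Note the +=: extend the existing headers instead of replacing them
opener.addheaders += [
    ('Referer', 'https://www.idcourts.us/repository/start.do'),
]
for st in stylesheets:
    # the trick: request each stylesheet with the session id appended
    opener.open(st+';jsessionid='+sessid)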
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
# perhaps need to keep updating the referer...
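The code above reads the session ID through `cj._cookies`, which is a private attribute of `CookieJar`. As a hedged alternative, the sketch below finds the cookie by iterating over the jar, which is part of its public interface. The helper name `get_cookie_value` is hypothetical. The `try`/`except` import is only there so the snippet also runs on Python 3, where `cookielib` is named `http.cookiejar`.

```python
try:
    import cookielib  # Python 2, as used in this thread
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name for the same module


def get_cookie_value(jar, name, domain=None, path=None):
    """Return the value of the first matching cookie in `jar`, or None.

    `domain` and `path` are optional filters; when omitted, only the
    cookie name is compared.
    """
    for cookie in jar:  # iterating a CookieJar yields Cookie objects
        if cookie.name != name:
            continue
        if domain is not None and cookie.domain != domain:
            continue
        if path is not None and cookie.path != path:
            continue
        return cookie.value
    return None
```

With the jar from the answer, `sessid = get_cookie_value(cj, 'JSESSIONID', domain='www.idcourts.us', path='/repository')` could stand in for the `_cookies` lookup. It returns `None` instead of raising `KeyError` when the cookie is absent.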
Answer by Senthil Kumaran
I think this is a problem with the server: it is setting a new cookie for each request.
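One way to check this claim is to record the jar's contents after each request and compare them. The following is a minimal sketch and not part of the original answers. `snapshot`, `changed_cookies`, and `make_session_cookie` are hypothetical helper names, and the cookies are built by hand (with made-up values `FIRST` and `SECOND`) so the example runs without contacting the site.

```python
try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3


def snapshot(jar):
    """Map (domain, path, name) -> value for every cookie currently in the jar."""
    return dict(((c.domain, c.path, c.name), c.value) for c in jar)


def changed_cookies(before, after):
    """Return {key: (old_value, new_value)} for cookies whose value differs."""
    changes = {}
    for key, new_value in after.items():
        old_value = before.get(key)
        if old_value != new_value:
            changes[key] = (old_value, new_value)
    return changes


def make_session_cookie(value):
    """Hand-built stand-in for the JSESSIONID cookie (assumed attributes)."""
    return cookielib.Cookie(
        version=0, name='JSESSIONID', value=value,
        port=None, port_specified=False,
        domain='www.idcourts.us', domain_specified=False, domain_initial_dot=False,
        path='/repository', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


jar = cookielib.CookieJar()
jar.set_cookie(make_session_cookie('FIRST'))
before = snapshot(jar)
jar.set_cookie(make_session_cookie('SECOND'))  # same domain/path/name, so it overwrites
after = snapshot(jar)
print(changed_cookies(before, after))
```

In a real session, `snapshot(cj)` would be taken after each `opener.open()` call. A non-empty result from `changed_cookies` means the server replaced the session cookie between the two requests.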
Answer by thirtydot
Not an actual answer (but far too long for a comment); possibly useful to anyone else trying to answer this.
Despite my best attempts, I can't figure this out.
Looking in Firebug, the cookie seems to remain the same (works properly) in Firefox.
I added urllib2.HTTPSHandler(debuglevel=1) to debug what headers Python is sending, and it does appear to resend the cookie.
I also added all the Firefox request headers to see if that would help (it didn't):
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13'),
    ..
]
My test code:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), urllib2.HTTPSHandler(debuglevel=1))
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Accept-Language', 'en-gb,en;q=0.5'),
    ('Accept-Encoding', 'gzip,deflate'),
    ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'),
    ('Keep-Alive', '115'),
    ('Connection', 'keep-alive'),
    ('Cache-Control', 'max-age=0'),
    ('Referer', 'https://www.idcourts.us/repository/partySearch.do'),
]
home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
I feel like I'm missing something obvious.
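As a complement to the `debuglevel=1` output, `CookieJar.add_cookie_header()` can be asked offline which `Cookie` header the jar would attach to a given request. This is a sketch under assumptions: the helper name `cookie_header_for`, the cookie value `ABC123`, and the cookie's attributes are hypothetical, and no request is actually sent.

```python
try:
    import cookielib, urllib2  # Python 2, as used in this thread
except ImportError:
    import http.cookiejar as cookielib  # Python 3 equivalents
    import urllib.request as urllib2


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would add to a request for `url`, or None."""
    request = urllib2.Request(url)
    jar.add_cookie_header(request)  # applies the jar's domain/path/secure rules
    return request.get_header('Cookie')


jar = cookielib.CookieJar()
# Hand-built stand-in for the session cookie; a real one comes from the server.
jar.set_cookie(cookielib.Cookie(
    version=0, name='JSESSIONID', value='ABC123',
    port=None, port_specified=False,
    domain='www.idcourts.us', domain_specified=False, domain_initial_dot=False,
    path='/repository', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}))

print(cookie_header_for(jar, 'https://www.idcourts.us/repository/partySearch.do'))
print(cookie_header_for(jar, 'https://example.com/'))
```

If the first call returns the session cookie, the client side is behaving as expected, which is consistent with the observation that Python resends the cookie and the difference lies in how the server validates the session.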

