python 如何使用pycurl读取标题
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/472179/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to read the header with pycurl
提问by sverrejoh
How do I read the response headers returned from a PyCurl request?
如何读取从 PyCurl 请求返回的响应标头?
回答by bortzmeyer
There are several solutions (by default, they are dropped). Here is an example using the option HEADERFUNCTION which lets you indicate a function to handle them.
有几种解决方案(默认情况下,它们被删除)。这是一个使用选项 HEADERFUNCTION 的示例,它允许您指示处理它们的函数。
Other solutions are options WRITEHEADER (not compatible with WRITEFUNCTION) or setting HEADER to True so that they are transmitted with the body.
其他解决方案是选项 WRITEHEADER(与 WRITEFUNCTION 不兼容)或将 HEADER 设置为 True 以便它们与正文一起传输。
#!/usr/bin/python
import pycurl
import sys
class Storage:
def __init__(self):
self.contents = ''
self.line = 0
def store(self, buf):
self.line = self.line + 1
self.contents = "%s%i: %s" % (self.contents, self.line, buf)
def __str__(self):
return self.contents
retrieved_body = Storage()
retrieved_headers = Storage()
c = pycurl.Curl()
c.setopt(c.URL, 'http://www.demaziere.fr/eve/')
c.setopt(c.WRITEFUNCTION, retrieved_body.store)
c.setopt(c.HEADERFUNCTION, retrieved_headers.store)
c.perform()
c.close()
print retrieved_headers
print retrieved_body
回答by vontrapp
import pycurl
from StringIO import StringIO
headers = StringIO()
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.HEADER, 1)
c.setopt(c.NOBODY, 1) # header only, no body
c.setopt(c.HEADERFUNCTION, headers.write)
c.perform()
print headers.getvalue()
Add any other curl setopts as necessary/desired, such as FOLLOWLOCATION.
根据需要/需要添加任何其他 curl setopt,例如 FOLLOWLOCATION。
回答by Alexandr
Anothr alternate, human_curl usage: pip human_curl
另一个替代,human_curl 用法:pip human_curl
In [1]: import human_curl as hurl
In [2]: r = hurl.get("http://stackoverflow.com")
In [3]: r.headers
Out[3]:
{'cache-control': 'public, max-age=45',
'content-length': '198515',
'content-type': 'text/html; charset=utf-8',
'date': 'Thu, 01 Sep 2011 11:53:43 GMT',
'expires': 'Thu, 01 Sep 2011 11:54:28 GMT',
'last-modified': 'Thu, 01 Sep 2011 11:53:28 GMT',
'vary': '*'}
回答by PEZ
This might or might not be an alternative for you:
这可能是也可能不是您的替代方案:
import urllib
headers = urllib.urlopen('http://www.pythonchallenge.com').headers.headers