Browser simulation - Python

Disclaimer: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/2567738/

Date: 2020-11-04 00:56:05  Source: igfitidea


Tags: python, session, browser, cookies

Asked by RadiantHex

I need to access a few HTML pages through a Python script. The problem is that I need cookie functionality, so a simple urllib HTTP request won't work.


Any ideas?

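For context, this is the limitation being described: the default urllib2 opener has no cookie jar, so a cookie set by one response is never sent back with the next request. A rough illustration (the URLs are placeholders):

import urllib2

# The default opener has no HTTPCookieProcessor, so any Set-Cookie header
# from the first response is dropped and not sent with the second request.
first = urllib2.urlopen("http://example.com/login")
second = urllib2.urlopen("http://example.com/members")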

Answered by Corey Goldberg

Check out mechanize: "Stateful programmatic web browsing in Python".
It handles cookies automagically.


import mechanize

br = mechanize.Browser()
resp = br.open("http://www.mysitewithcookies.com/")
print resp.info()  # headers
print resp.read()  # content

mechanize also exposes the urllib2 API, with cookie handling enabled by default.

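A minimal sketch of that drop-in usage, assuming the module-level mechanize.Request and mechanize.urlopen mirrors of urllib2 (check the version you have installed):

import mechanize

# mechanize re-exports a urllib2-style interface (Request, urlopen, build_opener, ...).
# As noted above, cookie handling is already wired in, so cookies set by this
# response are remembered for later requests made the same way.
req = mechanize.Request("http://www.mysitewithcookies.com/")
resp = mechanize.urlopen(req)
print resp.info()  # headers
print resp.read()  # content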

Answered by gimel

The cookielib module provides cookie handling for HTTP clients.


The cookielib module defines classes for automatic handling of HTTP cookies. It is useful for accessing web sites that require small pieces of data (cookies) to be set on the client machine by an HTTP response from a web server, and then returned to the server in later HTTP requests.


The examples in the docs show how to process cookies in conjunction with urllib2:


import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
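
For reference, on Python 3 the same classes live under new names (cookielib became http.cookiejar and urllib2 became urllib.request); a minimal equivalent:

import http.cookiejar
import urllib.request

# Same idea as above, with the Python 3 module names.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
print(len(cj))  # number of cookies the response set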

Answered by Mark Lutton

Here's something that does cookies, and as a bonus does authentication for a site that requires a username and password.


import urllib2
import cookielib

def cook():
    url="http://wherever"
    cj = cookielib.LWPCookieJar()
    authinfo = urllib2.HTTPBasicAuthHandler()
    realm="realmName"
    username="userName"
    password="passWord"
    host="www.wherever.com"
    authinfo.add_password(realm, host, username, password)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), authinfo)
    urllib2.install_opener(opener)

    # Create request object
    txheaders = { 'User-agent' : "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)" }
    try:
        req = urllib2.Request(url, None, txheaders)
        cj.add_cookie_header(req)
        f = urllib2.urlopen(req)

    except IOError, e:
        print "Failed to open", url
        if hasattr(e, 'code'):
            print "Error code:", e.code

    else:

        print f
        print f.read()
        print f.info()
        f.close()
        print 'Cookies:'
        for index, cookie in enumerate(cj):
            print index, " : ", cookie      
        cj.save("cookies.lwp")
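
If a third-party dependency is an option, the requests library covers the same ground (persistent cookies plus HTTP Basic auth) with less ceremony. A rough sketch of the same flow, reusing the placeholder credentials from above:

import requests

# A Session keeps cookies between requests; .auth attaches HTTP Basic credentials.
session = requests.Session()
session.auth = ("userName", "passWord")
session.headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"

response = session.get("http://wherever")
print(response.status_code)
print(response.text)
print("Cookies:", session.cookies.get_dict())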

Answered by Harry1992

Why don't you try Dryscrape for this:


import dryscrape as d

d.start_xvfb()  # start a headless X server (needed on machines without a display)
br = d.Session()
br.visit('http://URL.COM')  # open the web page
br.at_xpath('//*[@id = "email"]').set('[email protected]')  # find the e-mail input by id and fill it
br.at_xpath('//*[@id = "pass"]').set('pasword')  # fill the password field
br.at_xpath('//*[@id = "submit_button"]').click()  # find the submit button by id and click it

You don't need cookielib to store cookies; just install Dryscrape and do it your way.

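If you keep using the same dryscrape session after the scripted login, the cookies it picked up are reused automatically. A rough sketch (the account URL is a placeholder, and body() is assumed here to be the session's accessor for the rendered page source):

# Later visits through the same session carry the cookies set at login,
# so this page is fetched as the logged-in user.
br.visit('http://URL.COM/account')
print(br.body())  # rendered HTML of the authenticated page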