Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/13925983/

Date: 2020-08-18 09:57:05  Source: igfitidea

Login to website using urllib2 - Python 2.7

Tags: python, python-2.7, login, urllib2

Asked by tommo

Okay, so I am using this for a reddit bot, but I want to be able to figure out HOW to log in to any website. If that makes sense....

I realise that different websites use different login forms, etc. So how do I figure out how to optimise it for each website? I'm assuming I need to look for something in the HTML, but I have no idea what.

I do NOT want to use Mechanize or any other library (that is what all the other answers on here are about, and they don't actually help me learn what is happening), as I want to learn for myself how exactly it all works.

The urllib2 documentation really isn't helping me.

Thanks.

Accepted answer by RocketDonkey

I'll preface this by saying I haven't logged in this way for a while, so I could be missing some of the more 'accepted' ways to do it.

I'm not sure if this is what you're after, but without a library like mechanize or a more robust framework like selenium, in the basic case you just look at the form itself and seek out its input elements. For instance, if you go to www.reddit.com and view the source of the rendered page, you will find this form:

<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main"
  class="login-form login-form-side">
    <input type="hidden" name="op" value="login-main" />
    <input name="user" placeholder="username" type="text" maxlength="20" tabindex="1" />
    <input name="passwd" placeholder="password" type="password" tabindex="1" />

    <div class="status"></div>

    <div id="remember-me">
      <input type="checkbox" name="rem" id="rem-login-main" tabindex="1" />
      <label for="rem-login-main">remember me</label>
      <a class="recover-password" href="/password">reset password</a>
    </div>

    <div class="submit">
      <button class="btn" type="submit" tabindex="1">login</button>
    </div>

    <div class="clear"></div>
</form>
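
Instead of reading the page source by eye, the same inspection can be scripted with the standard library's HTML parser, which keeps to the "no third-party library" constraint from the question. This is a minimal sketch, not part of the original answer: `FormFinder` and `find_forms` are hypothetical names, it only looks at `<input>` tags (not `<select>`, `<textarea>` or `<button>`), and it records every named input with its default value, including the `rem` checkbox, which a browser would only submit if ticked. The import fallback lets it run on both Python 2.7 and Python 3.

```python
# Python 2.7 / Python 3 compatible import of the standard-library HTML parser
try:
    from HTMLParser import HTMLParser  # Python 2.7
except ImportError:
    from html.parser import HTMLParser  # Python 3


class FormFinder(HTMLParser):
    """Hypothetical helper: record each <form>'s method, action and named <input>s."""

    def __init__(self):
        HTMLParser.__init__(self)  # explicit call: HTMLParser is an old-style class on 2.7
        self.forms = []
        self.current_form = None  # the <form> we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self.current_form = {
                # Browsers treat a missing method attribute as GET
                'method': (attrs.get('method') or 'get').lower(),
                'action': attrs.get('action'),  # the URL the form is submitted to
                'fields': {},
            }
            self.forms.append(self.current_form)
        elif tag == 'input' and self.current_form is not None and attrs.get('name'):
            # Keep the default value; empty string when the page supplies none
            self.current_form['fields'][attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form':
            self.current_form = None


def find_forms(html):
    """Return a list of {'method', 'action', 'fields'} dicts, one per <form>."""
    parser = FormFinder()
    parser.feed(html)
    parser.close()
    return parser.forms
```

Fed the form shown above, this returns one entry whose `method` is `post`, whose `action` is `https://ssl.reddit.com/post/login`, and whose fields are `op` (default `login-main`), `user`, `passwd` and `rem`. Those are exactly the pieces you need to build the POST request.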

Here we see a few inputs: op, user, passwd and rem. Also, notice the action attribute: that is the URL to which the form will be posted, and it will therefore be our target. So the last step is packing the parameters into a payload and sending it as a POST request to the action URL. Below, we also create a new opener that can handle cookies and sends our own headers, giving us a slightly more robust opener to execute the requests:

import cookielib
import urllib
import urllib2


# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]

# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)

# The action/ target from the form
authentication_url = 'https://ssl.reddit.com/post/login'

# Input parameters we are going to send
payload = {
  'op': 'login-main',
  'user': '<username>',
  'passwd': '<password>'
  }

# Use urllib to encode the payload
data = urllib.urlencode(payload)

# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)

# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
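
The same steps can be wrapped in two small helpers so they are reusable for other sites. This is a sketch, not the answer's code: `make_session_opener` and `build_login_request` are hypothetical names, and the import fallback is an addition that lets it run on Python 2.7 and on Python 3, where `urllib2` and `cookielib` became `urllib.request` and `http.cookiejar` and the POST body must be bytes. Using the returned opener directly, rather than calling `install_opener`, leaves the global opener untouched.

```python
# Python 2.7 / Python 3 compatible imports
try:
    from urllib import urlencode                                      # Python 2.7
    from urllib2 import Request, build_opener, HTTPCookieProcessor
    from cookielib import CookieJar
except ImportError:
    from urllib.parse import urlencode                                # Python 3
    from urllib.request import Request, build_opener, HTTPCookieProcessor
    from http.cookiejar import CookieJar


def make_session_opener(user_agent):
    """Return (opener, jar): an opener that stores cookies in jar and resends them."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [('User-agent', user_agent)]
    return opener, jar


def build_login_request(action_url, fields):
    """URL-encode the form fields; passing a body makes the Request a POST."""
    body = urlencode(fields).encode('ascii')  # bytes on Python 3, str on 2.7
    return Request(action_url, body)


# Example usage (makes a real network request, so it is left commented out):
# opener, jar = make_session_opener('RedditTesting')
# req = build_login_request('https://ssl.reddit.com/post/login',
#                           {'op': 'login-main', 'user': '<username>', 'passwd': '<password>'})
# contents = opener.open(req).read()
# for cookie in jar:
#     print(cookie.name)  # session cookies set by the server, if the login worked
```

Any later `opener.open(...)` call reuses the same `jar`, which is what keeps you logged in between requests.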

Note that this can get much more complicated. You can also do this with Gmail, for instance, but you need to pull in parameters that change every time (such as the GALX parameter). Again, I'm not sure if this is what you wanted, but I hope it helps.
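
One common way to deal with parameters that change on every visit, such as GALX, is to GET the login page first with the same cookie-aware opener, copy every hidden input it contains (per-session values like this are typically embedded as hidden inputs), and POST those back together with your credentials. A minimal sketch under those assumptions: `HiddenInputCollector`, `build_login_payload` and `login` are hypothetical names, the page is assumed to be UTF-8, and tokens generated by JavaScript will not be seen by this approach.

```python
# Python 2.7 / Python 3 compatible imports
try:
    from HTMLParser import HTMLParser     # Python 2.7
    from urllib import urlencode
except ImportError:
    from html.parser import HTMLParser    # Python 3
    from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on the page."""

    def __init__(self):
        HTMLParser.__init__(self)  # explicit call: old-style class on 2.7
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if tag == 'input' and is_hidden and attrs.get('name'):
            self.hidden[attrs['name']] = attrs.get('value') or ''


def build_login_payload(login_page_html, credentials):
    """Start from the page's current hidden fields, then add the user's own values."""
    collector = HiddenInputCollector()
    collector.feed(login_page_html)
    collector.close()
    payload = dict(collector.hidden)
    payload.update(credentials)  # user-supplied values win on a name clash
    return payload


def login(opener, login_page_url, action_url, credentials):
    """Fetch the form first (server sets cookies and fresh tokens), then POST it back."""
    html = opener.open(login_page_url).read().decode('utf-8', 'replace')  # assumes UTF-8
    payload = build_login_payload(html, credentials)
    return opener.open(action_url, urlencode(payload).encode('ascii'))
```

The opener passed to `login` should be one built with `HTTPCookieProcessor`, as in the answer, because the server usually ties the token on the page to a cookie it set during the first GET.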
