Original question: http://stackoverflow.com/questions/43285622/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Login to website using python requests
Question by Pablo
I'm trying to log in to https://www.voxbeam.com/login using requests to scrape data. I'm a Python beginner; I have mostly done tutorials, plus some web scraping on my own with BeautifulSoup.
Looking at the HTML:
<form id="loginForm" action="https://www.voxbeam.com//login" method="post" autocomplete="off">
<input name="userName" id="userName" class="text auto_focus" placeholder="Username" autocomplete="off" type="text">
<input name="password" id="password" class="password" placeholder="Password" autocomplete="off" type="password">
<input id="challenge" name="challenge" value="78ed64f09c5bcf53ead08d967482bfac" type="hidden">
<input id="hash" name="hash" type="hidden">
I understand I should be using the POST method and sending userName and password.
I'm trying this:
import requests
import webbrowser
url = "https://www.voxbeam.com/login"
login = {'userName': 'xxxxxxxxx',
         'password': 'yyyyyyyyy'}
print("Original URL:", url)
r = requests.post(url, data=login)
print("\nNew URL", r.url)
print("Status Code:", r.status_code)
print("History:", r.history)
print("\nRedirection:")
for i in r.history:
    print(i.status_code, i.url)
# Open r in the browser to check if I logged in
new = 2  # open in a new tab, if possible
webbrowser.open(r.url, new=new)
I expect that after a successful login, r.url will be the dashboard URL, so I can begin scraping the data I need.
When I run the code with the authentication information in place of xxxxxx and yyyyyy, I get the following output:
Original URL: https://www.voxbeam.com/login
New URL https://www.voxbeam.com/login
Status Code: 200
History: []
Redirection:
Process finished with exit code 0
A new browser tab opens with www.voxbeam.com/login.
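The tab shows the login page for a reason unrelated to whether the scripted login worked: webbrowser.open(r.url) makes the browser send its own, separate request, and the browser does not have the cookies held by requests. To see what requests actually received, one option is to write r.text to a local file and open that file instead. A minimal standard-library sketch; save_response_html is a hypothetical helper name:

```python
import tempfile
from pathlib import Path


def save_response_html(html):
    """Write HTML received by requests to a temporary .html file and return its path."""
    fd, name = tempfile.mkstemp(suffix=".html")
    # open() accepts the OS-level file descriptor returned by mkstemp
    with open(fd, "w", encoding="utf-8") as f:
        f.write(html)
    return Path(name).resolve()
```

Usage: `webbrowser.open(save_response_html(r.text).as_uri(), new=2)`. Stylesheets and images referenced by relative URLs will not load from a local file, but the text is usually enough to tell a dashboard from the login form.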
Is there something wrong with the code? Am I missing something in the HTML? Is it reasonable to expect the dashboard URL in r (or a redirect to it), and to open that URL in a browser tab to check the response visually, or should I be doing this differently?
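Whether the login worked can be checked in r itself rather than by URL alone. A 200 status with r.url still at /login and an empty r.history suggests the server answered by re-rendering the login form instead of redirecting, i.e. the login was not accepted. A minimal standard-library sketch that looks for that form in r.text; the helper name login_form_present is hypothetical, and it assumes the page shown after a failed login reuses id="loginForm":

```python
from html.parser import HTMLParser


class _FormFinder(HTMLParser):
    """Sets .found when a <form> with the given id appears in the document."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if tag == "form" and dict(attrs).get("id") == self.form_id:
            self.found = True


def login_form_present(html, form_id="loginForm"):
    """Return True if the page still contains the login form (likely not logged in)."""
    finder = _FormFinder(form_id)
    finder.feed(html)
    finder.close()
    return finder.found
```

For example: `if login_form_present(r.text): print("Still on the login page")`.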
I've been reading many similar questions here for a couple of days, but every website's authentication process seems a little different. I checked http://docs.python-requests.org/en/latest/user/authentication/, which describes other methods, but I haven't found anything in the HTML suggesting I should use one of those instead of POST.
I also tried:
r = requests.get(url, auth=('xxxxxxxx', 'yyyyyyyy'))
but it doesn't seem to work either.
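In requests, auth=('user', 'password') is shorthand for HTTP Basic authentication: it adds an "Authorization: Basic ..." header containing the base64-encoded user:password. A site whose login is an HTML form like the one above reads userName and password from the POST body instead, so it most likely ignores that header. The standard-library sketch below (basic_auth_header is a hypothetical name; requests itself encodes str credentials as latin-1, which matches UTF-8 for ASCII input) only shows what that header contains, to make the difference from a form login concrete:

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header that HTTP Basic auth sends for these credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Note that nothing in this header corresponds to the userName, password, challenge or hash form fields.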
Answer by bl79
As said above, you should send the values of all the form's fields; you can find them with your browser's web inspector. This form sends two additional hidden values:
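import requests
url = "https://www.voxbeam.com//login"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
data = {'userName':'xxxxxxxxx','password':'yyyyyyyyy','challenge':'zzzzzzzzz','hash':''}
# if the username is an email, the web inspector shows its '@' encoded as %40 (e.g. uuuuuuu%40gmail.com);
# requests form-encodes `data` itself, so write the plain '@' here
session = requests.Session()
r = session.post(url, headers=headers, data=data)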
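The challenge field is a hidden input whose value the server writes into the page. The value in the question looks like a per-request nonce (an assumption), so a hard-coded 'zzzzzzzzz' will likely not match what the server expects. A sketch under that assumption: fetch the login page with the same requests.Session, copy every input of loginForm (hidden ones included), fill in the credentials and post it back to the same URL, which is also the form's action here. LoginFormParser, build_login_payload and login are hypothetical names, and the parser uses the standard library's html.parser rather than BeautifulSoup. The empty hash field is probably filled in by JavaScript before the browser submits (again an assumption, since the HTML only shows an empty hidden field); this sketch sends it empty, as the answer does, so the server may still reject the login.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect name -> value for every <input> inside the <form> with the given id."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # hidden inputs such as "challenge" keep the value the server put in the page
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(html, username, password, form_id="loginForm"):
    """Return every field of the login form, with the credentials filled in."""
    parser = LoginFormParser(form_id)
    parser.feed(html)
    parser.close()
    payload = dict(parser.fields)
    payload["userName"] = username
    payload["password"] = password
    return payload


def login(session, login_url, username, password):
    """GET the form (fresh challenge and cookies), then POST it back in the same session."""
    page = session.get(login_url)
    payload = build_login_payload(page.text, username, password)
    return session.post(login_url, data=payload)
```

Usage: `r = login(requests.Session(), "https://www.voxbeam.com//login", "xxxxxxxxx", "yyyyyyyyy")`. Reusing one session matters because any cookie set when the form is served is sent back with the POST.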
Also, many sites protect against bots with measures like hidden form fields, JavaScript, or encoded values. As alternatives, you could:
1) Use the cookies from a manual login:
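import requests
url = "https://www.voxbeam.com"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
# copy these values from the browser's cookies after logging in manually
cookies = {'PHPSESSID':'zzzzzzzzzzzzzzz', 'loggedIn':'yes'}
s = requests.Session()
s.cookies.update(cookies)  # keep the cookies for every request made with this session
r = s.get(url, headers=headers)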
2) Use the Selenium module:
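from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
url = "https://www.voxbeam.com//login"
driver = webdriver.Firefox()
driver.get(url)
# Selenium 4 replaced find_element_by_name(...) with find_element(By.NAME, ...)
u = driver.find_element(By.NAME, 'userName')
u.send_keys('xxxxxxxxx')
p = driver.find_element(By.NAME, 'password')
p.send_keys('yyyyyyyyy')
p.send_keys(Keys.RETURN)  # submit the form by pressing Enter in the password field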
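If the browser is only needed for the login itself, a common follow-up is to copy the cookies Selenium holds into a requests session and do the scraping with requests. driver.get_cookies() returns a list of dicts with keys such as name, value, domain and path; the hypothetical helper below copies them into any cookie jar with a requests-style set(name, value, domain=..., path=...) method, such as session.cookies. This is a sketch: whether the site accepts the copied session cookies outside the browser is an assumption (it might also check the user agent, for example).

```python
def copy_selenium_cookies(selenium_cookies, jar):
    """Copy cookies from driver.get_cookies() into a requests-style cookie jar."""
    for c in selenium_cookies:
        # default the domain to "" rather than None so it is always a string
        jar.set(c["name"], c["value"], domain=c.get("domain", ""), path=c.get("path", "/"))
    return jar
```

Usage, after the login above: `session = requests.Session()` then `copy_selenium_cookies(driver.get_cookies(), session.cookies)`.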
Answer by Mohammad Jbber
Try specifying the URL more precisely, as follows:
url = "https://www.voxbeam.com//login?id=loginForm"
This will set focus on the login form so that the POST method applies.
Answer by Reza Hosseini
It's very tricky and depends on how the website handles the login process. What I did was use Charles, a proxy application, to listen to the requests my browser sent to the website's server while I logged in manually. Afterwards I copied the exact same headers and cookies shown in Charles into my own Python code, and it worked! I assume the cookies and headers are used to prevent bots from logging in.
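Copying the headers by hand is error-prone. A minimal sketch for turning a block of request headers copied from Charles (or from the browser's Network tab) into a dict that requests accepts as headers=; parse_raw_headers is a hypothetical helper. It skips the request line and HTTP/2 pseudo-headers such as :authority, keeps the Cookie header as-is, and drops Content-Length on the assumption that requests should compute it for the body actually sent:

```python
# Headers that should not be copied verbatim: requests computes them per request.
_SKIP = {"content-length"}


def parse_raw_headers(raw):
    """Turn 'Name: value' lines copied from a proxy into a dict for requests."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # header names contain no spaces, which filters out the request line;
        # an empty name filters out HTTP/2 pseudo-headers like ":authority"
        if not sep or not name or " " in name or name.lower() in _SKIP:
            continue
        headers[name] = value.strip()
    return headers
```

Usage: `requests.post(url, headers=parse_raw_headers(copied_text), data=data)`, where copied_text is the header block pasted from the proxy.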
Answer by Parajuli Ram Prasad
from webbot import Browser
web = Browser()  # this opens a browser window that Python controls
web.go_to('enter your login page url')
# first click the login button (replace 'login' with 'signin' if your site labels it that way; in my case it is login)
web.click('login')
# id='txtLoginId' varies from site to site: find it by inspecting the Id/Username/Emailid field (in my case it is txtLoginId)
web.type('enter your Id/Username/Emailid', into='Id/Username/Emailid', id='txtLoginId')
web.click('NEXT', tag='span')
# id='txtpasswrd' also varies from site to site: inspect the Password field the same way (in my case it is txtpasswrd)
web.type('Enter Your Password', into='Password', id='txtpasswrd')
# id="fa fa-home": now inspect the remaining buttons and proceed accordingly (in my case it is fa fa-home)
web.click('NEXT', id="fa fa-home", tag='span')
web.click('NEXT', tag='span')