Python: Is Selenium slow, or is my code wrong?

Note: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/17462884/

Date: 2020-08-19 08:17:08  Source: igfitidea

Is Selenium slow, or is my code wrong?

Tags: python, selenium, http, selenium-webdriver, ui-automation

Asked by KGo

I'm trying to log in to Quora using Python and then scrape some content.

I'm using Selenium to log in to the site. Here's my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get('http://www.quora.com/')

username = driver.find_element_by_name('email')
password = driver.find_element_by_name('password')

username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)

driver.close()

Now the questions:

  1. It took ~4 minutes to find and fill the login form, which is painfully slow. Is there something I can do to speed up the process?

  2. Once it has logged in, how do I make sure there were no errors? In other words, how do I check the response code?

  3. How do I save cookies with Selenium so I can continue scraping after I log in?

  4. If there is no way to make Selenium faster, is there an alternative way to log in? (Quora doesn't have an API.)

Answered by manish

  1. I have been there; Selenium is slow, though it should not take as long as 4 minutes to fill a form. I started using PhantomJS, which is much faster than Firefox because it is headless. After installing the latest PhantomJS, you can simply replace Firefox() with PhantomJS() in the webdriver line.

  2. To check that you have logged in, you can assert on some element that is displayed only after login.

  3. As long as you do not quit your driver, the cookies remain available when you follow links.

  4. You can try using urllib and POST directly to the login link. You can use cookiejar to save cookies. You can even save the cookie yourself; after all, a cookie is just a string in an HTTP header.
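
Point 4 can be sketched with the standard library alone. This is a minimal sketch, not a working Quora login: the URL and the form field names `email` and `password` are assumptions, the helper names are made up for illustration, and a real site may require hidden fields, tokens, or JavaScript that a plain POST cannot provide.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical login endpoint; replace with the form's real action URL.
LOGIN_URL = "https://example.com/login"


def build_session(cookie_file="cookies.txt"):
    """Return an opener that stores cookies in a file-backed cookie jar."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(email, password, url=LOGIN_URL):
    """Build a form-encoded POST request (field names are assumptions)."""
    data = urllib.parse.urlencode({"email": email, "password": password})
    return urllib.request.Request(url, data=data.encode("utf-8"), method="POST")


def login(opener, jar, email, password):
    """Send the login request, save the cookies, and return the HTTP status."""
    with opener.open(build_login_request(email, password), timeout=30) as response:
        status = response.status
    # ignore_discard=True also keeps session cookies, which login flows often use.
    jar.save(ignore_discard=True)
    return status
```

Typical use would be `opener, jar = build_session()` followed by `login(opener, jar, "email", "password")`; later requests made through the same `opener` send the stored cookies automatically. Unlike Selenium, this route also gives direct access to the response status code.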

Answered by Stormy

You can speed up form filling by using your own setAttribute method. Here is Java code for it:

public void setAttribute(By locator, String attribute, String value) {
    ((JavascriptExecutor) getDriver()).executeScript("arguments[0].setAttribute('" + attribute
            + "',arguments[1]);",
            getElement(locator),
            value);
}
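
A rough Python equivalent of the Java helper above, assuming you already have a WebDriver instance and a located element. The function name `set_attribute` is made up for illustration; `execute_script` is the Selenium Python call for running JavaScript in the page. Note that this writes the HTML attribute directly and does not simulate typing, so pages that react to keyboard events may not notice the change.

```python
def set_attribute(driver, element, attribute, value):
    """Set an HTML attribute on an element through JavaScript.

    All three values are passed as script arguments rather than being
    concatenated into the script text, which avoids quoting problems.
    """
    driver.execute_script(
        "arguments[0].setAttribute(arguments[1], arguments[2]);",
        element,
        attribute,
        value,
    )
```

For the login form in the question, a call might look like `set_attribute(driver, username, "value", "email")`, where `username` is the element found earlier.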

Answered by Polly

I had a similar problem with very slow find_elements_xxx() calls in Python Selenium using ChromeDriver. I eventually traced the trouble to a driver.implicitly_wait() call I made before my find_element_xxx() calls; when I took it out, my find_element_xxx() calls ran quickly.

Now, I know those elements were there when I made the find_elements_xxx() calls, so I cannot imagine why implicitly_wait() should have affected the speed of those operations, but it did.
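
One way to check whether an implicit wait is the culprit is to switch it off around a single lookup and compare timings. A minimal sketch: the helper name `implicit_wait_disabled` and the `restore_seconds` parameter are assumptions for illustration, and `implicitly_wait()` is the only driver call used. As a general point, an implicit wait makes a lookup keep polling until something matches or the timeout expires, so any lookup that matches nothing costs the full timeout.

```python
from contextlib import contextmanager


@contextmanager
def implicit_wait_disabled(driver, restore_seconds):
    """Temporarily set the implicit wait to 0, then restore it.

    restore_seconds is the value to put back afterwards; it is passed in
    explicitly so the helper does not depend on reading the current setting.
    """
    driver.implicitly_wait(0)
    try:
        yield driver
    finally:
        driver.implicitly_wait(restore_seconds)
```

Inside `with implicit_wait_disabled(driver, 10):` a lookup returns or fails immediately, which makes it easy to see how much of the delay came from the wait itself.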

Answered by user3002067

For Windows 7 and IEDriver with Python Selenium, ending the Windows command-line session and restarting it cured my issue.

I was having trouble with find_element...click() calls. Each one was taking a little over 30 seconds. Here's the kind of code I have, including timing of how long each call takes to run.

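import time

# driver and clickDown (a CSS selector string) are defined earlier in the script.
timeStamp = time.time()
driver.find_element_by_css_selector(clickDown).click()  # click() returns None, so nothing is stored
print("1 took:", time.time() - timeStamp)

timeStamp = time.time()
driver.find_element_by_id("cSelect32").click()
print("2 took:", time.time() - timeStamp)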

That recorded about 31 seconds for each click. After ending the command line and restarting it (which also ends any IEDriverServer.exe processes), each click took about 1 second.
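
The timing pattern above can be wrapped in a small reusable helper. This is a sketch using only the standard library; the name `timed` and the optional `results` dictionary are assumptions for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took; optionally record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label} took: {elapsed:.3f}s")
```

Each slow step can then be wrapped as `with timed("click 1"):` followed by the Selenium call, which keeps the measurement code out of the way of the steps being measured.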

Answered by oldboy

Running the web driver headlessly should improve its execution speed to some degree.

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('-headless')  # run Firefox without a visible window
browser = Firefox(options=options)  # Firefox is imported directly, so no "webdriver." prefix

browser.get('https://google.com/')
browser.quit()  # end the session and the driver process

Answered by Alex Makarenko

I changed the locators and it now runs fast. I also added cookie handling. Check the code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import pickle


driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
wait = WebDriverWait(driver, 5)
username = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="email"]')))
password = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="password"]')))

username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)

wait.until(EC.presence_of_element_located((By.XPATH, '//span[text()="Add Question"]')))  # check that the user is logged in
with open("cookies.pkl", "wb") as f:
    pickle.dump(driver.get_cookies(), f)  # save cookies
driver.close()

We have saved the cookies; now we apply them in a new browser:

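driver = webdriver.Firefox()
driver.get('http://www.quora.com/')  # load the site first so the cookie domain matches
with open("cookies.pkl", "rb") as f:
    cookies = pickle.load(f)
for cookie in cookies:
    driver.add_cookie(cookie)
driver.get('http://www.quora.com/')  # reload so the restored cookies are sent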
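
The save and restore steps can also be packaged as two small functions. This sketch swaps pickle for JSON, which is a substitution and not what the answer uses; it works because `get_cookies()` returns a list of plain dictionaries. The function names `save_cookies` and `load_cookies` are made up for illustration.

```python
import json


def save_cookies(driver, path):
    """Write the driver's current cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the driver; return how many were added.

    The driver must already be on a page of the cookies' domain,
    otherwise the browser rejects them.
    """
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

A JSON file is readable in a text editor and, unlike a pickle file, is safe to load even if it came from somewhere you do not fully trust.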

Hope this helps.