Python: How can I download a file on a click event using selenium?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18439851/

Date: 2020-08-19 10:45:43  Source: igfitidea

How can I download a file on a click event using selenium?

python selenium selenium-webdriver

Asked by sam

I am working with Python and Selenium. I want to download a file through a click event using Selenium. I wrote the following code.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys

browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")

browser.close()

I want to download both files from the links named "Export Data" on the given URL. How can I achieve this, given that the download only works through a click event?

Accepted answer by falsetru

Find the link using find_element(s)_by_*, then call its click method.

from selenium import webdriver

# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')

browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")

browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()

I added the profile manipulation code to prevent the download dialog from appearing.
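On recent Selenium releases, the find_element_by_* helpers and the positional FirefoxProfile argument may no longer be available. The following is a minimal sketch of the same idea using an Options object and find_element(By.ID, ...). The helper names firefox_download_prefs, make_firefox, and click_export_links are hypothetical; the preference names and element IDs are the ones used in the code above.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Build the Firefox preferences that suppress the download dialog."""
    return {
        'browser.download.folderList': 2,  # 2 = use the custom location below
        'browser.download.manager.showWhenStarting': False,
        'browser.download.dir': download_dir,
        # Comma-separated MIME types that are saved without asking
        'browser.helperApps.neverAsk.saveToDisk': ','.join(mime_types),
    }


def make_firefox(download_dir='/tmp', mime_types=('text/csv',)):
    """Start Firefox with those preferences (Selenium is imported lazily)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_download_prefs(download_dir, mime_types).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


def click_export_links(browser):
    """Open the page from the question and click both export links."""
    from selenium.webdriver.common.by import By

    browser.get('http://www.drugcite.com/?q=ACTIMMUNE')
    browser.find_element(By.ID, 'exportpt').click()
    browser.find_element(By.ID, 'exporthlgt').click()
```

Calling click_export_links(make_firefox()) would then reproduce the behaviour of the original snippet; the browser is deliberately left open so the downloads have time to finish.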

Answer by Joshua Burns

I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.


Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...


  • Python 2: sudo pip install selenium pyvirtualdisplay
  • Python 3: sudo pip3 install selenium pyvirtualdisplay

The Magic


import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json

root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'

print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')

print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')

print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')

print('Injecting retrieval code into web page')
driver.execute_script("""
    window.file_contents = null;
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'blob';
    xhr.onload = function() {
        var reader  = new FileReader();
        reader.onloadend = function() {
            window.file_contents = reader.result;
        };
        reader.readAsDataURL(xhr.response);
    };
    xhr.open('GET', %(download_url)s);
    xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
    'download_url': json.dumps(download_url),
})

print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
    # Returns the file retrieved base64 encoded (perfect for downloading binary)
    downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
    print(downloaded_file)
    if not downloaded_file:
        print('\tNot downloaded, waiting...')
        time.sleep(0.5)
print('\tDone')

print('Writing file to disk')
# The with-block closes the file even if decoding fails
with open('google-logo.png', 'wb') as fp:
    fp.write(base64.b64decode(downloaded_file))
print('\tDone')
driver.quit() # close web browser, or it'll persist after python exits.
display.stop() # close virtual display, or it'll persist after python exits.

Explanation

We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain without running into cross-origin restrictions (the browser's same-origin policy).

Next, we inject some JavaScript into the page, which fires off an AJAX request. Once the AJAX request returns a response, we load the response into a FileReader object. From there we can obtain the base64-encoded content of the file by calling readAsDataURL(). We then store that content on window, a globally accessible object, as window.file_contents.
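The result of readAsDataURL() is a data URL of the form data:<mime>;base64,<payload>, which is why the script above splits on the comma. As a rough sketch, the same split and decode could be done on the Python side; decode_data_url is a hypothetical helper name, not part of Selenium.

```python
import base64


def decode_data_url(data_url):
    """Split a 'data:<mime>;base64,<payload>' URL and decode the payload.

    Returns a (mime_type, raw_bytes) tuple.
    """
    header, _, payload = data_url.partition(',')
    if not header.startswith('data:') or not header.endswith(';base64'):
        raise ValueError('not a base64 data URL')
    mime_type = header[len('data:'):-len(';base64')]
    return mime_type, base64.b64decode(payload)
```

With this helper, the script could return window.file_contents unchanged, and Python would also learn the MIME type the browser reported for the file.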

Finally, because the AJAX request is asynchronous, we enter a Python while loop that waits for the content to appear on window. Once it is there, we decode the base64 content retrieved from window and save it to a file.
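The loop in the script above never gives up, so a failed request would make it spin forever. A minimal sketch of the same polling idea with a timeout might look like this; poll_until and its default values are assumptions made for illustration.

```python
import time


def poll_until(fetch, timeout=30.0, interval=0.5):
    """Call fetch() until it returns a non-None value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = fetch()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError('no result after %.1f seconds' % timeout)
        time.sleep(interval)
```

With a live driver, this could be called as poll_until(lambda: driver.execute_script('return window.file_contents;')), since a JavaScript null comes back to Python as None.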

This solution should work across all modern browsers supported by Selenium, for both text and binary files, and across all MIME types.

Alternate Approach


While I haven't tested this, Selenium does let you wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the appearance of that element as the trigger to retrieve the downloaded file.
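One way this untested idea could be sketched is shown below. The injected script adds a hidden marker element once the file has been read, and Selenium's WebDriverWait blocks until that element exists. The function names, the default element ID selenium-download, and the data-contents attribute are all hypothetical choices, and very large files may not be practical to carry in an attribute.

```python
import json


def build_fetch_script(download_url, element_id='selenium-download'):
    """Return JavaScript that fetches download_url and, when finished,
    adds a hidden element whose data-contents attribute holds the data URL."""
    return """
        var xhr = new XMLHttpRequest();
        xhr.responseType = 'blob';
        xhr.onload = function() {
            var reader = new FileReader();
            reader.onloadend = function() {
                var marker = document.createElement('div');
                marker.id = %(element_id)s;
                marker.style.display = 'none';
                marker.setAttribute('data-contents', reader.result);
                document.body.appendChild(marker);
            };
            reader.readAsDataURL(xhr.response);
        };
        xhr.open('GET', %(download_url)s);
        xhr.send();
    """ % {
        'download_url': json.dumps(download_url),
        'element_id': json.dumps(element_id),
    }


def fetch_via_marker(driver, download_url, element_id='selenium-download',
                     timeout=30):
    """Inject the script, wait for the marker element, return its data URL."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.execute_script(build_fetch_script(download_url, element_id))
    marker = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, element_id)))
    return marker.get_attribute('data-contents')
```

The returned string is still a data URL, so the part after the comma has to be base64-decoded before it is written to disk.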

Answer by TiagoLr

In Chrome, I download the files by clicking on the links, then open the chrome://downloads page and retrieve the list of downloaded files from the shadow DOM like this:

docs = document
  .querySelector('downloads-manager')
  .shadowRoot.querySelector('#downloads-list')
  .getElementsByTagName('downloads-item')

This solution is restricted to Chrome. The data also contains information such as the file path and download date. (Note: this code is JavaScript, so it is not valid Python syntax as written.)
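From Python, that JavaScript could be run through execute_script, roughly as sketched here. list_download_items is a hypothetical helper, and the selectors are copied from the snippet above; the internal structure of chrome://downloads is not a stable interface and may differ between Chrome versions.

```python
DOWNLOAD_ITEMS_SCRIPT = """
    return Array.from(
        document
            .querySelector('downloads-manager')
            .shadowRoot.querySelector('#downloads-list')
            .getElementsByTagName('downloads-item')
    );
"""


def list_download_items(driver):
    """Open chrome://downloads and return its <downloads-item> elements."""
    driver.get('chrome://downloads')
    # Selenium converts a returned array of DOM nodes into a list of elements
    return driver.execute_script(DOWNLOAD_ITEMS_SCRIPT)
```

The function accepts any object with get and execute_script methods, so it can be tried against a real Chrome driver or exercised with a small stand-in object.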

Answer by Ashutosh Kumar

Here is the full working code. You can use web scraping to enter the username, password, and other fields. To get the field names that appear on the webpage, use Inspect Element. An element (username, password, or the button to click) can be located by its class or name.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Using Chrome to access the web
options = webdriver.ChromeOptions()
# The download path is a Chrome preference, not a command-line argument
options.add_experimental_option("prefs", {"download.default_directory": r"C:\Test"})
driver = webdriver.Chrome(options=options)
# Open the website
try:
    driver.get('xxxx') # Your Website Address
    password_box = driver.find_element(By.NAME, 'password')
    password_box.send_keys('xxxx') # Password
    download_button = driver.find_element(By.CLASS_NAME, 'link_w_pass')
    download_button.click()
    # Note: quitting right after the click may interrupt a download in progress
except Exception as exc:
    print("Faulty URL or missing element:", exc)
finally:
    driver.quit()
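Because the script quits the browser right after the click, a slow download may be cut off. A minimal sketch of one way to wait first is to poll the download directory until no partial files remain. The helper name wait_for_downloads is hypothetical, and it assumes the directory was empty before the click; Chrome marks in-progress downloads with a .crdownload suffix and Firefox uses .part.

```python
import os
import time


def wait_for_downloads(directory, timeout=60.0, interval=0.5):
    """Wait until directory holds at least one file and no partial downloads.

    Returns the sorted list of file names, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(directory)
        # Files still being written by Chrome (.crdownload) or Firefox (.part)
        partial = [n for n in names if n.endswith(('.crdownload', '.part'))]
        if names and not partial:
            return sorted(names)
        time.sleep(interval)
    raise TimeoutError('downloads in %s did not finish' % directory)
```

In the script above, a call such as wait_for_downloads(r"C:\Test") could be placed after download_button.click() and before the browser is closed.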