Python: downloading with Chrome headless and Selenium

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/45631715/

Time: 2020-08-19 17:10:41  Source: igfitidea

Downloading with chrome headless and selenium

Tags: python, google-chrome, selenium, google-chrome-headless

Asked by TheChetan

I'm using python-selenium and Chrome 59 and trying to automate a simple download sequence. When I launch the browser normally, the download works, but when I do so in headless mode, the download doesn't work.

# Headless implementation
from selenium import webdriver

chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("headless")

driver = webdriver.Chrome(chrome_options=chromeOptions)

driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download doesn't start


# Normal Mode
from selenium import webdriver

driver = webdriver.Chrome()

driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download works normally


I've even tried adding a default path:

prefs = {"download.default_directory" : "/Users/Chetan/Desktop/"}
chromeOptions.add_argument("headless")
chromeOptions.add_experimental_option("prefs",prefs)

Adding a default path works in the normal implementation, but the same problem persists in the headless version.

How do I get the download to start in headless mode?

Accepted answer by Shawn Button

Yes, it's a "feature", for security. As mentioned before, here is the bug discussion: https://bugs.chromium.org/p/chromium/issues/detail?id=696481

Support for enabling downloads in headless mode was added in Chrome 62.0.3196.0 and later.
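
If you want to check up front whether a given Chrome build is new enough, here is a minimal sketch of a version comparison. The function and constant names are hypothetical, and it assumes the version string is the plain dotted form shown in this thread (for example "62.0.3196.0"):

```python
def parse_chrome_version(version):
    """Turn '62.0.3196.0' into (62, 0, 3196, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


# First version reported in the answer above as supporting headless downloads.
MIN_HEADLESS_DOWNLOAD_VERSION = parse_chrome_version("62.0.3196.0")


def supports_headless_download(version):
    """Return True if the dotted version string is at least 62.0.3196.0."""
    return parse_chrome_version(version) >= MIN_HEADLESS_DOWNLOAD_VERSION
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "9.0" would sort after "62.0".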

Here is a Python implementation. I had to register the command with the chromedriver commands myself. I will try to submit a PR so that it is included in the library in the future.

def enable_download_in_headless_chrome(self, driver, download_dir):
    # add missing support for chrome "send_command"  to selenium webdriver
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')

    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    command_result = driver.execute("send_command", params)

For reference, here is a small repo that demonstrates how to use this: https://github.com/shawnbutton/PythonHeadlessChrome

Update 2020-05-01: There have been comments saying this no longer works. Given that this patch is now over a year old, it's quite possible that the underlying library has changed.
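
For anyone hitting that, one thing to try is a minimal sketch like the one below. It assumes a Selenium release whose Chrome driver exposes execute_cdp_cmd, so the command does not have to be registered by hand. The helper name is hypothetical, and I have not confirmed that this resolves the reports mentioned in the comments:

```python
def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, via the DevTools protocol, to save downloads into download_dir.

    download_dir should be an absolute path.
    """
    params = {"behavior": "allow", "downloadPath": download_dir}
    # Assumption: `driver` is a Selenium Chrome driver that provides
    # execute_cdp_cmd; older releases need the "send_command" approach above.
    return driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
```

Call it once after creating the driver and before clicking the download link, e.g. allow_headless_downloads(driver, "/path/to/download/dir").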

Answer by Fayçal

Here's a working example for Python based on Shawn Button's answer. I've tested this with Chromium 68.0.3440.75 and chromedriver 2.38.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option("prefs", {
  "download.default_directory": "/path/to/download/dir",
  "download.prompt_for_download": False,
})

chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)

driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': "/path/to/download/dir"}}
command_result = driver.execute("send_command", params)

driver.get('http://download-page.url/')
driver.find_element_by_css_selector("#download_link").click()

Answer by Some1Else

This is a Chrome feature that prevents software from downloading files to your computer. There is a workaround, though. Read more about it here.

What you need to do is enable it via DevTools, something like this:

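// Assumption: CDP is the chrome-remote-interface client (not shown in the original snippet)
const CDP = require('chrome-remote-interface');

async function setDownload () {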
  const client = await CDP({tab: 'ws://localhost:9222/devtools/browser'});
  const info =  await client.send('Browser.setDownloadBehavior', {behavior : "allow", downloadPath: "/tmp/"});
  await client.close();
}

This is the solution someone gave in the discussion mentioned above. Here is his comment.

Answer by Hazem

Maybe the website you are working with returns different HTML pages to different browsers, which means the XPath or id you are looking for may be different in a headless browser. Try saving the pageSource from the headless browser and opening it as an HTML page to check the id or XPath you need. For a C# example, see "How to hide FirefoxDriver (using Selenium) without findElement function error in PhantomDriver?".
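
In Python, that check could look roughly like the sketch below. The helper name and the output file name are hypothetical; the only thing it relies on is the driver's page_source attribute:

```python
from pathlib import Path


def dump_page_source(driver, out_file):
    """Write the HTML the browser currently holds to out_file and return its path."""
    path = Path(out_file)
    path.write_text(driver.page_source, encoding="utf-8")
    return path
```

Run it once in headless mode and once in normal mode (for example dump_page_source(driver, "headless.html")), then compare the two files to see whether the download element has the same id in both.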

Answer by victorvartan

Usually it's redundant to see the same thing written in another language, but because this issue drove me crazy, I hope I'm saving someone else from the pain... so here's the C# version of Shawn Button's answer (tested with headless chrome=71.0.3578.98, chromedriver=2.45.615279, platform=Linux 4.9.125-linuxkit x86_64):

var enableDownloadCommandParameters = new Dictionary<string, object>
{
    { "behavior", "allow" },
    { "downloadPath", downloadDirectoryPath }
};
var result = ((OpenQA.Selenium.Chrome.ChromeDriver)driver).ExecuteChromeCommandWithResult("Page.setDownloadBehavior", enableDownloadCommandParameters);

Answer by Matheus Araujo

I solved this problem by using the workaround shared by @Shawn Button and passing the full path for the 'downloadPath' parameter. Using a relative path did not work and gave me an error.

Versions:
Chrome Version 75.0.3770.100 (Official Build) (32-bit)
ChromeDriver 75.0.3770.90
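
The full-path requirement described in this answer can be handled before the path is ever passed as 'downloadPath'. A minimal sketch, with a hypothetical helper name, that turns whatever was given into an absolute path and makes sure the directory exists:

```python
from pathlib import Path


def resolve_download_dir(download_dir):
    """Return download_dir as an absolute path string, creating it if needed."""
    path = Path(download_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return str(path)
```

The returned string can then be used both for the "download.default_directory" preference and for the 'downloadPath' parameter, so the two always agree.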

Answer by Manasi Vora

The following is the equivalent in Java, with selenium, chromedriver and chrome v 71.x. The part that sends the Page.setDownloadBehavior command is the key to allowing downloads to be saved. Additional jars: com.fasterxml.jackson.core, com.fasterxml.jackson.annotation, com.fasterxml.jackson.databind

System.setProperty("webdriver.chrome.driver", "C:\\libraries\\chromedriver.exe");

            String downloadFilepath = "C:\\Download";
            HashMap<String, Object> chromePreferences = new HashMap<String, Object>();
            chromePreferences.put("profile.default_content_settings.popups", 0);
            chromePreferences.put("download.prompt_for_download", false);
            chromePreferences.put("download.default_directory", downloadFilepath);
            ChromeOptions chromeOptions = new ChromeOptions();
            chromeOptions.setBinary("C:\\pathto\\Chrome SxS\\Application\\chrome.exe");

            //ChromeOptions options = new ChromeOptions();
            //chromeOptions.setExperimentalOption("prefs", chromePreferences);
            chromeOptions.addArguments("start-maximized");
            chromeOptions.addArguments("disable-infobars");


            //HEADLESS CHROME
            chromeOptions.addArguments("headless");

            chromeOptions.setExperimentalOption("prefs", chromePreferences);
            DesiredCapabilities cap = DesiredCapabilities.chrome();
            cap.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
            cap.setCapability(ChromeOptions.CAPABILITY, chromeOptions);

            // Key part: send Page.setDownloadBehavior to chromedriver for this session
            ChromeDriverService driverService = ChromeDriverService.createDefaultService();
            ChromeDriver driver = new ChromeDriver(driverService, chromeOptions);

            Map<String, Object> commandParams = new HashMap<>();
            commandParams.put("cmd", "Page.setDownloadBehavior");
            Map<String, String> params = new HashMap<>();
            params.put("behavior", "allow");
            params.put("downloadPath", downloadFilepath);
            commandParams.put("params", params);
            ObjectMapper objectMapper = new ObjectMapper();
            HttpClient httpClient = HttpClientBuilder.create().build();
            String command = objectMapper.writeValueAsString(commandParams);
            String u = driverService.getUrl().toString() + "/session/" + driver.getSessionId() + "/chromium/send_command";
            HttpPost request = new HttpPost(u);
            request.addHeader("content-type", "application/json");
            request.setEntity(new StringEntity(command));
            try {
                httpClient.execute(request);
            } catch (IOException e2) {
                // TODO Auto-generated catch block
                e2.printStackTrace();
            }

            // Continue using the driver for automation
            driver.manage().window().maximize();

Answer by Mykhailo Kovalskyi

A full working example for JavaScript with selenium-cucumber-js / selenium-webdriver:

const chromedriver = require('chromedriver');
const selenium = require('selenium-webdriver');
const command = require('selenium-webdriver/lib/command');
const chrome = require('selenium-webdriver/chrome');

module.exports = function() {

  const chromeOptions = new chrome.Options()
    .addArguments('--no-sandbox', '--headless', '--start-maximized', '--ignore-certificate-errors')
    .setUserPreferences({
      'profile.default_content_settings.popups': 0, // disable download file dialog
      'download.default_directory': '/tmp/downloads', // default file download location
      "download.prompt_for_download": false,
      'download.directory_upgrade': true,
      'safebrowsing.enabled': false,
      'plugins.always_open_pdf_externally': true,
      'plugins.plugins_disabled': ["Chrome PDF Viewer"]
    })
    .windowSize({width: 1600, height: 1200});

  const driver = new selenium.Builder()
    .withCapabilities({
      browserName: 'chrome',
      javascriptEnabled: true,
      acceptSslCerts: true,
      path: chromedriver.path
    })
    .setChromeOptions(chromeOptions)
    .build();

  driver.manage().window().maximize();

  driver.getSession()
    .then(session => {
      const cmd = new command.Command("SEND_COMMAND")
        .setParameter("cmd", "Page.setDownloadBehavior")
        .setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
      driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
      return driver.execute(cmd);
    });

  return driver;
};

The key part is:

  driver.getSession()
    .then(session => {
      const cmd = new command.Command("SEND_COMMAND")
        .setParameter("cmd", "Page.setDownloadBehavior")
        .setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
      driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
      return driver.execute(cmd);
    });

Tested with:

  • Chrome 67.0.3396.99
  • Chromedriver 2.36.540469
  • selenium-cucumber-js 1.5.12
  • selenium-webdriver 3.0.0