Using Selenium with Python and PhantomJS to download file to filesystem
Original URL: http://stackoverflow.com/questions/25755713/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Encinoman818
I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem. I can easily navigate the DOM, click, hover, etc. Downloading a file, however, is proving quite troublesome. I've tried a headless approach with Firefox and pyvirtualdisplay, but that didn't work well either and was unbelievably slow. I know that CasperJS allows file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Much appreciated.
Answer by alecxe
PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:
As far as I understand, you have at least 3 options:
- switch to CasperJS (and leave Python out of it)
- try a headless setup on Xvfb
- switch to a normal, non-headless browser
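For the Xvfb and regular-browser options, the usual obstacle is Firefox's "open with / save" dialog, which is a native window that Selenium cannot interact with. A minimal sketch, assuming Firefox and its about:config download preferences (names have shifted between Firefox versions, so check yours); `firefox_download_prefs` is a hypothetical helper that builds the preferences for saving matching files silently into a folder:

```python
import os
from typing import Dict, Iterable, Union


def firefox_download_prefs(download_dir: str,
                           mime_types: Iterable[str]) -> Dict[str, Union[int, str, bool]]:
    """Build Firefox preferences that save matching downloads without a dialog.

    download_dir is the folder files should land in; mime_types lists the
    Content-Type values (e.g. 'application/pdf') to save straight to disk.
    """
    return {
        # 2 means "use the custom folder given in browser.download.dir"
        'browser.download.folderList': 2,
        'browser.download.dir': os.path.abspath(download_dir),
        # Do not pop up the download manager window when a download starts
        'browser.download.manager.showWhenStarting': False,
        # Comma-separated MIME types that skip the "open with / save" dialog
        'browser.helperApps.neverAsk.saveToDisk': ','.join(mime_types),
    }
```

Each pair would then be applied with `set_preference(name, value)`, on a `FirefoxProfile` in older Selenium releases or on `webdriver.FirefoxOptions` in newer ones. For the Xvfb route, start a virtual display first, e.g. pyvirtualdisplay's `Display(visible=0, size=(1024, 768)).start()`. Newer Firefox versions may also need `pdfjs.disabled` set to `True` so PDFs are saved rather than opened in the built-in viewer.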
Here are some links that might help too:
Answer by valignatev
Although this question is quite old, downloading files through PhantomJS is still a problem. But we can use PhantomJS to get the download link and fetch all the cookies we need, such as CSRF tokens. Then we can use requests to actually download the file:
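import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

# webdriver.PhantomJS exists only in older Selenium releases (deprecated since 3.8)
driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
# The element itself is not a URL; read the link target from its href attribute
download_link = driver.find_element(By.ID, 'download_link').get_attribute('href')

# Copy the browser's cookies (session id, CSRF token, ...) into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])
driver.quit()

response = session.get(download_link)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page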
Now response.content should contain the actual file content. We can then write it to disk with open() or do whatever else we want with it.
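For large files it can help to stream the body to disk instead of holding all of `response.content` in memory, and to take the file name from the server's Content-Disposition header. A minimal sketch using only the standard library; `filename_from_content_disposition` and `write_chunks` are hypothetical helper names, and the chunks are assumed to come from requests' `response.iter_content()` on a request made with `stream=True`:

```python
from email.message import Message
from typing import Iterable, Optional


def filename_from_content_disposition(header: Optional[str]) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition header, if any."""
    if not header:
        return None
    # email.message.Message already knows how to parse (and RFC 2231-decode) it
    msg = Message()
    msg['Content-Disposition'] = header
    return msg.get_filename()


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write an iterable of byte chunks to path; return the total bytes written."""
    total = 0
    with open(path, 'wb') as fh:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                total += len(chunk)
    return total
```

Usage would look like `response = session.get(url, stream=True)`, then `name = filename_from_content_disposition(response.headers.get('Content-Disposition')) or 'download.bin'` and `write_chunks(response.iter_content(chunk_size=8192), name)`. Keeping the helpers free of requests makes them easy to reuse with other HTTP clients.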
Answer by dnbwise
My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.
# target: id of the file link that was clicked (a Python string defined elsewhere)
js = '''
    var target = arguments[0];
    var callback = arguments[arguments.length - 1];  // Selenium passes the callback last
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', target);  // this is the id of the file clicked
    data.append('otherFormField', theForm.otherFormField.value);
    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
    // A same-origin XHR sends the page's cookies automatically, and browsers refuse
    // to let scripts set the Cookie header, so no manual cookie handling is needed.
    xhr.onload = function () {
        callback(this.responseText);
    };
    xhr.send(data);
'''
driver.set_script_timeout(30)
file_content = driver.execute_async_script(js, target)
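One caveat with this approach: `responseText` decodes the body as text, which can corrupt binary files such as PDFs or ZIP archives. A sketch of a binary-safe variant, assuming the browser supports `xhr.responseType = 'arraybuffer'` and `btoa`, and reusing the answer's placeholder form id and field names ('theFormId', 'eventTarget', 'otherFormField'). The script returns the bytes base64-encoded and Python decodes them; `BINARY_SAFE_JS`, `decode_download` and `save_download` are hypothetical names:

```python
import base64
from typing import Optional

# Run with driver.execute_async_script(BINARY_SAFE_JS, target)
BINARY_SAFE_JS = '''
    var target = arguments[0];
    var callback = arguments[arguments.length - 1];
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', target);
    data.append('otherFormField', theForm.otherFormField.value);
    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
    xhr.responseType = 'arraybuffer';  // keep raw bytes instead of decoded text
    xhr.onload = function () {
        var bytes = new Uint8Array(xhr.response);
        var binary = '';
        for (var i = 0; i < bytes.length; i++) {
            binary += String.fromCharCode(bytes[i]);
        }
        callback(btoa(binary));  // base64 text survives the trip back to Python
    };
    xhr.onerror = function () { callback(null); };
    xhr.send(data);
'''


def decode_download(payload: Optional[str]) -> bytes:
    """Turn the base64 string returned by the script back into the file's bytes."""
    if payload is None:
        raise ValueError('the in-page request failed (script returned null)')
    return base64.b64decode(payload)


def save_download(payload: Optional[str], path: str) -> int:
    """Decode the payload, write it to path, and return the number of bytes."""
    data = decode_download(payload)
    with open(path, 'wb') as fh:
        fh.write(data)
    return len(data)
```

Base64 inflates the transfer by about a third and the character-by-character loop is slow for very large files, so this suits modest downloads best.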
Answer by hkeyland
It is not possible that way. You can use other tools to download files, such as wget or curl.
Use Firefox to find the right request, Selenium to get the values it needs, and finally an out-of-the-box tool such as curl to download the file:
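import subprocess

# Replace the placeholder with the full request copied from Firefox's network tab
curlCall = " curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)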
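The same idea works without a shell: instead of pasting a whole command into a string and running it with `shell=True`, build curl's argument list from the cookies Selenium already holds. A minimal sketch; `cookie_header`, `build_curl_command` and `download_with_curl` are hypothetical names, and it assumes the server only needs the session cookies (sites that also check headers such as Referer would need extra `--header` entries added the same way):

```python
import subprocess
from typing import Dict, Iterable, List, Optional


def cookie_header(selenium_cookies: Iterable[Dict]) -> str:
    """Join cookies from driver.get_cookies() into one Cookie header value."""
    return '; '.join('{}={}'.format(c['name'], c['value']) for c in selenium_cookies)


def build_curl_command(url: str, selenium_cookies: Iterable[Dict], output_path: str,
                       user_agent: Optional[str] = None) -> List[str]:
    """Build a curl argument list that replays the browser session's cookies."""
    cmd = ['curl', '--fail', '--location', '--output', output_path,
           '--header', 'Cookie: ' + cookie_header(selenium_cookies)]
    if user_agent:
        cmd += ['--user-agent', user_agent]
    cmd.append(url)
    return cmd


def download_with_curl(url: str, selenium_cookies: Iterable[Dict], output_path: str,
                       user_agent: Optional[str] = None) -> None:
    """Run curl without a shell; raises CalledProcessError if curl fails."""
    subprocess.run(build_curl_command(url, selenium_cookies, output_path, user_agent),
                   check=True)
```

Passing a list means cookie values containing quotes or spaces need no shell escaping. The browser's user agent, if the server cares about it, can be read with `driver.execute_script('return navigator.userAgent')`.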

