Downloading a file to the filesystem using Selenium with Python and PhantomJS

Question

Asked by Encinoman818

I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem. I can easily navigate through the DOM and click, hover, etc. Downloading a file, however, is proving quite troublesome. I've tried a headless approach with Firefox and pyvirtualdisplay, but that didn't work well either and was unbelievably slow. I know that CasperJS allows file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Much appreciated.

Answer 1

Answered by alecxe

PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:

As far as I understand, you have at least 3 options:

switch to CasperJS (which means leaving Python behind)
try a headless setup on Xvfb
switch to a normal, non-headless browser

Here are some links that might also help:

Answer 2

Answered by valignatev

Although this question is quite old, downloading files through PhantomJS is still a problem. But we can use PhantomJS to get the download link and fetch all the needed cookies, such as CSRF tokens. Then we can use requests to actually download the file:

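import requests
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
# requests needs a URL string, not a WebElement, so read the link's href
# (this assumes the element is an <a> tag with an href attribute)
download_link = driver.find_element_by_id('download_link').get_attribute('href')

# Copy the browser's cookies (session id, CSRF token, etc.) into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get(download_link)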

Now response.content should contain the actual file content. We can then write it to disk with open or do whatever else we want with it.
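
As a minimal sketch of that last step, the bytes could be written to disk like this. The helper name save_download and the destination path are hypothetical, not part of the original answer; the only assumption is that the content is available as bytes, as response.content is.

```python
from pathlib import Path


def save_download(content: bytes, destination: str) -> int:
    """Write raw bytes to destination and return the number of bytes written."""
    path = Path(destination)
    # Create the target directory if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    # Binary mode keeps non-text files (PDF, ZIP, images) intact
    with open(path, "wb") as fh:
        return fh.write(content)
```

With the session from the snippet above, this would be called as save_download(response.content, 'downloads/file.bin').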

Answer 3

Answered by dnbwise

My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.

# target (the id of the file that was clicked) must be defined before this point
js = '''
    var callback = arguments[0];
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', "''' + target + '''"); // this is the id of the file clicked
    data.append('otherFormField', theForm.otherFormField.value);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
'''

for cookie in driver.get_cookies():
    js += ' xhr.setRequestHeader("' + cookie['name'] + '", "' + cookie['value'] + '"); '

js += '''
    xhr.onload = function () {
        callback(this.responseText);
    };
    xhr.send(data);
'''

driver.set_script_timeout(30)
file = driver.execute_async_script(js)
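
The snippet above splices Python values directly into the JavaScript source, which breaks if a value contains a quote or backslash. One possible way around that, sketched here under the assumption that the values are plain strings, is to let json.dumps produce the quoted literal. The helper names js_literal and append_field are hypothetical and not part of the original answer.

```python
import json


def js_literal(value: str) -> str:
    """Return value as a quoted, escaped JavaScript string literal."""
    # A JSON string is also a valid JavaScript string literal
    return json.dumps(value)


def append_field(name: str, value: str) -> str:
    """Build one data.append(...) statement for the FormData object."""
    return "data.append({}, {});".format(js_literal(name), js_literal(value))
```

For example, append_field('eventTarget', target) could replace the hand-concatenated data.append line when building js.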

Answer 4

Answered by hkeyland

It is not possible that way. You can use alternatives such as wget or curl to download the file.

Use Firefox to find the right request, use Selenium to get the values for it, and finally use an out-of-the-box tool such as curl to download the file:

import subprocess

curlCall = "curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)
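
A variant of the same idea that avoids shell quoting issues is to pass curl an argument list instead of a single shell string. This is only a sketch: the function names build_curl_command and download_with_curl are hypothetical, and the headers dictionary stands in for whatever values were collected from the browser.

```python
import subprocess


def build_curl_command(url, headers, output_path):
    """Return curl's argument list for downloading url to output_path."""
    command = ["curl", "--fail", "--silent", "--show-error", "--location"]
    # One -H option per request header (cookies, referer, and so on)
    for name, value in headers.items():
        command += ["-H", "{}: {}".format(name, value)]
    command += ["-o", output_path, url]
    return command


def download_with_curl(url, headers, output_path):
    """Run curl without a shell; raises CalledProcessError if curl fails."""
    subprocess.run(build_curl_command(url, headers, output_path), check=True)
```

Because no shell is involved, header values containing quotes or spaces are passed to curl unchanged.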
