Original URL: http://stackoverflow.com/questions/37835867/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Using Selenium in Python to save a webpage on Firefox
Question by Tommy N
I am trying to use Selenium in Python to save webpages in Firefox on macOS.
So far, I have managed to press COMMAND + S to open the Save As dialog. However,
I don't know how to:
- change the directory of the file,
- change the name of the file, and
- click the SAVE AS button.
Could someone help?
Below is the code I use to press COMMAND + S:
ActionChains(browser).key_down(Keys.COMMAND).send_keys("s").key_up(Keys.COMMAND).perform()
Besides, the reason I use this method is that I get a UnicodeEncodeError when I:
- write the page_source to an HTML file, and
- store scraped information in a CSV file.
Writing to an HTML file:
file_object = open(completeName, "w")
html = browser.page_source
file_object.write(html)
file_object.close()
Writing to a CSV file:
csv_file_write.writerow(to_write)
Error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128)
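The traceback comes from encoding a unicode string with the ASCII codec: u'\xf8' is "ø" (code point 248), which has no ASCII representation. A minimal sketch of the usual fix, assuming Python 3, is to open every output file with an explicit UTF-8 encoding. The sample text, file names, and temporary directory are hypothetical stand-ins for browser.page_source, completeName, and to_write.

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for browser.page_source and the scraped row
page_source = "<html><body><p>T\xf8mmy</p></body></html>"
to_write = ["name", "T\xf8mmy"]

# Reproduce the error: "\xf8" (ø) is outside ASCII's 0-127 range
try:
    page_source.encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII fails:", exc)

out_dir = tempfile.mkdtemp()

# HTML: text mode with an explicit encoding instead of the platform default
html_path = os.path.join(out_dir, "page.html")
with open(html_path, "w", encoding="utf-8") as file_object:
    file_object.write(page_source)

# CSV: same idea; the csv docs ask for newline="" on files passed to writer()
csv_path = os.path.join(out_dir, "data.csv")
with open(csv_path, "w", encoding="utf-8", newline="") as f:
    csv_file_write = csv.writer(f)
    csv_file_write.writerow(to_write)
```

In Python 2, which the u'' prefix in the traceback suggests, the HTML part can use io.open(path, "w", encoding="utf-8") or codecs.open instead; Python 2's csv module does not accept unicode input, so there each field is typically encoded to UTF-8 bytes before writerow.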
Answer by misantroop
with open('page.html', 'w') as f:
    f.write(driver.page_source)
Answer by RemcoW
What you are trying to achieve is impossible to do with Selenium. The dialog that opens is not something Selenium can interact with.
The closest thing you could do is collect the page_source, which gives you the entire HTML of a single page, and save it to a file.
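import codecs
import os

completeName = os.path.join(save_path, file_name)
# codecs.open encodes the text as UTF-8 on write, avoiding the 'ascii' UnicodeEncodeError
with codecs.open(completeName, "w", "utf-8") as file_object:
    html = browser.page_source
    file_object.write(html)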
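Because the file is written from Python rather than through the browser's Save As dialog, the directory and file name the question asks about become ordinary arguments. A minimal sketch, assuming Python 3: save_page_source is a hypothetical helper, and FakeDriver only stands in for a real WebDriver (Firefox or Chrome) so the example runs without a browser. This saves the HTML only, not the images, stylesheets, or scripts the page loads.

```python
import os
import tempfile


def save_page_source(driver, directory, file_name):
    """Write driver.page_source to directory/file_name as UTF-8 and return the path."""
    # Create the target directory if it does not exist yet
    os.makedirs(directory, exist_ok=True)
    complete_name = os.path.join(directory, file_name)
    with open(complete_name, "w", encoding="utf-8") as file_object:
        file_object.write(driver.page_source)
    return complete_name


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver; only page_source is used."""
    page_source = "<html><body>Caf\xe9</body></html>"


saved = save_page_source(FakeDriver(), os.path.join(tempfile.mkdtemp(), "pages"), "index.html")
print(saved)
```

With a real browser, the call would be save_page_source(browser, save_path, file_name) after browser.get(url).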
If you really need to save the entire website, you should look into using a tool like AutoIt, which makes it possible to interact with the save dialog.
Answer by Mobrockers
You cannot interact with system dialogs such as the Save File dialog. If you want to save the page's HTML, you can do something like this:
page = driver.page_source
file_ = open('page.html', 'w')
file_.write(page)
file_.close()
Answer by Martin Thoma
This is a complete, working example of the answer RemcoW provided:
You first have to install a webdriver, e.g. with: pip install selenium chromedriver_installer
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# core modules
import codecs
import os

# 3rd party modules
from selenium import webdriver
from selenium.webdriver.chrome.service import Service


def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = '/usr/local/bin/chromedriver'
    # Selenium 4 takes the driver path via a Service object;
    # the older executable_path= argument was deprecated and later removed
    browser = webdriver.Chrome(service=Service(path_to_chromedriver))
    return browser


save_path = os.path.expanduser('~')
file_name = 'index.html'
browser = get_browser()

url = "https://martin-thoma.com/"
browser.get(url)

complete_name = os.path.join(save_path, file_name)
# codecs.open writes UTF-8; the with block also closes (and flushes) the file
with codecs.open(complete_name, "w", "utf-8") as file_object:
    html = browser.page_source
    file_object.write(html)
browser.close()