Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37835867/

Time: 2020-08-19 19:59:48  Source: igfitidea

Using Selenium in Python to save a webpage on Firefox

Tags: python, selenium, firefox, save-as

Asked by Tommy N

I am trying to use Selenium in Python to save webpages in Firefox on macOS.

So far, I have managed to press COMMAND + S to open the Save As window. However, I don't know how to:

  1. change the directory of the file,
  2. change the name of the file, and
  3. click the SAVE AS button.

Could someone help?

Below is the code I have used to press COMMAND + S:

ActionChains(browser).key_down(Keys.COMMAND).send_keys("s").key_up(Keys.COMMAND).perform()

The reason I am using this approach is that I run into a UnicodeEncodeError when I:

  1. write the page_source to an HTML file, and
  2. store scraped information in a CSV file.

Writing to an HTML file:

file_object = open(completeName, "w")
html = browser.page_source
file_object.write(html)
file_object.close() 

Writing to a CSV file:

csv_file_write.writerow(to_write)

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128)

Answered by misantroop

with open('page.html', 'w') as f:
    f.write(driver.page_source)
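
On Python 2, or on a system whose default encoding is not UTF-8, a plain open call like this can still raise the UnicodeEncodeError from the question. Below is a minimal sketch of the same idea with an explicit encoding. It assumes the page source is already available as a text string; the helper name write_html and the sample string are made up for illustration.

```python
import io
import os
import tempfile


def write_html(path, html):
    """Write text to path as UTF-8, independent of the system default encoding."""
    # io.open accepts an encoding argument on both Python 2 and Python 3
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)


# stand-in for driver.page_source; it contains the character from the error message
sample_html = u"<html><body>\xf8</body></html>"
sample_path = os.path.join(tempfile.mkdtemp(), "page.html")
write_html(sample_path, sample_html)
```

With a real driver, the call would be write_html('page.html', driver.page_source).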

Answered by RemcoW

What you are trying to achieve is impossible to do with Selenium. The dialog that opens is not something Selenium can interact with.

The closest thing you can do is collect the page_source, which gives you the entire HTML of a single page, and save it to a file.

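import codecs
import os

# save_path, file_name and browser are assumed to be defined earlier
completeName = os.path.join(save_path, file_name)
html = browser.page_source
# codecs.open encodes the text as UTF-8 when writing
file_object = codecs.open(completeName, "w", "utf-8")
file_object.write(html)
file_object.close()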
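
The "utf-8" argument to codecs.open is what avoids the error from the question. As a small illustration (not part of the original answer), the sketch below encodes the character named in the error message, u'\xf8', first as ASCII and then as UTF-8. The variable names are made up for this example.

```python
# the character from the error message in the question: U+00F8
text = u"\xf8"

# ASCII only covers code points 0-127, so encoding this character fails
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent every Unicode character; this one takes two bytes
utf8_bytes = text.encode("utf-8")
```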

If you really need to save the entire website you should look into using a tool like AutoIT. This will make it possible to interact with the save dialog.

Answered by Mobrockers

You cannot interact with system dialogs such as the save file dialog. If you want to save the page HTML, you can do something like this:

page = driver.page_source
file_ = open('page.html', 'w')
file_.write(page)
file_.close()
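
Saving the page source yourself also takes care of the first two points in the question, because the directory and the file name are simply parts of the path you write to. Here is a minimal sketch, assuming Python 3. The helper save_page_source and the stand-in fake_driver are hypothetical and only serve to make the example run without a browser.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_page_source(driver, directory, file_name):
    """Save driver.page_source as UTF-8 to directory/file_name and return the path."""
    target_dir = Path(directory).expanduser()
    # create the target directory if it does not exist yet
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    target.write_text(driver.page_source, encoding="utf-8")
    return target


# stand-in object with a page_source attribute; with Selenium you would
# pass the real driver (for example the result of webdriver.Firefox())
fake_driver = SimpleNamespace(page_source="<html><body>\xf8</body></html>")
saved = save_page_source(fake_driver, tempfile.mkdtemp(), "index.html")
```

Note that this stores only the HTML of the page, not the images or stylesheets that the browser's Save As would also download.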

Answered by Martin Thoma

This is a complete, working example of the answer RemcoW provided:

You first have to install a webdriver, e.g. pip install selenium chromedriver_installer.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# core modules
import codecs
import os

# 3rd party modules
from selenium import webdriver


def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = '/usr/local/bin/chromedriver'
    # note: executable_path works in Selenium 3; Selenium 4 expects a Service object instead
    browser = webdriver.Chrome(executable_path=path_to_chromedriver)
    return browser


save_path = os.path.expanduser('~')
file_name = 'index.html'
browser = get_browser()

url = "https://martin-thoma.com/"
browser.get(url)

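complete_name = os.path.join(save_path, file_name)
html = browser.page_source
# write the page source as UTF-8; the with block closes the file afterwards
with codecs.open(complete_name, "w", "utf-8") as file_object:
    file_object.write(html)
# quit() closes all windows and ends the WebDriver session
browser.quit()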
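
The question also reports the same UnicodeEncodeError when writing scraped rows to a CSV file, which none of the answers address directly. Here is a minimal sketch of one way to handle it, assuming Python 3. The row contents and the file name scraped.csv are made up; csv_file_write and to_write mirror the names used in the question. On Python 2 the csv module does not handle unicode text directly, so this sketch does not apply there as written.

```python
import csv
import os
import tempfile

# hypothetical scraped row; \xf8 is the character from the error message
to_write = ["S\xf8ren", "K\xf8benhavn"]

csv_path = os.path.join(tempfile.mkdtemp(), "scraped.csv")

# newline="" is what the csv module documentation recommends for files
# passed to csv.writer; encoding="utf-8" makes the output encoding explicit
with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
    csv_file_write = csv.writer(csv_file)
    csv_file_write.writerow(to_write)
```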