Python WebDriver: how to print the whole page source (HTML)

Question

Asked by wmarchewka

I'm using Python 2.7 with Selenium WebDriver. How can I print the whole page source with print? There is a WebDriver attribute, page_source, but it seems to return a WebDriver object, and I don't know how to convert it to a string or simply print it in the terminal.

Answer 1

Accepted answer by alecxe

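.page_source on a webdriver instance is what you need. It is a property, not a method, and it already returns the page source as a string, so you can print it directly: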

>>> from selenium import webdriver
>>> driver = webdriver.Firefox()
>>> driver.get('http://google.com')
>>> print(driver.page_source)
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" itemtype="http://schema.org/WebPage" itemscope=""><head><meta name="descri
...
:before,.vscl.vslru div.vspib{top:-4px}</style></body></html>
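
Since page_source is a plain string, it can also be written to a file instead of the terminal, which is often easier to inspect for a large page. The following is a minimal sketch, not part of the original answer: the helper name dump_page_source and the output path are hypothetical, and the only Selenium feature it relies on is the page_source property shown above.

```python
import io


def dump_page_source(driver, path):
    """Write driver.page_source to `path` as UTF-8 and return the text.

    `driver` is assumed to be a Selenium WebDriver instance (or any
    object exposing a `page_source` string attribute).
    """
    html = driver.page_source  # property access, no parentheses
    # io.open behaves the same on Python 2.7 and Python 3.
    with io.open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return html
```

With the driver from the session above, calling dump_page_source(driver, 'page.html') would save the same text that print(driver.page_source) shows.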

Answer 2

Answer by Myke

You can also get the HTML page source without using a browser. The requests module allows you to do that.

import requests

res = requests.get('https://google.com')
res.raise_for_status()  # raises an exception if the server returned an
                        # HTTP error status (4xx or 5xx)
print(res.text)
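
Two caveats apply to the requests approach: a call without a timeout can hang indefinitely, and requests does not execute JavaScript, so the text may differ from what driver.page_source reports for a script-heavy page. The sketch below wraps the same calls in a small helper. The name fetch_html, the optional session parameter, and the 10-second timeout are illustrative assumptions, not something the answer prescribes.

```python
def fetch_html(url, session=None, timeout=10):
    """Return the HTML served at `url` as text.

    `session` is a hypothetical hook: pass an existing requests.Session
    to reuse connections; otherwise a new one is created here.
    """
    if session is None:
        import requests  # imported lazily so a caller can supply a session
        session = requests.Session()
    res = session.get(url, timeout=timeout)  # timeout in seconds (assumed value)
    res.raise_for_status()  # HTTP 4xx/5xx becomes an exception
    return res.text
```

For example, print(fetch_html('https://google.com')) would print the HTML that the server sends before any scripts run.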
