Python WebDriver how to print whole page source (html)
Original question: http://stackoverflow.com/questions/27411915/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by wmarchewka
I'm using Python 2.7 with Selenium WebDriver. My question is how to print the whole page source with print. There is a WebDriver method, page_source, but it returns WebDriver, and I don't know how to convert it to a string or just print it in the terminal.
Accepted answer by alecxe
.page_source on a webdriver instance is what you need. It is a property that returns the page source as a string, so it can be passed to print directly:
>>> from selenium import webdriver
>>> driver = webdriver.Firefox()
>>> driver.get('http://google.com')
>>> print(driver.page_source)
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" itemtype="http://schema.org/WebPage" itemscope=""><head><meta name="descri
...
:before,.vscl.vslru div.vspib{top:-4px}</style></body></html>
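Because the printed source of a real page can run to thousands of lines, it is often easier to write it to a file than to read it in the terminal. The following is a minimal sketch, not part of the original answer: save_page_source and the file name page.html are hypothetical names chosen for illustration, and the helper only assumes that the object passed in exposes a page_source attribute containing text, as a Selenium driver does.

```python
import io


def save_page_source(driver, path):
    """Write driver.page_source to `path` as UTF-8; return the character count."""
    html = driver.page_source  # property access, returns the HTML as text
    # io.open with an explicit encoding behaves the same on Python 2.7 and 3.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return len(html)


# Hypothetical usage with a real browser session (requires selenium):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get('http://google.com')
#   save_page_source(driver, 'page.html')
#   driver.quit()
```

Writing with an explicit UTF-8 encoding also sidesteps the UnicodeEncodeError that print can raise on Python 2.7 when the terminal encoding cannot represent a character in the page.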
Answer by Myke
You can also get the HTML page source without using a browser. The requests module allows you to do that.
import requests
res = requests.get('https://google.com')
res.raise_for_status()  # raises an HTTPError if the server returned
                        # a 4xx or 5xx status code
print(res.text)
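The snippet above can be wrapped in a small helper that also covers failures occurring before any response arrives (bad URL, refused connection, timeout), which requests.get raises itself rather than raise_for_status. This is only a sketch under stated assumptions: fetch_html and the 10-second timeout are hypothetical choices, not part of the original answer. It relies on the fact that the exceptions raised by requests derive from requests.exceptions.RequestException.

```python
from __future__ import print_function

import sys

import requests


def fetch_html(url, timeout=10):
    """Return the response body as text, or None if the request fails."""
    try:
        res = requests.get(url, timeout=timeout)
        res.raise_for_status()  # HTTPError for 4xx/5xx responses
    except requests.exceptions.RequestException as exc:
        # Covers invalid URLs, connection errors, timeouts and HTTP errors.
        print("Request failed: {}".format(exc), file=sys.stderr)
        return None
    return res.text
```

Note that requests returns the HTML exactly as the server sent it and does not run JavaScript, so for pages that build their content in the browser the result can differ from driver.page_source.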