How can I render JavaScript HTML to HTML in python?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/29404856/

Date: 2020-10-28 10:29:11  Source: igfitidea

Tags: javascript, python, web-scraping

Asked by user3928006

I have looked around and only found solutions that render a URL to HTML. However, I need a way to render a webpage that I already have (and that contains JavaScript) to proper HTML.

Want: Webpage (with JavaScript) ---> HTML

Not: URL --> Webpage (with JavaScript) ---> HTML

I couldn't figure out how to make the other code work the way I wanted.

This is the code I was using that renders URLs: http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/

For clarity, the code above takes the URL of a webpage whose content is partly generated by JavaScript. If I scrape the page normally, say with urllib2, I won't get the links and other content that only appear after the JavaScript has run.
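To make the problem concrete: a plain download plus a static parser only sees what is literally in the markup, because nothing executes the scripts. The sketch below uses Python 3's standard-library html.parser (urllib2 is Python 2 only) on a hypothetical page; LinkCollector and RAW_HTML are illustrative names, not taken from the question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that are present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page: one link is in the markup, a second is added by JavaScript
RAW_HTML = """
<html><body>
  <a href="/static">Static link</a>
  <script>
    var a = document.createElement("a");
    a.href = "/dynamic";
    document.body.appendChild(a);
  </script>
</body></html>
"""

collector = LinkCollector()
collector.feed(RAW_HTML)
collector.close()
print(collector.links)  # ['/static']: the script never runs, so /dynamic is missing
```

The parser treats the contents of script as raw text, so the link that JavaScript would create never shows up; getting it requires an engine that actually executes the script.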

However, I want to be able to download a page first (again, say with urllib2), then render that page and get the resulting HTML. This is different from the code above, which takes a URL as its argument.
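A minimal sketch of the "Webpage (with JavaScript) ---> HTML" step, assuming Selenium with Firefox and geckodriver is installed: write the HTML string you already have to a temporary .html file, load it through a file:// URL, and read back page_source. save_html and render_html are hypothetical helper names, and the file is written as UTF-8 (an assumption about your input).

```python
import tempfile
from pathlib import Path


def save_html(html: str) -> Path:
    """Write an HTML string to a temporary .html file and return its absolute path."""
    # delete=False keeps the file after closing so the browser can open it;
    # the .html suffix makes the browser treat it as HTML rather than plain text
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", encoding="utf-8", delete=False
    ) as f:
        f.write(html)
    return Path(f.name).resolve()


def render_html(html: str) -> str:
    """Load an HTML string in Firefox via Selenium and return the rendered DOM."""
    # Imported here so save_html works even when selenium is not installed
    from selenium import webdriver

    path = save_html(html)
    try:
        browser = webdriver.Firefox()  # needs Firefox and geckodriver
        try:
            browser.get(path.as_uri())  # e.g. file:///tmp/tmpab12cd.html
            return browser.page_source  # DOM serialized after scripts ran
        finally:
            browser.quit()
    finally:
        path.unlink()  # remove the temporary file
```

One caveat: relative URLs inside the page (scripts, stylesheets, API calls) resolve against the temporary file's location, so a page that loads its scripts with relative paths may not render fully from a local copy.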

Any help is appreciated, thanks guys :)

Answer by barak manos

You can run pip install selenium from a command line, and then run something like:

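from pathlib import Path
from urllib.request import urlopen  # Python 3 (urllib2 is Python 2 only)

from selenium import webdriver

url = 'http://www.google.com'
# Use an .html extension: Firefox displays a .txt file as plain text
file_path = Path('C:/Users/Desktop/test.html')

# Download the raw, unrendered page source
with urlopen(url) as conn:
    data = conn.read()  # bytes

# Write the bytes unchanged so the page's own encoding is preserved
file_path.write_bytes(data)

# Open the local copy in a real browser so its JavaScript runs
browser = webdriver.Firefox()
try:
    browser.get(file_path.resolve().as_uri())  # file:///C:/Users/Desktop/test.html on Windows
    html = browser.page_source  # DOM serialized after the scripts have run
finally:
    browser.quit()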
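A caveat with loading a saved copy: relative resource URLs in it (for example src="/js/app.js") resolve against the local file instead of the original site, so those scripts fail to load. One possible workaround, sketched below under the assumption that the page has a head element, is to insert a base href tag pointing at the original URL before writing the file. add_base_href is a hypothetical helper, not part of Selenium.

```python
import html
import re


def add_base_href(page: str, base_url: str) -> str:
    """Insert <base href="base_url"> right after the opening <head> tag.

    Assumes the page has a <head> element; raises ValueError otherwise.
    """
    # \b stops the pattern from matching <header>
    match = re.search(r"<head\b[^>]*>", page, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <head> tag found")
    base_tag = '<base href="{}">'.format(html.escape(base_url, quote=True))
    return page[: match.end()] + base_tag + page[match.end():]


page = "<html><head><title>t</title></head><body></body></html>"
print(add_base_href(page, "http://www.google.com/"))
```

In the answer's code you would decode the downloaded bytes first (e.g. data.decode('utf-8'), assuming the page is UTF-8) and write the result of add_base_href instead. Requests that scripts make from a file:// page may still be blocked by the browser's cross-origin rules.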

Answer by peter wambua

Try webdriver.Firefox().get('url'), then read the rendered HTML from the driver's page_source attribute.
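Loading the URL directly is the simplest route. If part of the page is filled in by scripts after the load event, get() can return before that content exists; a sketch using Selenium's explicit wait (WebDriverWait with expected_conditions.presence_of_element_located) is below. fetch_rendered is a hypothetical name, and css_selector is an assumption: you need to know some element that only appears once rendering is done. Selenium is imported inside the function so the module can be imported without it.

```python
def fetch_rendered(url: str, css_selector: str, timeout: float = 10.0) -> str:
    """Open url in headless Firefox, wait for css_selector, return the DOM as HTML."""
    # Imported inside the function so this module can be imported without selenium
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # no visible browser window
    browser = webdriver.Firefox(options=options)
    try:
        browser.get(url)  # returns once the page's load event has fired
        # Content that scripts add later is not guaranteed to exist yet,
        # so wait (up to timeout seconds) for an element known to appear
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        browser.quit()
```

If the expected element never appears, WebDriverWait raises a TimeoutException after timeout seconds, and the finally block still closes the browser.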
