
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/26393231/

Time: 2020-08-19 00:26:32  Source: igfitidea

Using python Requests with javascript pages

python web-scraping python-requests

Question by biw

I am trying to use the Requests library with Python (http://docs.python-requests.org/en/latest/), but the page I am trying to access uses JavaScript to fetch the information I want.

I have tried searching the web for a solution, but because I am searching with the keyword "javascript", most of the results I get are about how to scrape using the JavaScript language.

Is there any way to use the Requests library with pages that use JavaScript?

Accepted answer by sberry

You are going to have to make the same request (using the Requests library) that the JavaScript is making. You can use any number of tools (including the developer tools built into Chrome and Firefox) to inspect the HTTP request that the JavaScript sends, and then simply make that request yourself from Python.
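As a minimal sketch of this approach, assume the browser's Network tab (XHR/Fetch filter) shows the page's JavaScript calling a JSON endpoint. API_URL, PARAMS and HEADERS below are hypothetical placeholders to be replaced with the values you actually observe, and build_request/fetch_json are illustrative helper names, not part of Requests.

```python
import requests

# Hypothetical values: copy the real URL, query parameters and headers from the
# XHR/Fetch request shown in the browser's developer tools (Network tab).
API_URL = "https://example.com/api/items"
PARAMS = {"page": 1, "per_page": 20}
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",  # some back ends check for this header
}


def build_request(url=API_URL, params=PARAMS, headers=HEADERS):
    """Describe the same GET request the page's JavaScript makes (nothing is sent yet)."""
    return requests.Request("GET", url, params=params, headers=headers)


def fetch_json(session, request, timeout=10):
    """Send `request` through `session` and decode the JSON the JavaScript would receive."""
    prepared = session.prepare_request(request)  # merges session cookies and default headers
    response = session.send(prepared, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()


# Usage (after replacing the placeholders):
# with requests.Session() as s:
#     data = fetch_json(s, build_request())
```

Sending through a Session means cookies set by earlier responses (for example, by first loading the HTML page with the same session) are reused, which some endpoints require.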

Answer by Lil Taco

While Selenium might seem tempting and useful, it has one main problem that can't be fixed: performance. Because it does everything a real browser does, it needs much more computing power. Even PhantomJS can't compete with a simple request. I recommend using Selenium only when you really need to click buttons. If you only need JavaScript, I recommend PyQt (check https://www.youtube.com/watch?v=FSH77vnOGqU to learn it).

However, if you want to use Selenium, I recommend Chrome over PhantomJS. Many users run into websites that simply do not work in PhantomJS. Chrome can run headless (without a GUI) too!

First, make sure you have installed ChromeDriver, which Selenium depends on for using Google Chrome.

Then, make sure you have Google Chrome version 60 or higher by checking chrome://settings/help.

Now, all you need is the following code:

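from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")  # run Chrome without opening a window

# Pass the options via `options=`; the older `chrome_options=` keyword is deprecated
driver = webdriver.Chrome(options=chrome_options)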

If you do not know how to use Selenium, here is a quick overview:

driver.get("https://www.google.com")  # The browser navigates to google.com

Finding elements: use either find_elements (returns a list of every match) or find_element (returns the first match). Examples:

driver.find_element(By.CSS_SELECTOR, "div.logo-subtext")  # Find your country on Google (singular). By comes from selenium.webdriver.common.by
  • driver.find_element(s)(By.CSS_SELECTOR, css_selector) # Every element that matches this CSS selector
  • driver.find_element(s)(By.CLASS_NAME, class_name) # Every element with this class
  • driver.find_element(s)(By.ID, id) # Every element with this ID
  • driver.find_element(s)(By.LINK_TEXT, link_text) # Every link with this exact link text
  • driver.find_element(s)(By.PARTIAL_LINK_TEXT, partial_link_text) # Every link whose text contains this string
  • driver.find_element(s)(By.NAME, name) # Every element whose name attribute equals the argument
  • driver.find_element(s)(By.TAG_NAME, tag_name) # Every element with this tag name

(Older Selenium 3 code spells these driver.find_element(s)_by_css_selector(css_selector) and so on; those helpers were removed in later Selenium 4 releases.)

OK! I found an element (or a list of elements). But what do I do now?


Here are the methods and properties you can use on an element elem:


  • elem.tag_name # Could return "button" for a <button> element.
  • elem.get_attribute("id") # Returns the ID of an element.
  • elem.text # The inner text of an element.
  • elem.clear() # Clears a text input.
  • elem.is_displayed() # True for visible elements, False for invisible elements.
  • elem.is_enabled() # True for an enabled input, False otherwise.
  • elem.is_selected() # Is this radio button or checkbox element selected?
  • elem.location # A dictionary representing the X and Y location of an element on the screen.
  • elem.click() # Click elem.
  • elem.send_keys("thelegend27") # Type thelegend27 into elem (useful for text inputs).
  • elem.submit() # Submit the form that elem belongs to.

Special commands:


  • driver.back() # Click the Back button.
  • driver.forward() # Click the Forward button.
  • driver.refresh() # Refresh the page.
  • driver.quit() # Close the browser including all the tabs.
  • foo = driver.execute_script("return 'hello';") # Execute JavaScript (it can return values to Python!)

Answer by marvb

Good news: there is now a requests module that supports javascript: https://pypi.org/project/requests-html/

from requests_html import HTMLSession

session = HTMLSession()

r = session.get('http://www.yourjspage.com')

r.html.render()  # this call executes the js in the page

As a bonus, this wraps BeautifulSoup, I think, so you can do things like

r.html.find('#myElementID', first=True).text  # first=True returns one element instead of a list

which returns the content of the HTML element as you'd expect.

Answer by Pulia Zlaya

It's a wrapper around pyppeteer or something? :( I thought it was something different.

    @property
    async def browser(self):
        if not hasattr(self, "_browser"):
            self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)

        return self._browser