How to grab headers in python selenium-webdriver

Question

Asked by David542

I am trying to grab the headers in selenium webdriver. Something similar to the following:


>>> import requests
>>> res = requests.get('http://google.com')
>>> print(res.headers)

I need to use the Chrome webdriver because it supports Flash and some other things that I need in order to test a web page. Here is what I have so far in Selenium:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://login.comcast.net/login?r=comcast.net&s=oauth&continue=https%3A%2F%2Flogin.comcast.net%2Foauth%2Fauthorize%3Fclient_id%3Dxtv-account-selector%26redirect_uri%3Dhttps%3A%2F%2Fxtv-pil.xfinity.com%2Fxtv-authn%2Fxfinity-cb%26response_type%3Dcode%26scope%3Dopenid%2520https%3A%2F%2Flogin.comcast.net%2Fapi%2Flogin%26state%3Dhttps%3A%2F%2Ftv.xfinity.com%2Fpartner-success.html%26prompt%3Dlogin%26response%3D1&reqId=18737431-624b-44cb-adf0-2a85d91bd662&forceAuthn=1&client_id=xtv-account-selector')
driver.find_element_by_css_selector('#user').send_keys('[email protected]')
driver.find_element_by_css_selector('#passwd').send_keys('XXY')
driver.find_element_by_css_selector('#passwd').submit()
print(driver.headers)  # How to do this?

I have seen some other answers that recommend running an entire Selenium server to get this information (https://github.com/derekargueta/selenium-profiler). How would I get it with Webdriver, using something similar to the above?

Answer 1

Accepted answer by elethan

Unfortunately, you cannot get this information from the Selenium webdriver, and it seems you will not be able to in the near future either. An excerpt from a very long conversation on the subject:

This feature isn't going to happen.


From what I gather from the discussion, the main reason is that the webdriver is meant for "driving the browser", and the developers believe that extending the API beyond that primary goal would cause its overall quality and reliability to suffer.

One potential workaround that I have seen suggested in a number of places, including the conversation linked above, is to use BrowserMob Proxy, which can capture HTTP content and can be used with Selenium, though the linked example does not use the Python Selenium API. There does seem to be a Python wrapper for BrowserMob Proxy, but I cannot vouch for its efficacy since I have never used it.
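
The proxy workaround might look roughly like the sketch below. It assumes the browsermob-proxy Python wrapper is installed, that browsermob_path points at a local BrowserMob Proxy launcher, and that a recent Selenium with a Chrome driver is available. The function names response_headers_from_har and capture_headers are hypothetical helpers made up for illustration, and HTTPS pages may additionally need the proxy's certificate to be trusted.

```python
def response_headers_from_har(har, url_substring=""):
    """Return {url: {header name: value}} for HAR entries whose URL contains url_substring."""
    result = {}
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        if url_substring in url:
            result[url] = {h["name"]: h["value"] for h in entry["response"]["headers"]}
    return result


def capture_headers(url, browsermob_path):
    """Drive Chrome through BrowserMob Proxy and return response headers per URL."""
    # Imported lazily so the HAR helper above works without these packages installed.
    from browsermobproxy import Server
    from selenium import webdriver

    server = Server(browsermob_path)  # assumed: path to the browsermob-proxy launcher
    server.start()
    driver = None
    try:
        proxy = server.create_proxy()
        options = webdriver.ChromeOptions()
        options.add_argument("--proxy-server={}".format(proxy.proxy))
        driver = webdriver.Chrome(options=options)
        # Ask the proxy to record headers for everything requested from now on.
        proxy.new_har("capture", options={"captureHeaders": True})
        driver.get(url)
        return response_headers_from_har(proxy.har)
    finally:
        if driver is not None:
            driver.quit()
        server.stop()
```

Calling capture_headers('http://google.com', 'path/to/browsermob-proxy') would return one dictionary of response headers per URL the browser requested, including redirects and sub-resources.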


Answer 2

Answer by Rafael Ribeiro

You could try Mobilenium, a Python package (still in development) that binds BrowserMob Proxy and Selenium.

A usage example:

>>> from mobilenium import mobidriver
>>>
>>> browsermob_path = 'path/to/browsermob-proxy'
>>> mob = mobidriver.Firefox(browsermob_binary=browsermob_path)
>>> mob.get('http://python-requests.org')
301
>>> mob.response['redirectURL']
'http://docs.python-requests.org'
>>> mob.headers['Content-Type']
'application/json; charset=utf8'
>>> mob.title
'Requests: HTTP for Humans \u2014 Requests 2.13.0 documentation'
>>> mob.find_elements_by_tag_name('strong')[1].text
'Behold, the power of Requests'

Answer 3

Answer by J. Does

You can get the headers from the HAR log (source: Mma's answer):

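from selenium import webdriver
import json

# Note: PhantomJS support was removed in Selenium 4, so this needs an older Selenium.
driver = webdriver.PhantomJS(executable_path=r"your_path")
driver.get("http://example.com")  # placeholder URL; load a page first so the HAR log has entries
har = json.loads(driver.get_log('har')[0]['message'])  # parse the HAR log
# Prints the request headers of the first entry
print('headers: ', har['log']['entries'][0]['request']['headers'])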

Answer 4

Answer by munish

Now it is very easy, I suppose: https://pypi.org/project/selenium-wire/ is an extension of Selenium. Use from seleniumwire import webdriver and proceed as usual.
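
As a minimal sketch of what "proceed as usual" could look like, assuming selenium-wire and a Chrome driver are installed: selenium-wire exposes the captured traffic as driver.requests, and each request carries a response (or None if no response arrived). The helper names collect_response_headers and page_headers are hypothetical and not part of the library.

```python
def collect_response_headers(requests, url_substring=""):
    """Map each captured URL to its response headers as a plain dict."""
    collected = {}
    for request in requests:
        # request.response is None when no response was received
        if request.response is not None and url_substring in request.url:
            collected[request.url] = dict(request.response.headers.items())
    return collected


def page_headers(url):
    """Load url in Chrome via selenium-wire and return headers for every captured response."""
    # Imported lazily so the helper above can be used without selenium-wire installed.
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return collect_response_headers(driver.requests)
    finally:
        driver.quit()
```

The rest of the script (find elements, send keys, submit) stays the same as with plain Selenium; only the import changes.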


Answer 5

Answer by BabyPoopSoup

You can use the JavaScript built-in navigator.userAgent property. Note that this returns only the browser's user-agent string, not the response headers.

It can only be done once the driver has been created, though.

from selenium import webdriver
driver = webdriver.Chrome()
# Store it in a variable and print the value
agent = driver.execute_script("return navigator.userAgent")
print(agent)
# directly print the value
print(driver.execute_script("return navigator.userAgent"))
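
The same execute_script approach can be stretched to response headers by having the page re-request its own URL with XMLHttpRequest. This is only a sketch under assumptions: it issues a second GET, so the headers belong to that repeat request rather than to the original navigation (it will not reproduce the response to a form submission), and the names FETCH_HEADERS_JS, parse_raw_headers and current_page_headers are hypothetical.

```python
# JavaScript run in the page: repeat a GET for the current URL and return the raw headers.
FETCH_HEADERS_JS = """
var req = new XMLHttpRequest();
req.open('GET', document.location.href, false);  // synchronous request
req.send(null);
return req.getAllResponseHeaders();
"""


def parse_raw_headers(raw):
    """Turn CRLF-separated 'Name: value' text into a dict with lowercased names."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            headers[name.strip().lower()] = value.strip()
    return headers


def current_page_headers(driver):
    """Return response headers for a fresh GET of the page the driver is currently on."""
    return parse_raw_headers(driver.execute_script(FETCH_HEADERS_JS))
```

After driver.get(...), calling current_page_headers(driver) returns a dictionary keyed by lowercased header name.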
