Take a screenshot of the full page with Selenium Python and chromedriver
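Disclaimer: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow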
Original URL: http://stackoverflow.com/questions/41721734/
Question by ihightower
After trying out various approaches, I stumbled upon this page, which takes a full-page screenshot with chromedriver, Selenium and Python.
The original code is here (I have copied it into this post below).
It uses PIL and it works great! However, there is one issue: it captures the fixed header, which then repeats across the whole page, and it also misses some parts of the page between scroll steps. Sample URL to take a screenshot of:
http://www.w3schools.com/js/default.asp
How can I avoid the repeated headers with this code? Or is there a better option that uses Python only? (I don't know Java and do not want to use it.)
Please see the screenshot of the current result and sample code below.
test.py
"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/
It contains the *crucial* correction added in the comments by Jason Coutu.
"""
import sys
from selenium import webdriver
import unittest
import util
class Test(unittest.TestCase):
""" Demonstration: Get Chrome to generate fullscreen screenshot """
def setUp(self):
self.driver = webdriver.Chrome()
def tearDown(self):
self.driver.quit()
def test_fullpage_screenshot(self):
''' Generate document-height screenshot '''
#url = "http://effbot.org/imagingbook/introduction.htm"
url = "http://www.w3schools.com/js/default.asp"
self.driver.get(url)
util.fullpage_screenshot(self.driver, "test.png")
if __name__ == "__main__":
unittest.main(argv=[sys.argv[0]])
util.py
import os
import time

from PIL import Image


def fullpage_screenshot(driver, file):

    print("Starting chrome full page screenshot workaround ...")

    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height, viewport_width, viewport_height))
    rectangles = []

    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height

        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width

            if top_width > total_width:
                top_width = total_width

            print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
            rectangles.append((ii, i, top_width, top_height))

            ii = ii + viewport_width

        i = i + viewport_height

    stitched_image = Image.new('RGB', (total_width, total_height))
    previous = None
    part = 0

    for rectangle in rectangles:
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)

        file_name = "part_{0}.png".format(part)
        print("Capturing {0} ...".format(file_name))

        driver.get_screenshot_as_file(file_name)
        screenshot = Image.open(file_name)

        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        print("Adding to stitched image with offset ({0}, {1})".format(offset[0], offset[1]))
        stitched_image.paste(screenshot, offset)

        del screenshot
        os.remove(file_name)
        part = part + 1
        previous = rectangle

    stitched_image.save(file)
    print("Finishing chrome full page screenshot workaround...")
    return True
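The rectangle bookkeeping in util.py reduces to a list of scroll positions per axis, with the last tile pulled back so that it ends exactly at the page edge (the total_height - viewport_height branch). Below is a minimal pure-Python sketch of that arithmetic so it can be checked without a browser; tile_offsets and tile_origins are hypothetical helper names, not part of Selenium or PIL.

```python
from itertools import product
from typing import List, Tuple


def tile_offsets(total: int, viewport: int) -> List[int]:
    """Scroll positions along one axis that together cover [0, total).

    Tiles are `viewport` apart; the last one is clamped so it ends at
    `total`, mirroring the `total_height - viewport_height` branch in util.py.
    """
    if viewport <= 0:
        raise ValueError("viewport must be positive")
    last_start = max(total - viewport, 0)
    return [min(pos, last_start) for pos in range(0, total, viewport)]


def tile_origins(total_w: int, total_h: int,
                 view_w: int, view_h: int) -> List[Tuple[int, int]]:
    """(x, y) paste offsets for every tile, row by row."""
    rows = tile_offsets(total_h, view_h)
    cols = tile_offsets(total_w, view_w)
    return [(x, y) for y, x in product(rows, cols)]
```

For a 1000 x 2500 page in a 1000 x 1000 viewport, tile_origins(1000, 2500, 1000, 1000) gives [(0, 0), (0, 1000), (0, 1500)]: the last tile overlaps the previous one, and pasting it overwrites the overlap with identical pixels. Note that this arithmetic does nothing about a fixed header, which still appears in every tile.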
Accepted answer by lizisong1988
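How it works: make the browser window at least as tall as the page content (the code below uses the tallest element's height plus a 1000 px margin), then take a single screenshot.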
#coding=utf-8
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def test_fullpage_screenshot():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--start-maximized')
    # "options=" replaces the removed "chrome_options=" keyword
    driver = webdriver.Chrome(options=chrome_options)
    driver.get("yoururlxxx")
    time.sleep(2)

    # the element with the longest height on the page
    ele = driver.find_element("xpath", '//div[@class="react-grid-layout layout"]')
    total_height = ele.size["height"] + 1000

    driver.set_window_size(1920, total_height)  # the trick
    time.sleep(2)
    driver.save_screenshot("screenshot1.png")
    driver.quit()


if __name__ == "__main__":
    test_fullpage_screenshot()
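Instead of hard-coding an XPath for "the element with the longest height", you can ask the page for the largest scrollHeight of any element and size the window from that. A sketch, assuming a Selenium-style driver object (anything with execute_script and set_window_size); TALLEST_JS and size_window_to_content are hypothetical names, and the default 1000 px padding only mirrors the answer above.

```python
from typing import Any

# JavaScript returning the largest scrollHeight among all elements;
# reduce() avoids passing thousands of arguments to Math.max.
TALLEST_JS = """
return Array.from(document.querySelectorAll('*')).reduce(
    (best, el) => Math.max(best, el.scrollHeight),
    document.documentElement.scrollHeight);
"""


def size_window_to_content(driver: Any, width: int = 1920,
                           padding: int = 1000) -> int:
    """Resize the window to the tallest element plus padding.

    `driver` is assumed to be a Selenium WebDriver (duck-typed here so the
    function can be exercised with a stand-in). Returns the height used.
    """
    height = int(driver.execute_script(TALLEST_JS)) + padding
    driver.set_window_size(width, height)
    return height
```

Call it after driver.get(...) and any waiting, then driver.save_screenshot(...) as in the answer. Scanning every element is slower than one XPath lookup, but it also finds content in containers such as YouTube's <ytd-app> without knowing their class names.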
Answer by Javed Karim
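from selenium.webdriver.common.by import By

# find_element_by_tag_name was removed in Selenium 4.3; use find_element(By...)
element = driver.find_element(By.TAG_NAME, 'body')
element_png = element.screenshot_as_png
with open("test2.png", "wb") as file:
    file.write(element_png)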
This works for me. It saves the entire page as a screenshot. For more information, see the API docs: http://selenium-python.readthedocs.io/api.html
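To confirm that the element screenshot really covers the whole page rather than only the viewport, you can read its pixel size straight from the PNG bytes: the IHDR chunk always comes first, with width and height stored as big-endian 32-bit integers at byte offsets 16 and 20. A standard-library sketch; png_size is a hypothetical helper name.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) of PNG bytes, e.g. element.screenshot_as_png."""
    # 8-byte signature, then the IHDR chunk: 4-byte length, 4-byte type,
    # then width and height as unsigned big-endian 32-bit integers.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

For example, compare png_size(element_png)[1] with driver.execute_script("return document.body.parentNode.scrollHeight"). On high-DPI displays the screenshot may be in device pixels, so the PNG can be a multiple (window.devicePixelRatio) of the CSS height.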
Answer by Acumenus
This answer improves upon prior answers by am05mhz and Javed Karim.
It assumes headless mode, and that a window-size option was not initially set. Before calling this function, ensure the page has loaded fully or sufficiently.
It attempts to set both the width and the height to what is necessary. The screenshot of the entire page can sometimes include a needless vertical scrollbar; one way to generally avoid it is to take a screenshot of the body element instead. After saving the screenshot, it restores the original window size; otherwise the size for the next screenshot may not be set correctly.
Ultimately, this technique may still not work perfectly for some pages.
from selenium import webdriver
from selenium.webdriver.common.by import By


def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])
If using Python 2, remove the type annotations from the function definition (function annotations require Python 3).
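The answer restores the original window size after the screenshot, but if the screenshot call raises, that restore is skipped and the next call starts from the wrong size. A try/finally wrapper, here written as a context manager, makes the restore unconditional. A sketch assuming a Selenium-style driver with get_window_size and set_window_size; temporary_window_size is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_window_size(driver: Any, width: int, height: int) -> Iterator[None]:
    """Resize the window for the duration of a with-block, then restore it."""
    original = driver.get_window_size()  # {'width': ..., 'height': ...}
    driver.set_window_size(width, height)
    try:
        yield
    finally:
        # Runs even if the screenshot inside the block raises.
        driver.set_window_size(original["width"], original["height"])
```

Usage: with temporary_window_size(driver, required_width, required_height): driver.find_element(By.TAG_NAME, 'body').screenshot(path)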
Answer by alexalex
Screenshots are limited to the viewport, but you can get around this by capturing the body element, as the webdriver will capture the entire element even if it is larger than the viewport. This saves you from having to deal with scrolling and stitching images; however, you might see problems with the footer position (like in the screenshot below).
Tested on Windows 8 and Mac High Sierra with Chrome Driver.
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://stackoverflow.com/'
path = '/path/to/save/in/scrape.png'

driver = webdriver.Chrome()
driver.get(url)
el = driver.find_element(By.TAG_NAME, 'body')
el.screenshot(path)
driver.quit()
Returns: (full size: https://i.stack.imgur.com/ppDiI.png)
Answer by ihightower
After learning about @Moshisho's approach, here is my full standalone working script (I added a 0.2 s sleep after each scroll and after repositioning the header):
import os
import time

from PIL import Image
from selenium import webdriver


def fullpage_screenshot(driver, file):

    print("Starting chrome full page screenshot workaround ...")

    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height, viewport_width, viewport_height))
    rectangles = []

    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height

        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width

            if top_width > total_width:
                top_width = total_width

            print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
            rectangles.append((ii, i, top_width, top_height))

            ii = ii + viewport_width

        i = i + viewport_height

    stitched_image = Image.new('RGB', (total_width, total_height))
    previous = None
    part = 0

    for rectangle in rectangles:
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)
            # un-fix the header so it is not repeated in every part
            driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")
            time.sleep(0.2)
            print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)

        file_name = "part_{0}.png".format(part)
        print("Capturing {0} ...".format(file_name))

        driver.get_screenshot_as_file(file_name)
        screenshot = Image.open(file_name)

        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        print("Adding to stitched image with offset ({0}, {1})".format(offset[0], offset[1]))
        stitched_image.paste(screenshot, offset)

        del screenshot
        os.remove(file_name)
        part = part + 1
        previous = rectangle

    stitched_image.save(file)
    print("Finishing chrome full page screenshot workaround...")
    return True


driver = webdriver.Chrome()

# Generate document-height screenshot
# url = "http://effbot.org/imagingbook/introduction.htm"
url = "http://www.w3schools.com/js/default.asp"
driver.get(url)
fullpage_screenshot(driver, "test1236.png")
driver.quit()
Answer by jeremie
Not sure if people are still having this issue. I've done a small hack that works pretty well and plays nicely with dynamic zones. Hope it helps.
import time

from selenium import webdriver

# options, url, default_width, default_height, sometime and
# screenshot_path are placeholders you need to define yourself.

# 1. get dimensions
browser = webdriver.Chrome(options=options)
browser.set_window_size(default_width, default_height)
browser.get(url)
time.sleep(sometime)
total_height = browser.execute_script("return document.body.parentNode.scrollHeight")
browser.quit()

# 2. get screenshot
browser = webdriver.Chrome(options=options)
browser.set_window_size(default_width, total_height)
browser.get(url)
browser.save_screenshot(screenshot_path)
browser.quit()
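The two passes can be wrapped in one function that takes a callable creating a fresh driver, so both browsers are configured identically and always closed, even on errors. A sketch; make_driver, width, height and settle are hypothetical parameters, and the height expression is the one from the answer.

```python
import time
from typing import Any, Callable

HEIGHT_JS = "return document.body.parentNode.scrollHeight"


def two_pass_screenshot(make_driver: Callable[[], Any], url: str, path: str,
                        width: int = 1280, height: int = 800,
                        settle: float = 2.0) -> int:
    """Measure the page in one browser, then capture it in a second one
    whose window is already as tall as the page. Returns that height."""
    # Pass 1: load at a normal size and measure the document height.
    browser = make_driver()
    try:
        browser.set_window_size(width, height)
        browser.get(url)
        time.sleep(settle)  # let dynamic content finish loading
        total_height = int(browser.execute_script(HEIGHT_JS))
    finally:
        browser.quit()

    # Pass 2: open a fresh browser at full height and capture.
    browser = make_driver()
    try:
        browser.set_window_size(width, total_height)
        browser.get(url)
        browser.save_screenshot(path)
    finally:
        browser.quit()
    return total_height
```

With Selenium this might be called as two_pass_screenshot(lambda: webdriver.Chrome(options=options), url, "page.png").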
Answer by A.Minachev
I changed the code for Python 3.6; maybe it will be useful for someone:
import unittest
from io import BytesIO

from PIL import Image
from selenium import webdriver


class Test(unittest.TestCase):
    def testdenovoUIavailable(self):
        options = webdriver.FirefoxOptions()
        # raw string: in a normal string "\f" would become a form-feed character
        options.binary_location = r"C:\Mozilla Firefox\firefox.exe"
        self.driver = webdriver.Firefox(options=options)
        verbose = 0

        # open page
        self.driver.get("http://yandex.ru")

        # hide fixed header
        #js_hide_header=' var x = document.getElementsByClassName("topnavbar-wrapper ng-scope")[0];x[\'style\'] = \'display:none\';'
        #self.driver.execute_script(js_hide_header)

        # get total height of page
        js = 'return Math.max( document.body.scrollHeight, document.body.offsetHeight, document.documentElement.clientHeight, document.documentElement.scrollHeight, document.documentElement.offsetHeight);'

        scrollheight = self.driver.execute_script(js)

        if verbose > 0:
            print(scrollheight)

        slices = []
        offset = 0
        offset_arr = []

        # separate full screen in parts and make printscreens
        while offset < scrollheight:
            if verbose > 0:
                print(offset)

            # scroll down; near the bottom the browser stops early, so record
            # where it actually scrolled to and paste the part there
            self.driver.execute_script("window.scrollTo(0, %s);" % offset)
            offset_arr.append(int(self.driver.execute_script("return window.pageYOffset;")))

            # create image (in Python 3.6 use BytesIO)
            img = Image.open(BytesIO(self.driver.get_screenshot_as_png()))
            offset += img.size[1]

            # append new printscreen to array
            slices.append(img)

            if verbose > 0:
                self.driver.get_screenshot_as_file('screen_%s.png' % (offset))
                print(scrollheight)

        # create image with the full page height
        screenshot = Image.new('RGB', (slices[0].size[0], scrollheight))

        # now glue all images together
        for img, top in zip(slices, offset_arr):
            screenshot.paste(img, (0, top))

        screenshot.save('test.png')
        self.driver.quit()


if __name__ == "__main__":
    unittest.main()
Answer by Moshisho
You can achieve this by changing the CSS of the header before the screenshot:
from selenium.webdriver.common.by import By

topnav = driver.find_element(By.ID, "topnav")
driver.execute_script("arguments[0].setAttribute('style', 'position: absolute; top: 0px;')", topnav)
EDIT: Put this line after your window scroll:
driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")
So in your util.py it will be:
driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")
If the site uses the header tag, you can do it with find_element(By.TAG_NAME, "header") (find_element_by_tag_name("header") in older Selenium versions).
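The same idea can be generalised to pages where you don't know the header's id: walk the DOM once and switch every element whose computed position is fixed or sticky to absolute, so it is not repeated in every scrolled part. A sketch with the JavaScript embedded as a Python string; UNFIX_JS and unfix_elements are hypothetical names, and on some pages switching sticky elements to absolute can shift the layout noticeably.

```python
from typing import Any

# Switch fixed/sticky elements to absolute positioning and report how many
# were changed. getComputedStyle also catches positions set in stylesheets,
# not only inline styles; "important" lets the inline style win.
UNFIX_JS = """
let changed = 0;
for (const el of document.querySelectorAll('*')) {
  const pos = getComputedStyle(el).position;
  if (pos === 'fixed' || pos === 'sticky') {
    el.style.setProperty('position', 'absolute', 'important');
    changed += 1;
  }
}
return changed;
"""


def unfix_elements(driver: Any) -> int:
    """Run UNFIX_JS in the page; returns the number of elements changed."""
    return int(driver.execute_script(UNFIX_JS))
```

As with the topnav line, calling it again after each window scroll is a cheap guard against page scripts that re-apply the fixed positioning.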
Answer by Vali
Why not just get the width and height of the page and then resize the driver window? Something like this:
total_width = driver.execute_script("return document.body.offsetWidth")
total_height = driver.execute_script("return document.body.scrollHeight")
driver.set_window_size(total_width, total_height)
driver.save_screenshot("SomeName.png")
This takes a screenshot of your entire page without having to merge different pieces together.
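set_window_size is only a request, and the answer below reports that without headless mode the resulting screenshot is often incomplete. A cheap safeguard is to read the size back and compare it with what you asked for before taking the screenshot. A sketch, assuming a Selenium-style driver and the same measurements as above; SIZE_JS and resize_to_page are hypothetical names.

```python
from typing import Any, Tuple

SIZE_JS = ("return [document.body.offsetWidth, "
           "document.body.scrollHeight];")


def resize_to_page(driver: Any) -> Tuple[bool, Tuple[int, int]]:
    """Resize the window to the page size, measured as in the answer above.

    Returns (granted, (width, height)), where `granted` is False when the
    browser did not apply the requested size exactly.
    """
    width, height = (int(v) for v in driver.execute_script(SIZE_JS))
    driver.set_window_size(width, height)
    actual = driver.get_window_size()
    granted = (actual["width"], actual["height"]) == (width, height)
    return granted, (width, height)
```

If granted is False, switching to headless mode or falling back to scroll-and-stitch is more reliable than saving a screenshot of the wrong size.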
Answer by Klaidonis
The key is to turn on headless mode! No stitching is required, and there is no need to load the page twice.
Full working code:
from selenium import webdriver
from selenium.webdriver.common.by import By

URL = 'http://www.w3schools.com/js/default.asp'

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # options.headless = True in older Selenium versions

driver = webdriver.Chrome(options=options)
driver.get(URL)

S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)
driver.set_window_size(S('Width'), S('Height'))  # May need manual adjustment
driver.find_element(By.TAG_NAME, 'body').screenshot('web_screenshot.png')

driver.quit()
This is practically the same code as posted by @Acumenus, with slight improvements.
Summary of my findings
I decided to post this anyway because I did not find an explanation of what happens when headless mode is turned off (the browser is displayed) while taking screenshots.
In my tests (with Chrome WebDriver), if headless mode is turned on, the screenshot is saved as desired. However, if headless mode is turned off, the saved screenshot has approximately the correct width and height, but the outcome varies case by case. Usually, the upper part of the page that is visible on the screen is saved, but the rest of the image is plain white. In one case, when trying to save this Stack Overflow thread using the above link, not even the upper part was saved; interestingly, it was transparent, while the rest was still white. The last case, which I saw only once, was with the given W3Schools link: there were no white parts, but the upper part of the page was repeated until the end, including the header.
I hope this will help many of those who, for some reason, are not getting the expected result, as I did not see anyone explicitly explaining that this simple approach requires headless mode.
Only after I had discovered the solution myself did I find a post by @vc2279 mentioning that the window of a headless browser can be set to any size (which seems to be true for the opposite case too). Still, the solution in my post improves on that one, as it does not require reopening the browser/driver or reloading the page.
Further suggestions
If it does not work for you on some pages, I suggest adding time.sleep(seconds) before getting the size of the page. Another case is a page that requires scrolling to the bottom to load further content, which can be solved with the scheight method from this post:
scheight = .1
while scheight < 9.9:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
    scheight += .01
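The scheight loop walks through the page in small fixed steps. Another common pattern for pages that load more content as you scroll is to jump to the bottom repeatedly until the document height stops growing, and then use that height for set_window_size. A sketch, assuming a Selenium-style driver; scroll_until_stable, pause and max_rounds are hypothetical, and the round limit keeps endless feeds from hanging the script.

```python
import time
from typing import Any

HEIGHT_JS = "return document.documentElement.scrollHeight;"


def scroll_until_stable(driver: Any, pause: float = 1.0,
                        max_rounds: int = 20) -> int:
    """Scroll to the bottom until the page height stops changing.

    Returns the last measured height, to be used for set_window_size.
    """
    height = int(driver.execute_script(HEIGHT_JS))
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, arguments[0]);", height)
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = int(driver.execute_script(HEIGHT_JS))
        if new_height == height:
            break
        height = new_height
    driver.execute_script("window.scrollTo(0, 0);")  # back to the top
    return height
```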
Also, note that for some pages the content may not be in any of the top-level HTML tags like <html> or <body>; for example, YouTube uses the <ytd-app> tag.
As a last note, I found one page that still "returned" a screenshot with a horizontal scrollbar; the window size needed manual adjustment, i.e., the image width had to be increased by 18 pixels, like so: S('Width')+18.
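WebDriver's set_window_size sets the outer window size, so the page's client area ends up smaller by whatever the browser frame and a vertical scrollbar take; that is one plausible reason a fixed +18 was needed above. Rather than hard-coding it, you can measure the difference between the requested size and document.documentElement.clientWidth/clientHeight and resize once more. A sketch assuming a Selenium-style driver; CLIENT_JS and set_viewport_size are hypothetical names, and pages whose layout changes with width may need another pass.

```python
from typing import Any, Tuple

CLIENT_JS = ("return [document.documentElement.clientWidth, "
             "document.documentElement.clientHeight];")


def set_viewport_size(driver: Any, width: int, height: int) -> Tuple[int, int]:
    """Resize the window so the page's client area is width x height.

    The first resize reveals how much the frame and scrollbars take;
    the second adds that difference back. Returns the final client size.
    """
    driver.set_window_size(width, height)
    client_w, client_h = driver.execute_script(CLIENT_JS)
    driver.set_window_size(width + (width - client_w),
                           height + (height - client_h))
    final_w, final_h = driver.execute_script(CLIENT_JS)
    return int(final_w), int(final_h)
```

Used in place of the plain set_window_size call, for example set_viewport_size(driver, S('Width'), S('Height')), it derives the extra width from the browser instead of guessing it.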