Java: how do I take a screenshot of a web page?

Question

Asked by Felipe Dias

I am using the code below, but the generated image is broken. I suspect the rendering options are the cause. Does anybody know what is happening?

package webpageprinter;

import java.net.URL;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeEvent;
import javax.swing.text.html.*;
import java.awt.*;
import javax.swing.*;
import java.io.*;

public class WebPagePrinter {
    private BufferedImage image = null;

    public BufferedImage Download(String webpageurl) {
        try {
            URL url = new URL(webpageurl);
            final JEditorPane jep = new JEditorPane();
            jep.setContentType("text/html");
            ((HTMLDocument) jep.getDocument()).setBase(url);
            jep.setEditable(false);
            jep.setBounds(0, 0, 1024, 768);
            jep.addPropertyChangeListener("page", new PropertyChangeListener() {
                @Override
                public void propertyChange(PropertyChangeEvent e) {
                    try {
                        image = new BufferedImage(1024, 768, BufferedImage.TYPE_INT_RGB);
                        Graphics g = image.getGraphics();
                        Graphics2D graphics = (Graphics2D) g;
                        graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
                        jep.paint(graphics);
                        ImageIO.write(image, "png", new File("C:/webpage.png"));
                    } catch (Exception re) {
                        re.printStackTrace();
                    }
                }
            });
            jep.setPage(url);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return image;
    }

    public static void main(String[] args) {
        new WebPagePrinter().Download("http://www.google.com");
    }
}

Answer 1

Answered by Andrew Thompson

I think there are 3 problems and one fragility in that code:

Problems

1. JEditorPane was never intended to be a browser.
2. setPage(URL) loads asynchronously. It is necessary to add a listener to determine when the page has loaded.
3. You might find some sites automatically refuse connections to Java clients.
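
For the third point, one common workaround is to send a browser-like User-Agent header instead of Java's default one. The following is only a sketch: the class name BrowserLikeRequest, the method open, and the User-Agent string are illustrative assumptions, and some sites may still reject the request for other reasons.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class BrowserLikeRequest {

    // Hypothetical User-Agent value chosen for illustration only.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)";

    // Prepares a connection with a custom User-Agent. openConnection() does not
    // contact the server yet; that happens on connect() or getInputStream().
    static URLConnection open(String address) {
        try {
            URLConnection connection = URI.create(address).toURL().openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);
            connection.setConnectTimeout(10_000); // milliseconds
            connection.setReadTimeout(10_000);    // milliseconds
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection connection = open("http://www.google.com/");
        // Shows the header that will be sent once the connection is used.
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

The HTML read from connection.getInputStream() can then be handed to whichever component does the rendering.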

Fragility

The fragility comes from the call to setBounds(). Use layouts instead.

Image at 400x600

Google screen shot

But looking at this image, it seems point 3 does not apply here and point 2 is not the problem. It comes down to point 1: JEditorPane was never intended as a browsing component. Those random characters at the bottom are JavaScript, which the JEP not only fails to execute but also improperly displays in the page.
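
For simple, script-free HTML, the asynchronous loading and the hard-coded bounds can both be avoided. The sketch below assumes the HTML is already available as a string; the names HtmlSnapshot and render are illustrative. It loads the markup with setText(), which parses the text before returning, and derives the height from the pane's preferred size. It still relies on JEditorPane, so the rendering limits described in point 1 remain.

```java
import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import javax.swing.JEditorPane;

public class HtmlSnapshot {

    // Renders an HTML string into an image of the given width. The height is
    // taken from the pane's preferred size instead of a fixed setBounds() call.
    static BufferedImage render(String html, int width) {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html");
        pane.setEditable(false);
        pane.setText(html); // parses the markup before returning, unlike setPage(URL)

        // Fix the width first so the preferred height accounts for line wrapping.
        pane.setSize(width, Short.MAX_VALUE);
        Dimension preferred = pane.getPreferredSize();
        int height = Math.max(1, preferred.height);
        pane.setSize(width, height);

        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            pane.print(g); // paints the component into the image
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = render(
                "<html><body><h1>Hello</h1><p>Rendered offscreen.</p></body></html>", 400);
        System.out.println(image.getWidth() + "x" + image.getHeight());
    }
}
```

The returned image can be saved with ImageIO.write, as in the question's code. Images referenced by the HTML may still load in the background, so this is best suited to text-only markup.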

Answer 2

Answered by hbhakhra

You can capture the entire screen using Java's Robot class (java.awt.Robot).

import java.awt.AWTException;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class RobotExp {

    public static void main(String[] args) {

        try {

            Robot robot = new Robot();
            // Capture the screen shot of the area of the screen defined by the rectangle
            BufferedImage bi=robot.createScreenCapture(new Rectangle(Toolkit.getDefaultToolkit().getScreenSize()));
            ImageIO.write(bi, "jpg", new File("C:/imageTest.jpg"));

        } catch (AWTException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This example was found here, with some modifications by me.
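
If only part of the screen is needed, for example the area occupied by a browser window, the same Robot call accepts a smaller rectangle. This is a rough sketch with illustrative names (RegionCapture, clipToScreen, capture); the rectangle in main is an assumed window position. It clips the requested region to the screen and returns null when no display is available.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

public class RegionCapture {

    // Returns the part of the requested region that lies on the screen.
    static Rectangle clipToScreen(Rectangle requested, Rectangle screen) {
        return requested.intersection(screen);
    }

    // Captures the requested region, or returns null if there is no display
    // or the region lies entirely off screen.
    static BufferedImage capture(Rectangle requested) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            return null; // Robot cannot be created without a display
        }
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        Rectangle region = clipToScreen(requested, screen);
        if (region.isEmpty()) {
            return null;
        }
        return new Robot().createScreenCapture(region);
    }

    public static void main(String[] args) throws AWTException {
        // Assumed position and size of the window of interest.
        BufferedImage image = capture(new Rectangle(100, 100, 800, 600));
        System.out.println(image == null ? "nothing captured" : "captured region");
    }
}
```

Like the full-screen version, this captures whatever is visible on the display, so the browser window has to be open and unobscured.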

Answer 3

Answered by Janning

I had the best results with Selenium WebDriver, using a virtual framebuffer (Xvfb) and the Firefox binary. This was tested under Ubuntu. You need to have xvfb and Firefox installed. Advantage: you are running a real browser, so the screenshot looks like a real screenshot in a real browser.

First install Firefox and virtual framebuffer:

aptitude install xvfb firefox

Compile and run this class, then open /tmp/screenshot.png:

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxBinary;
import org.openqa.selenium.firefox.FirefoxDriver;

public class CaptureScreenshotTest
{
    private static final int DISPLAY_NUMBER = 99;
    private static final String XVFB = "/usr/bin/Xvfb";
    private static final String XVFB_COMMAND = XVFB + " :" + DISPLAY_NUMBER;
    private static final String URL = "http://www.google.com/";
    private static final String RESULT_FILENAME = "/tmp/screenshot.png";

    public static void main ( String[] args ) throws IOException
    {
        // Start the virtual framebuffer on display :99
        Process p = Runtime.getRuntime().exec(XVFB_COMMAND);
        try {
            // Point Firefox at the virtual display
            FirefoxBinary firefox = new FirefoxBinary();
            firefox.setEnvironmentProperty("DISPLAY", ":" + DISPLAY_NUMBER);
            WebDriver driver = new FirefoxDriver(firefox, null);
            try {
                driver.get(URL);
                File scrFile = ( (TakesScreenshot) driver ).getScreenshotAs(OutputType.FILE);
                FileUtils.copyFile(scrFile, new File(RESULT_FILENAME));
            } finally {
                // quit() ends the session and closes every browser window
                driver.quit();
            }
        } finally {
            // Stop Xvfb even if the screenshot failed
            p.destroy();
        }
    }
}

Answer 4

Answered by Michael Borgwardt

Your problem is that you're using Java's JEditorPane to render the webpage, and it has a very limited HTML rendering engine. It simply cannot display more complex webpages as well as a modern browser can.

If you need to produce screenshots of correctly rendered complex webpages using Java, the best way is probably to use Selenium to control a real browser like Firefox.

Answer 5

Answered by fvu

The javadoc states:

HTML text. The kit used in this case is the class javax.swing.text.html.HTMLEditorKit which provides HTML 3.2 support.

That probably explains why the page looks a bit broken, as pages nowadays mostly use HTML 4, HTML5, or XHTML.
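
To confirm which kit a pane actually uses for a given content type, it can be queried directly. This is a small sketch; the class name KitCheck and the helper kitClassFor are illustrative.

```java
import javax.swing.JEditorPane;

public class KitCheck {

    // Returns the class name of the editor kit that JEditorPane installs
    // for the given content type.
    static String kitClassFor(String contentType) {
        JEditorPane pane = new JEditorPane();
        pane.setContentType(contentType);
        return pane.getEditorKit().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(kitClassFor("text/html"));
    }
}
```

For "text/html" this reports javax.swing.text.html.HTMLEditorKit, the kit named in the javadoc passage quoted above.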

There's a question here on SO regarding Java browser components: Best Java/Swing browser component?

Answer 6

Answered by joostschouten

Have a look at Flying Saucer. It is great for generating images and PDFs from HTML pages.
