Original source: http://stackoverflow.com/questions/1859686/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Getting raw text from JTextPane
Asked by Romain Linsolas
In my application, I use a JTextPane to display some log information. As I want to highlight some specific lines in this text (for example the error messages), I set the contentType to "text/html". This way, I can format my text.
Now, I have created a JButton that copies the content of this JTextPane to the clipboard. That part is easy, but my problem is that when I call myTextPane.getText(), I get the HTML code, such as:
<html>
<head>
</head>
<body>
blabla<br>
<font color="#FFCC66"><b>foobar</b></font><br>
blabla
</body>
</html>
instead of getting only the raw content:
blabla
foobar
blabla
Is there a way to get only the content of my JTextPane as plain text? Or do I need to transform the HTML into raw text myself?
Accepted answer by jitter
Based on the accepted answer to: Removing HTML from a Java String
MyHtml2Text parser = new MyHtml2Text();
try {
    parser.parse(new StringReader(myTextPane.getText()));
} catch (IOException ee) {
    // handle exception
}
System.out.println(parser.getText());
A slightly modified version of the Html2Text class found in the answer I linked to:
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class MyHtml2Text extends HTMLEditorKit.ParserCallback {
    StringBuffer s;

    public MyHtml2Text() {}

    // Parses the HTML from the reader and collects its text content.
    public void parse(Reader in) throws IOException {
        s = new StringBuffer();
        ParserDelegator delegator = new ParserDelegator();
        // TRUE: ignore any charset declared inside the document
        delegator.parse(in, this, Boolean.TRUE);
    }

    // Called by the parser for each run of text between tags.
    @Override
    public void handleText(char[] text, int pos) {
        s.append(text);
        s.append("\n");
    }

    public String getText() {
        return s.toString();
    }
}
If you need more fine-grained handling, consider overriding more of the callback methods defined by HTMLEditorKit.ParserCallback.
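As a rough sketch of what such finer-grained handling could look like: the variant below emits a line break only where the HTML has a `<br>` or the end of a paragraph, instead of after every text run. The class name `LineAwareHtml2Text` and the static helper `convert` are hypothetical names chosen for this illustration; the overridden methods are the ones `HTMLEditorKit.ParserCallback` provides.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of MyHtml2Text, for illustration only.
class LineAwareHtml2Text extends HTMLEditorKit.ParserCallback {
    private final StringBuilder out = new StringBuilder();

    // Convenience helper (an assumption, not part of the original answer).
    static String convert(String html) {
        LineAwareHtml2Text callback = new LineAwareHtml2Text();
        try (Reader in = new StringReader(html)) {
            new ParserDelegator().parse(in, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.out.toString();
    }

    // Text runs are appended as-is, without a forced newline.
    @Override
    public void handleText(char[] text, int pos) {
        out.append(text);
    }

    // Tags without an end tag, such as <br>, arrive here.
    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            out.append('\n');
        }
    }

    // The end of a paragraph also ends the current line.
    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.P) {
            out.append('\n');
        }
    }
}
```

Calling `LineAwareHtml2Text.convert(myTextPane.getText())` would then keep inline-formatted text such as `<b>foobar</b>` on the same line as its neighbours. The result may still carry a trailing newline, so a `trim()` can be useful.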
Answer by camickr
No need to use the ParserCallback. Just use:
textPane.getDocument().getText(0, textPane.getDocument().getLength());
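A minimal sketch of this approach in context, assuming a pane whose content type is "text/html". The class name `DocumentTextDemo` and the helper `plainText` are made up for this example. Note that `Document.getText(int, int)` throws the checked `BadLocationException`, so the call has to be wrapped. The returned text may start with a newline, so trimming it is often worthwhile.

```java
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

// Hypothetical demo class; only the getDocument().getText(...) call comes from the answer.
class DocumentTextDemo {

    // Returns the text held by the pane's document model, without HTML markup.
    static String plainText(JTextPane pane) {
        Document doc = pane.getDocument();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            // Not expected for the range 0..getLength(); rethrow as unchecked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        JTextPane pane = new JTextPane();
        pane.setContentType("text/html");
        pane.setText("<html><body>blabla<br><b>foobar</b><br>blabla</body></html>");
        System.out.println(plainText(pane).trim());
    }
}
```

This reads from the document model rather than re-parsing the HTML string, which is why no parser callback is needed.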
Answer by Nick Fortescue
Unfortunately, you need to do it yourself. Imagine that some of the content were HTML-specific, e.g. images: the text representation would be unclear. Should the alt text be included or not, for instance?
Answer by Andreas Dolk
(Is RegExp allowed? This isn't parsing, is it?)
Take the getText() result and use String.replaceAll() to filter out all tags. Then call trim() to remove leading and trailing whitespace. For the whitespace between your first and your last 'blabla' I don't see a general solution. Maybe you can split the rest around CRLF and trim all the Strings again.
(I'm no regexp expert - maybe someone can provide the regexp and earn some reputation ;) )
Edit
I just assumed that you don't use < and > in your text; otherwise it is, let's say, a challenge.
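One possible shape for the regexp approach described in this answer, as a sketch only: the pattern `<[^>]*>` and the names `RegexTagStripper` and `strip` are assumptions made for illustration, not something the answer supplied. It removes anything that looks like a tag, splits on line breaks, trims each line and drops the empty ones. As the answer warns, it will misbehave if the text itself contains `<` or `>`, and it does not decode HTML entities such as `&amp;`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the replaceAll + trim idea.
class RegexTagStripper {

    static String strip(String html) {
        // Remove everything between '<' and the next '>' (assumed tag pattern).
        String noTags = html.replaceAll("<[^>]*>", "");

        // Split on any line break, trim each line and skip the empty ones.
        List<String> lines = new ArrayList<>();
        for (String line : noTags.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>\nblabla<br>\n"
                + "<font color=\"#FFCC66\"><b>foobar</b></font><br>\nblabla\n</body>\n</html>";
        System.out.println(strip(html));
    }
}
```

Because line breaks are taken from the HTML source and not from `<br>` tags, this only gives one output line per log entry when each entry already sits on its own source line, as in the question's example.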

