Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and cite the original address: http://stackoverflow.com/questions/17223903/

Date: 2020-11-01 01:21:36  Source: igfitidea

Java RTF Parser

java parsing rtf

Asked by Mary

Does anyone know of a robust RTF parser I can use in Java? I need to extract plain text, including international text. It would also be nice to extract embedded images and files. It could also be a C++ or other library that I can easily call, or, if there is good source code, I can convert it to Java.

The following libraries either do not cover enough of the RTF format or fail to parse some valid RTF files:

  1. Java Swing's RTFEditorKit: quite basic and brittle. Apache Tika, Nutch, and lots of other tools use it.
  2. An RTF library from iText (com.lowagie.etc...): not too comprehensive.
  3. The etranslate RTF library (the most complete of the Java ones). I am not sure whether there is an updated version, but the version I got fails on some of my RTF collection (the RTFs are valid; at least they open fine in MS Word and OpenOffice).
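
For reference, the Swing approach from item 1 is usually wired up roughly as below. This is only a minimal sketch using the JDK's own `javax.swing.text.rtf.RTFEditorKit`; the class name `SwingRtfText` and the method `extractText` are made up for illustration, and the sketch inherits the limitations described above (it may fail on, or silently drop parts of, more complex RTF files).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; only RTFEditorKit and Document come from the JDK.
final class SwingRtfText {

    /** Loads raw RTF bytes into a Swing document and returns its plain text. */
    static String extractText(byte[] rtfBytes) {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            // The kit parses the stream and inserts the content at offset 0.
            kit.read(new ByteArrayInputStream(rtfBytes), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            // Should not happen for offset 0 of a freshly created document.
            throw new IllegalStateException(e);
        }
    }
}
```

Formatting, embedded images, and embedded files are not exposed this way, which is part of why the question asks for something more complete.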

There's a C# library that's reasonably complete, but alas... it's C# and not Java. http://www.codeproject.com/Articles/27431/Writing-Your-Own-RTF-Converter

I also looked into OpenOffice. It is too slow for what I need, though it's probably very comprehensive.

(I did do web searches and Stack Overflow searches before posting this question, so if you are referring me to an ancient "already asked" post, it probably doesn't have an answer. But feel free to point it out, in case I missed it!)

Answered by Jon Iles

You may find RTF Parser Kit useful. It provides a stream-based parser which delivers events to you as the document is parsed. There is a simple example text extractor provided which demonstrates how the API can be used.
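
To illustrate what "stream-based parser which delivers events" means in practice, here is a toy sketch of the pattern. It is not RTF Parser Kit's actual API: the `Listener` interface, the class `RtfEventSketch`, and every method name are hypothetical, and the tokenizer deliberately ignores `\'hh` hex escapes, `\uN` Unicode escapes, and code pages, so it does not handle international text. A real library covers far more of the format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy event-driven RTF scanner; all names here are hypothetical.
final class RtfEventSketch {

    /** Callbacks fired while the input is scanned. */
    interface Listener {
        void groupStart();
        void groupEnd();
        void controlWord(String name, Integer parameter);
        void text(char c);
    }

    /** Groups whose content is not body text ("*" marks ignorable destinations). */
    private static final List<String> SKIPPED =
            Arrays.asList("*", "fonttbl", "colortbl", "stylesheet", "info", "pict");

    /** Scans the stream once, firing an event for each token. */
    static void parse(Reader in, Listener listener) throws IOException {
        int c = in.read();
        while (c != -1) {
            if (c == '{') { listener.groupStart(); c = in.read(); }
            else if (c == '}') { listener.groupEnd(); c = in.read(); }
            else if (c == '\\') { c = readControl(in, listener); }
            else if (c == '\r' || c == '\n') { c = in.read(); } // raw line breaks are not text
            else { listener.text((char) c); c = in.read(); }
        }
    }

    /** Reads what follows a backslash and returns the next unread character. */
    private static int readControl(Reader in, Listener listener) throws IOException {
        int c = in.read();
        if (!isAsciiLetter(c)) {
            if (c == '\\' || c == '{' || c == '}') listener.text((char) c); // escaped literal
            else if (c == '*') listener.controlWord("*", null);
            else if (c == '\'') { in.read(); in.read(); } // hex escape: skipped in this sketch
            return in.read();
        }
        StringBuilder name = new StringBuilder();
        while (isAsciiLetter(c)) { name.append((char) c); c = in.read(); }
        boolean negative = false;
        if (c == '-') { negative = true; c = in.read(); }
        int value = 0;
        boolean hasDigits = false;
        while (c >= '0' && c <= '9') { value = value * 10 + (c - '0'); hasDigits = true; c = in.read(); }
        listener.controlWord(name.toString(), hasDigits ? Integer.valueOf(negative ? -value : value) : null);
        return c == ' ' ? in.read() : c; // one space after a control word is a delimiter
    }

    private static boolean isAsciiLetter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Example listener: collects body text and turns \par and \tab into whitespace. */
    static String extractText(String rtf) {
        StringBuilder out = new StringBuilder();
        Deque<Boolean> skip = new ArrayDeque<>(); // one "skip this group" flag per open group
        skip.push(false);
        Listener listener = new Listener() {
            public void groupStart() { skip.push(skip.peek()); } // nested groups inherit the flag
            public void groupEnd() { if (skip.size() > 1) skip.pop(); }
            public void controlWord(String name, Integer parameter) {
                if (SKIPPED.contains(name)) { skip.pop(); skip.push(true); }
                else if (!skip.peek() && name.equals("par")) out.append('\n');
                else if (!skip.peek() && name.equals("tab")) out.append('\t');
            }
            public void text(char c) { if (!skip.peek()) out.append(c); }
        };
        try {
            parse(new StringReader(rtf), listener);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With this sketch, `RtfEventSketch.extractText("{\\rtf1\\ansi{\\fonttbl{\\f0 Arial;}}\\f0 Hello \\b world\\b0 !\\par Bye}")` returns `Hello world!` followed by a newline and `Bye`. The point of the event style is that the caller decides what to keep (text only, images, structure) without the parser building a full document tree in memory.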

Answered by J-Boss

If your project is non-commercial, there is a good free Java RTF-to-XML library here, better than etranslate in my opinion, and you can process the XML from there. However, if you are using it for commercial purposes, you will have to arrange licensing with rtf-to-xml.com, the company that developed it.

However, having once been in a similar situation before finding rtf-to-xml, I found a funny workaround for this problem when I needed to parse MS RTF on a Linux server. There is a free rich text processor, which is also a library, called Ted. It takes arguments from the command line without the user interface and can be wrapped in a JNI call.

I hope this helps.
