How to convert HTML to RTF (rich text) in .NET without paying for a component?

Question

Asked by Josh Kodroff

Is there a free third-party or .NET class that will convert HTML to RTF (for use in a rich-text enabled Windows Forms control)?

The "free" requirement comes from the fact that I'm only working on a prototype and can just load the BrowserControl and just render HTML if need be (even if it is slow) and that Developer Express is going to be releasing their own such control soon-ish.

I don't want to learn to write RTF by hand, and I already know HTML, so I figure this is the quickest way to get some demonstrable code out the door quickly.

Answer 1

Answered by Spartaco

Actually there is a simple and free solution: use the WebBrowser control. This is the trick I used:

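var webBrowser = new WebBrowser();
webBrowser.CreateControl(); // only if needed
webBrowser.DocumentText = yourHtmlString; // placeholder: the HTML to convert
// Pump the message loop until the control has loaded the document.
while (webBrowser.DocumentText != yourHtmlString)
    Application.DoEvents();
// Copy the rendered document to the clipboard, then paste it as rich text.
webBrowser.Document.ExecCommand("SelectAll", false, null);
webBrowser.Document.ExecCommand("Copy", false, null);
yourRichTextControl.Paste(); // placeholder: your RichTextBox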

This could be slower than other methods but at least it's free and works!

Answer 2

Answered by Jonathan Parker

Check out this CodeProject article on XHTML2RTF.

Answer 3

Answered by cjbarth

Expanding on Spartaco's answer, I implemented the following, which works great!

    Using reportWebBrowser As New WebBrowser
        reportWebBrowser.CreateControl()
        reportWebBrowser.DocumentText = sbHTMLDoc.ToString
        While reportWebBrowser.DocumentText <> sbHTMLDoc.ToString
            Application.DoEvents()
        End While
        reportWebBrowser.Document.ExecCommand("SelectAll", False, Nothing)
        reportWebBrowser.Document.ExecCommand("Copy", False, Nothing)

        Using reportRichTextBox As New RichTextBox
            reportRichTextBox.Paste()
            reportRichTextBox.SaveFile(DocumentFileName)
        End Using
    End Using

Answer 4

Answered by Andrew

It is not perfect of course, but here is the code I use to convert HTML to plain text.

(I was not the original author, I adapted it from code found on the web)

public static string ConvertHtmlToText(string source) {

            string result;

            // Remove HTML Development formatting
            // Replace line breaks with space
            // because browsers inserts space
            result = source.Replace("\r", " ");
            // Replace line breaks with space
            // because browsers inserts space
            result = result.Replace("\n", " ");
            // Remove step-formatting
            result = result.Replace("\t", string.Empty);
            // Remove repeating spaces because browsers ignore them
            result = System.Text.RegularExpressions.Regex.Replace(result,
                                                                  @"( )+", " ");

            // Remove the header (prepare first by clearing attributes)
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*head([^>])*>", "<head>",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"(<( )*(/)( )*head( )*>)", "</head>",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(<head>).*(</head>)", string.Empty,
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            // remove all scripts (prepare first by clearing attributes)
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*script([^>])*>", "<script>",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"(<( )*(/)( )*script( )*>)", "</script>",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            //result = System.Text.RegularExpressions.Regex.Replace(result, 
            //         @"(<script>)([^(<script>\.</script>)])*(</script>)",
            //         string.Empty, 
            //         System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"(<script>).*(</script>)", string.Empty,
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            // remove all styles (prepare first by clearing attributes)
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*style([^>])*>", "<style>",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"(<( )*(/)( )*style( )*>)", "</style>",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(<style>).*(</style>)", string.Empty,
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            // insert tabs in spaces of <td> tags
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*td([^>])*>", "\t",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            // insert line breaks in places of <BR> and <LI> tags
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*br( )*>", "\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*li( )*>", "\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            // insert paragraphs (double line breaks) in place
            // of <P>, <DIV> and <TR> tags
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*div([^>])*>", "\r\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*tr([^>])*>", "\r\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<( )*p([^>])*>", "\r\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            // Remove remaining tags like <a>, links, images,
            // comments etc - anything thats enclosed inside < >
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"<[^>]*>", string.Empty,
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            // replace special characters:
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&nbsp;", " ",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);

            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&bull;", " * ",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&lsaquo;", "<",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&rsaquo;", ">",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&trade;", "(tm)",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&frasl;", "/",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&lt;", "<",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&gt;", ">",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&copy;", "(c)",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&reg;", "(r)",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            // Remove all others. More can be added, see
            // http://hotwired.lycos.com/webmonkey/reference/special_characters/
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     @"&(.{2,6});", string.Empty,
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);


            // make line breaking consistent
            result = result.Replace("\n", "\r");

            // Remove extra line breaks and tabs:
            // replace over 2 breaks with 2 and over 4 tabs with 4. 
            // Prepare first to remove any whitespaces inbetween
            // the escaped characters and remove redundant tabs inbetween linebreaks
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(\r)( )+(\r)", "\r\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(\t)( )+(\t)", "\t\t",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(\t)( )+(\r)", "\t\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(\r)( )+(\t)", "\r\t",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            // Remove redundant tabs
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(\r)(\t)+(\r)", "\r\r",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            // Replace multiple tabs following a line break with just one tab
            result = System.Text.RegularExpressions.Regex.Replace(result,
                     "(\r)(\t)+", "\r\t",
                     System.Text.RegularExpressions.RegexOptions.IgnoreCase);
            // Initial replacement target string for linebreaks
            string breaks = "\r\r\r";
            // Initial replacement target string for tabs
            string tabs = "\t\t\t\t\t";
            for (int index = 0; index < result.Length; index++) {
                result = result.Replace(breaks, "\r\r");
                result = result.Replace(tabs, "\t\t\t\t");
                breaks = breaks + "\r";
                tabs = tabs + "\t";
            }

            // Thats it.
            return result;

    }

Answer 5

Answered by NtFreX

TL;DR: I recommend using the OpenXml format and the HtmlToOpenXml NuGet package if possible.

Microsoft Word COM

I haven't really looked into this option much, as my use case is to run the conversion on a server, which makes COM components a poor choice.

XHTML2RTF

As @JonathanParker mentioned, you can use this CodeProject library.

Disadvantages are:

Limited supported HTML and CSS
Not really .NET
...

Windows Forms Web Browser

As @Spartaco mentioned, you can use the Windows Forms WebBrowser control.

Disadvantages are:

Reference to System.Windows.Forms
Uses copy & paste (problematic for multithreading)
Only works in an STA thread

Not supported features include:

Fonts
Colors
Numbered lists
Strikethrough (del element)
...
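
The STA requirement listed above can be worked around by running the clipboard trick on a dedicated thread. The following is a minimal sketch, not a definitive implementation: `StaRunner` and `HtmlClipboardConverter` are hypothetical names introduced here, the wait loop is the same one used in Spartaco's answer, and the code assumes a Windows desktop session in which the clipboard is available.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public static class StaRunner
{
    // Runs 'work' on a dedicated STA thread and returns its result.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;
        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });
        thread.SetApartmentState(ApartmentState.STA); // must be set before Start()
        thread.Start();
        thread.Join();
        if (error != null)
            throw new InvalidOperationException("STA work failed.", error);
        return result;
    }
}

// Hypothetical helper that wraps the WebBrowser copy-and-paste trick.
public static class HtmlClipboardConverter
{
    public static string ToRtf(string html)
    {
        return StaRunner.Run(() =>
        {
            using (var browser = new WebBrowser())
            using (var box = new RichTextBox())
            {
                browser.DocumentText = html;
                // Same wait loop as in the answer: pump messages until loaded.
                while (browser.DocumentText != html)
                    Application.DoEvents();
                browser.Document.ExecCommand("SelectAll", false, null);
                browser.Document.ExecCommand("Copy", false, null);
                box.Paste();     // reads the rich text back from the clipboard
                return box.Rtf;  // the RTF markup as a string
            }
        });
    }
}
```

Note that this still overwrites whatever the user had on the clipboard, and two conversions running at the same time would race for it, so the multithreading caveat above is only partly addressed.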

DevExpress

Code sample by "Paul V" from the DevExpress support center (03.02.2015):

public String ConvertRTFToHTML(String RTF)
{   
    MemoryStream ms = new MemoryStream();
    StreamWriter writer = new StreamWriter(ms);
    writer.Write(RTF);
    writer.Flush();
    ms.Position = 0;
    String output = "";
    HtmlEditorExtension.Import(HtmlEditorImportFormat.Rtf, ms, (s, enumerable) => output = s);

    return output;
}

public String ConvertHTMLToRTF(String html)
{
    MemoryStream ms = new MemoryStream();
    var editor = new ASPxHtmlEditor { Html = html };

    editor.Export(HtmlEditorExportFormat.Rtf, ms);

    ms.Position = 0;
    StreamReader reader = new StreamReader(ms);

    return reader.ReadToEnd();
}

Or you could use the RichEditDocumentServer type as shown in this example.

A license for DevExpress can cost from around 1500 USD to 2200 USD.

It is unknown what is actually supported.

Disadvantages are:

Price
Quite a lot of references for one small thing
More?

Not supported features include:

Strikethrough (del element)

Sautinsoft

public string ConvertHTMLToRTF(string html)
{
    SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
    return h.ConvertString(html);
}

public string ConvertRTFToHTML(string rtf)
{
    SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
    byte[] bytes = Encoding.ASCII.GetBytes(rtf);
    r.OpenDocx(bytes );
    return r.ToHtml();
}

More examples and configuration options can be found here and here.

A license for this component can cost from 400 USD to 2000 USD.

The following is supported:

HTML 3.2
HTML 4.01
HTML 5
CSS
XHTML

Disadvantages are:

I'm not sure how active the development is
Price

Usage knowledge base:

Converting numbered lists from the Trix Angular editor destroys the indentation

DIY

If you only wanted to support limited functionality you could write your own converter. I would not recommend this if the supported feature set is too large.

I have a small sample project here, but in its current state it is only for educational purposes.
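
To give a feel for what a do-it-yourself converter involves, here is a minimal sketch. It is not the sample project mentioned above; `TinyHtmlToRtf` is a hypothetical name, and the supported subset (bold, italic, underline, `<br>`, and paragraph ends) is an assumption chosen for illustration. Every other tag is dropped, and CSS is ignored entirely.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class TinyHtmlToRtf
{
    // Splits the input into tags and the text runs between them.
    private static readonly Regex Token = new Regex(@"<[^>]+>|[^<]+");

    public static string Convert(string html)
    {
        var rtf = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0 ");
        foreach (Match m in Token.Matches(html))
        {
            string token = m.Value;
            if (token[0] != '<')
            {
                // Plain text: decode entities such as &amp;, then escape for RTF.
                AppendEscaped(rtf, WebUtility.HtmlDecode(token));
                continue;
            }
            // Normalise "<B class='x'>" to "b" and "</b>" to "/b".
            string name = Regex.Match(token, @"^<\s*(/?\s*\w+)").Groups[1].Value
                               .Replace(" ", "").ToLowerInvariant();
            switch (name)
            {
                case "b": case "strong": rtf.Append(@"\b "); break;
                case "/b": case "/strong": rtf.Append(@"\b0 "); break;
                case "i": case "em": rtf.Append(@"\i "); break;
                case "/i": case "/em": rtf.Append(@"\i0 "); break;
                case "u": rtf.Append(@"\ul "); break;
                case "/u": rtf.Append(@"\ulnone "); break;
                case "br": rtf.Append(@"\line "); break;
                case "/p": rtf.Append(@"\par "); break;
                // Every other tag is silently dropped.
            }
        }
        return rtf.Append('}').ToString();
    }

    private static void AppendEscaped(StringBuilder rtf, string text)
    {
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
                rtf.Append('\\').Append(c);          // RTF control characters
            else if (c > 127)
                rtf.Append(@"\u").Append(unchecked((short)c)).Append('?'); // signed 16-bit code unit
            else
                rtf.Append(c);
        }
    }
}
```

In a Windows Forms prototype the result could be assigned directly, for example `richTextBox.Rtf = TinyHtmlToRtf.Convert(html);`. Lists, tables, fonts, and colors would each need their own handling, which is why this approach stops being attractive once the feature set grows.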

OpenXml

If the OpenXml format is also OK for your use case, you can use the HtmlToOpenXml NuGet package. It's free and supported all the features I tested the other solutions against.

The project is based on the Open XML SDK by Microsoft and seems active.

public static byte[] ConvertHtmlToOpenXml(string html)
{
    using (var generatedDocument = new MemoryStream())
    {
        using (var package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            var mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            var converter = new HtmlConverter(mainPart);
            converter.ParseHtml(html);

            mainPart.Document.Save();
        }

        return generatedDocument.ToArray();
    }
}

Link to example gist

Answer 6

Answered by GvS

Maybe what you need is a control to edit the HTML?

Answer 7

Answered by Jacek Krawczyk

I recommend a console tool named Pandoc. It is not a component but rather a huge conversion pack. I am using it to convert between HTML and LaTeX. It is just awesome.

You can find the full list of supported formats on the program page.

In order to convert an HTML document to RTF format you write on the console:

pandoc filename.html -f html -t rtf -s -o filename.rtf
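
Since the question is about .NET, the same command can be launched from code. This is a rough sketch under stated assumptions: `PandocRunner` is a hypothetical helper name, and it assumes the `pandoc` executable is installed and reachable through the PATH. The arguments are exactly those of the command line above.

```csharp
using System;
using System.Diagnostics;

public static class PandocRunner
{
    // Builds the same arguments as the console command, quoting both paths.
    public static string BuildArguments(string inputPath, string outputPath)
    {
        return string.Format("\"{0}\" -f html -t rtf -s -o \"{1}\"", inputPath, outputPath);
    }

    public static void ConvertHtmlFileToRtf(string inputPath, string outputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "pandoc",            // assumption: pandoc is on the PATH
            Arguments = BuildArguments(inputPath, outputPath),
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardError = true,
            CreateNoWindow = true
        };
        using (var process = Process.Start(startInfo))
        {
            // Read stderr before waiting so a full pipe cannot block pandoc.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("pandoc failed: " + errors);
        }
    }
}
```

The resulting file can then be loaded into a RichTextBox with its `LoadFile` method.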
