Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/18256812/

Time: 2020-08-10 11:42:15  Source: igfitidea

Convert Word to HTML then render HTML on webpage

Tags: c#, ms-word

Asked by James Wilson

I have a tough project in my pipeline and I'm not sure where to begin. My boss wants the ability to display a Word document as HTML and have it look the same as the Word document.

After I tried time after time to get him to let me just show the Word document in a pop-up or a lightbox, he is still stuck on stripping out the contents of the Word document, converting them to HTML, saving that in a database, and then displaying it as HTML on a webpage.

Can you guys either give me some good ammo for arguing that showing the Word document itself is better (less cumbersome, less storage space, more secure, etc.)?

Or, if it's pretty easy to convert a Word document to HTML, point me to ways to do that.

The technologies I currently have are Entity Framework, LINQ, MVC, C#, and Razor.

We currently use HtmlAgilityPack, but it strips out all of the formatting, so the document doesn't display very well.

Accepted answer by Dave Bish

We use Aspose (http://www.aspose.com/; I think the product we use is Aspose.Words) to perform a similar task, and it works quite well (there is a cost involved).

I would suggest that converting to HTML gives the worst rendition of the document. One solution we use is to generate a JPEG image of the document and display that.

If you need to be able to perform operations like finding and copying/pasting text, I would recommend converting the document to a PDF and displaying it inline, in whichever standard PDF viewer the client machine has installed.
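
As a rough sketch of the "display it inline" part, assuming an ASP.NET MVC action already has the PDF bytes from whichever converter you use: browsers show a PDF in their built-in viewer when it is served with Content-Type application/pdf and a Content-Disposition of "inline" rather than "attachment". The helper below (a hypothetical name) builds that header value with the framework's System.Net.Mime.ContentDisposition class.

```csharp
using System.Net.Mime;

public static class PdfInline
{
    // Builds a Content-Disposition header value such as "inline; filename=report.pdf".
    // "inline" asks the browser to render the PDF instead of downloading it.
    public static string ContentDispositionInline(string fileName) =>
        new ContentDisposition { FileName = fileName, Inline = true }.ToString();
}
```

In an MVC 5 controller you could then call `Response.AddHeader("Content-Disposition", PdfInline.ContentDispositionInline("document.pdf"))` and `return File(pdfBytes, "application/pdf");`. Avoid the `File` overload that takes a download file name, since that marks the response as an attachment.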

Answer by Daniel Szabo

If your boss is dead-set on displaying it as HTML, then getting the HTML generated from the Word doc into your database is the hardest part of the project.

You have a couple of workflows to choose from, but they go something like this:

  1. User saves the .doc as .html >> user uploads the HTML to the database through an app you create >> web app pulls the HTML from the database to display on the web page

  2. User saves the .doc >> user uploads the doc through an app you create >> the app converts the doc on the fly and then inserts the HTML into the database >> web app pulls the HTML from the database to display on the web page

  3. User saves and uploads the .doc file to the database >> web app pulls the doc and converts it on the fly when it's requested by a web page

  4. etc etc etc
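
Workflows 2 and 3 differ mainly in when the conversion runs. A minimal sketch, assuming a hypothetical `IWordToHtmlConverter` wrapper around whichever converter you pick and an in-memory dictionary standing in for the Entity Framework table (all type names here are illustrative, not from any library). The placeholder converter only exists so the flow can be tried out; it does not understand Word files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical abstraction over a real converter (Aspose.Words, Open XML PowerTools, GemBox.Document, ...).
public interface IWordToHtmlConverter
{
    string ConvertToHtml(Stream wordDocument);
}

// Placeholder for trying the flow: treats the bytes as UTF-8 text. NOT a Word converter.
public sealed class PlainTextPlaceholderConverter : IWordToHtmlConverter
{
    public int Calls { get; private set; }

    public string ConvertToHtml(Stream input)
    {
        Calls++;
        using var reader = new StreamReader(input);
        return "<p>" + System.Net.WebUtility.HtmlEncode(reader.ReadToEnd()) + "</p>";
    }
}

// Hypothetical stand-in for an Entity Framework entity.
public sealed class StoredDocument
{
    public int Id { get; set; }
    public string FileName { get; set; } = "";
    public byte[] OriginalBytes { get; set; } = Array.Empty<byte>();
    public string? Html { get; set; } // set at upload (workflow 2) or on first request (workflow 3)
}

public sealed class DocumentService
{
    private readonly IWordToHtmlConverter _converter;
    private readonly Dictionary<int, StoredDocument> _store = new(); // in-memory "database"
    private int _nextId = 1;

    public DocumentService(IWordToHtmlConverter converter) => _converter = converter;

    // Workflow 2: convert once at upload time and store the HTML next to the original.
    public int UploadAndConvert(string fileName, byte[] bytes)
    {
        var doc = new StoredDocument { Id = _nextId++, FileName = fileName, OriginalBytes = bytes };
        using (var ms = new MemoryStream(bytes))
            doc.Html = _converter.ConvertToHtml(ms);
        _store[doc.Id] = doc;
        return doc.Id;
    }

    // Workflow 3: store only the original document.
    public int UploadOnly(string fileName, byte[] bytes)
    {
        var doc = new StoredDocument { Id = _nextId++, FileName = fileName, OriginalBytes = bytes };
        _store[doc.Id] = doc;
        return doc.Id;
    }

    // Converts on first request (workflow 3) and caches the result; returns stored HTML otherwise.
    public string GetHtml(int id)
    {
        StoredDocument doc = _store[id];
        if (doc.Html == null)
        {
            using var ms = new MemoryStream(doc.OriginalBytes);
            doc.Html = _converter.ConvertToHtml(ms);
        }
        return doc.Html;
    }
}
```

Keeping the original bytes next to the HTML means you can re-run the conversion later if you switch converters; with workflow 3, the first request for each document pays the conversion cost.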

Unfortunately, you're in for a bit of tomfoolery no matter which workflow you choose. @DaveBish suggested using a third-party tool, which I completely agree is the best way to handle the conversion (if you don't require your users to save their docs as HTML). Also, be aware that images in Word documents can be problematic when you convert to HTML (they aren't preserved inside the generated HTML file, which means more /sarcasm/ fun for you on the web dev side).
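
One way around the missing-images problem, assuming you can get at each image's bytes (for example from the separate image files Word writes when saving as HTML, or from your converter), is to inline them as base64 `data:` URIs so the images travel inside the HTML string you store in the database. A minimal sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Net;

public static class InlineImages
{
    // Builds an <img> tag whose src is a data: URI, so the image bytes live inside the HTML
    // itself instead of in a separate file that has to be stored and served.
    public static string ToImgTag(byte[] imageBytes, string contentType, string altText)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        string alt = WebUtility.HtmlEncode(altText);
        return $"<img src=\"data:{contentType};base64,{base64}\" alt=\"{alt}\" />";
    }
}
```

Base64 makes each image roughly a third larger, so this trades database size for not having to manage image files separately.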

If your boss doesn't want to foot the bill for a third-party converter, you can attempt to handle the conversion on your own with the Office.Interop namespace [insert blah about how this is a terrible idea blah blah]... in which case, this answer will probably be of great use to you.

Answer by Gonzix

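If you are using DOCX, you can always use the Open XML SDK from Microsoft; it's pretty easy to use and clean. Here is a sample taken from MSDN (the HtmlConverter class it uses comes from PowerTools for Open XML, which is built on top of the SDK):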

// This example shows the simplest conversion. No images are converted.
// A cascading style sheet is not used.
// Requires the Open XML SDK (DocumentFormat.OpenXml) and PowerTools for Open XML,
// which provides HtmlConverter and the ToStringNewLineOnAttributes() extension method.
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

byte[] byteArray = File.ReadAllBytes("Test.docx");
using (MemoryStream memoryStream = new MemoryStream())
{
    // Copy into a resizable MemoryStream: the package is opened as editable (true),
    // and the conversion may modify it in memory.
    memoryStream.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
    {
        HtmlConverterSettings settings = new HtmlConverterSettings()
        {
            PageTitle = "My Page Title"
        };
        XElement html = HtmlConverter.ConvertToHtml(doc, settings);

        // Note: the XHTML returned by ConvertToHtml contains objects of type
        // XEntity. PtOpenXmlUtil.cs defines the XEntity class. See
        // http://blogs.msdn.com/ericwhite/archive/2010/01/21/writing-entity-references-using-linq-to-xml.aspx
        // for a detailed explanation.
        //
        // If you further transform the XML tree returned by ConvertToHtml, you
        // must do it correctly, or entities do not serialize properly.

        File.WriteAllText("Test.html", html.ToStringNewLineOnAttributes());
    }
}
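
For a sense of what the SDK and PowerTools handle for you: a .docx file is a ZIP package whose main body lives in `word/document.xml`, made of paragraph (`w:p`), run (`w:r`) and text (`w:t`) elements. The sketch below uses only System.IO.Compression and LINQ to XML (not the Open XML SDK) to emit one `<p>` per paragraph and wrap bold runs in `<strong>`. It is deliberately naive: styles, lists, table structure, images and runs inside hyperlinks are all dropped, which is exactly the formatting a real converter preserves. The class name is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

public static class NaiveDocxToHtml
{
    // WordprocessingML namespace used by word/document.xml.
    private static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ToHtml(Stream docx)
    {
        XDocument body;
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found; not a .docx package.");
            using Stream xml = entry.Open();
            body = XDocument.Load(xml);
        }

        var html = new StringBuilder();
        // Every paragraph, including those inside table cells (tables get flattened).
        foreach (XElement p in body.Descendants(W + "p"))
        {
            html.Append("<p>");
            // Only direct child runs; runs nested in w:hyperlink and similar are skipped.
            foreach (XElement r in p.Elements(W + "r"))
            {
                string text = string.Concat(r.Elements(W + "t").Select(t => t.Value));
                string encoded = WebUtility.HtmlEncode(text);

                // <w:b/> means bold unless its w:val attribute switches it off.
                XElement? b = r.Element(W + "rPr")?.Element(W + "b");
                string? val = (string?)b?.Attribute(W + "val");
                bool bold = b != null && val != "0" && val != "false" && val != "off";

                html.Append(bold ? "<strong>" + encoded + "</strong>" : encoded);
            }
            html.Append("</p>");
        }
        return html.ToString();
    }
}
```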

You might also want to take a look at Word Automation Services: http://blogs.office.com/b/microsoft-word/archive/2009/12/16/word-automation-services_3a00_-what-it-does.aspx

Answer by Ravi Gaurav Pandey

You can also look into Free Spire.Doc for more support.

Answer by hertzogth

I've used GemBox.Document; it can embed the images from the Word document within the HTML file itself.
For example:

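using System.IO;
using System.Text;
using GemBox.Document;

// GemBox.Document needs a license key set before use ("FREE-LIMITED-KEY" runs the free, size-limited mode).
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// Load DOCX file. "Input.docx" is a placeholder for your DOCX file's path or stream.
DocumentModel document;
using (Stream docxStream = File.OpenRead("Input.docx"))
    document = DocumentModel.Load(docxStream, new DocxLoadOptions());

HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
htmlOptions.EmbedImages = true;             // images are embedded inside the HTML itself
htmlOptions.HtmlType = HtmlType.HtmlInline; // content that can be placed inside an existing page

// Save HTML, here into memory (e.g. to store the markup in a database).
using (MemoryStream htmlStream = new MemoryStream())
{
    document.Save(htmlStream, htmlOptions);
    // Assumes the output encoding is UTF-8 (check HtmlSaveOptions if yours differs).
    string html = Encoding.UTF8.GetString(htmlStream.ToArray());
}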

Also, by using HtmlType.HtmlInline, I get HTML content that can be placed on an existing page (like in a viewer or WYSIWYG editor). Check out the rest of the HtmlSaveOptions properties.

You can find more examples of this approach in the "Convert between Word and HTML" and "Word Editor in ASP.NET MVC" examples.
