.NET iTextSharp: HTML to PDF?

Question

Asked by Kyle

I'd like to know if iTextSharp can convert HTML to PDF. Everything I will convert will just be plain text, but unfortunately there is very little documentation on iTextSharp, so I can't determine whether it would be a viable solution for me.

If it can't, can someone point me to some good, free .NET libraries that can convert a simple, plain-text HTML document to PDF?

Thanks in advance.

Answer 1

Accepted answer by Kyle

After doing some digging, I found a good way to accomplish what I need with iTextSharp.

Here is some sample code in case it helps anyone else in the future:

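using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

protected void Page_Load(object sender, EventArgs e)
{
    Document document = new Document();
    // Verbatim string (@"...") so the backslash is not read as an escape sequence
    using (FileStream output = new FileStream(@"c:\my.pdf", FileMode.Create))
    {
        PdfWriter.GetInstance(document, output);
        document.Open();
        try
        {
            // Download the HTML page to convert
            WebClient wc = new WebClient();
            string htmlText = wc.DownloadString("http://localhost:59500/my.html");
            Response.Write(htmlText); // also echo the HTML to the page

            // Parse the HTML into iTextSharp elements and add each one to the PDF
            List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
            foreach (IElement element in htmlarraylist)
            {
                document.Add(element);
            }
        }
        finally
        {
            // Close the document even if parsing fails, so the PDF is finalized
            document.Close();
        }
    }
}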

Answer 2

Answer by Jonathan

I ran into the same question a few weeks ago, and this is what I came up with. This method does a quick dump of HTML into a PDF; the resulting document will most likely need some formatting tweaks.

using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

private MemoryStream createPDF(string html)
{
    MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);

    // step 1: create a document object (A4 page, 30pt margins on every side)
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);

    // step 2:
    // we create a writer that listens to the document
    // and writes the PDF into msOutput
    PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

    // keep msOutput open after document.Close() so the caller can read it
    writer.CloseStream = false;

    // step 3: we create a worker that parses the HTML into the document
    HTMLWorker worker = new HTMLWorker(document);

    // step 4: we open the document and start the worker on it
    document.Open();
    worker.StartDocument();

    // step 5: parse the HTML into the document
    worker.Parse(reader);

    // step 6: close the document and the worker
    worker.EndDocument();
    worker.Close();
    document.Close();

    // rewind so the caller reads the PDF from the beginning
    msOutput.Position = 0;
    return msOutput;
}
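
A note on the MemoryStream returned here: in iTextSharp 5, PdfWriter closes the stream it writes to when document.Close() is called, unless writer.CloseStream is set to false (μBio's answer below does this). On a closed MemoryStream, setting Position or calling Read throws ObjectDisposedException, but ToArray() still returns the bytes. The standalone sketch below uses only System.IO (no iTextSharp); ClosedMemoryStreamDemo and its methods are hypothetical names, and ms.Close() stands in for document.Close():

```csharp
using System;
using System.IO;

// Hypothetical demo class: ms.Close() stands in for document.Close()
// closing the stream that PdfWriter writes to.
public static class ClosedMemoryStreamDemo
{
    // Writes some bytes, closes the stream, then copies the bytes out.
    public static byte[] BytesAfterClose(byte[] payload)
    {
        var ms = new MemoryStream();
        ms.Write(payload, 0, payload.Length);
        ms.Close();

        // ToArray() is documented to work even after the MemoryStream is closed.
        return ms.ToArray();
    }

    // Returns true if rewinding a closed MemoryStream throws, which is what
    // "msOutput.Position = 0" would do on a stream PdfWriter already closed.
    public static bool PositionThrowsAfterClose()
    {
        var ms = new MemoryStream(new byte[] { 1, 2, 3 });
        ms.Close();
        try
        {
            ms.Position = 0;
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

In practice this means that calling msOutput.ToArray() is a safe way to get the PDF bytes whether or not the writer closed the stream.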

Answer 3

Answer by μBio

Here's what I was able to get working on version 5.4.2 (from the NuGet install) to return a PDF response from an ASP.NET MVC controller. It could be modified to use a FileStream instead of a MemoryStream for the output if that's what you need.

I'm posting it here because it is a complete example of current iTextSharp usage for HTML -> PDF conversion (disregarding images; I haven't looked at those, since my use case doesn't require them).

It uses iTextSharp's XMLWorkerHelper, so the incoming HTML must be valid XHTML; you may need to do some fixup depending on your input.

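If the input is plain text (as in the question) rather than markup, characters such as < and & will make the XHTML invalid. A minimal sketch, assuming each non-blank line should become its own <p> paragraph: the hypothetical PlainTextToXhtml.Wrap helper escapes the text with System.Security.SecurityElement.Escape and wraps it in a bare XHTML skeleton (no DOCTYPE) that an XML parser accepts:

```csharp
using System.Linq;
using System.Security;

// Hypothetical helper: wraps plain text in a minimal, well-formed XHTML
// document so that an XML-based parser such as XMLWorker can read it.
public static class PlainTextToXhtml
{
    public static string Wrap(string plainText, string title = "Document")
    {
        // Normalize Windows line endings, then treat each non-blank line
        // as its own paragraph.
        var paragraphs = plainText.Replace("\r\n", "\n")
            .Split('\n')
            .Where(line => line.Trim().Length > 0)
            // SecurityElement.Escape turns & < > " ' into XML entities.
            .Select(line => "<p>" + SecurityElement.Escape(line) + "</p>");

        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
               "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
               "<head><title>" + SecurityElement.Escape(title) + "</title></head>" +
               "<body>" + string.Join("", paragraphs) + "</body></html>";
    }
}
```

The returned string can be converted to bytes with Encoding.UTF8.GetBytes and passed to XMLWorkerHelper.ParseXHtml, in place of the string concatenation that builds the XHTML skeleton in the controller below.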

using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using System.IO;
using System.Web.Mvc;

namespace Sample.Web.Controllers
{
    public class PdfConverterController : Controller
    {
        [ValidateInput(false)]
        [HttpPost]
        public ActionResult HtmlToPdf(string html)
        {           

            html = @"<?xml version=""1.0"" encoding=""UTF-8""?>
                 <!DOCTYPE html 
                     PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
                    ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
                 <html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"" lang=""en"">
                    <head>
                        <title>Minimal XHTML 1.0 Document with W3C DTD</title>
                    </head>
                  <body>
                    " + html + "</body></html>";

            var bytes = System.Text.Encoding.UTF8.GetBytes(html);

            using (var input = new MemoryStream(bytes))
            {
                var output = new MemoryStream(); // this MemoryStream is closed by FileStreamResult

                var document = new iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER, 50, 50, 50, 50);
                var writer = PdfWriter.GetInstance(document, output);
                writer.CloseStream = false;
                document.Open();

                var xmlWorker = XMLWorkerHelper.GetInstance();
                xmlWorker.ParseXHtml(writer, document, input, null);
                document.Close();
                output.Position = 0;

                return new FileStreamResult(output, "application/pdf");
            }
        }
    }
}

Answer 4

Answer by Carl Steffen

I would have upvoted mightymada's answer if I had the reputation. I just implemented an ASP.NET HTML-to-PDF solution using Pechkin, and the results are wonderful.

There is a NuGet package for Pechkin, but as mightymada mentions in her blog (http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/; I hope she doesn't mind me reposting it), it has a memory leak that has been fixed in this branch:

https://github.com/tuespetre/Pechkin

The blog post has specific instructions for including this package (it is a 32-bit DLL and requires .NET 4). Here is my code. The incoming HTML is actually assembled with the HTML Agility Pack (I'm automating invoice generation):

public static byte[] PechkinPdf(string html)
{
  //Transform the HTML into PDF
  var pechkin = Factory.Create(new GlobalConfig());
  var pdf = pechkin.Convert(new ObjectConfig()
                          .SetLoadImages(true).SetZoomFactor(1.5)
                          .SetPrintBackground(true)
                          .SetScreenMediaType(true)
                          .SetCreateExternalLinks(true), html);

  //Return the PDF file
  return pdf;
}

Again, thank you, mightymada; your answer is fantastic.

Answer 5

Answer by mightymada

I prefer using another library called Pechkin because it can convert non-trivial HTML (including HTML that uses CSS classes). This is possible because the library uses the WebKit layout engine, which is used by Safari and is the basis of Chrome's Blink engine.

I described my experience with Pechkin in detail on my blog: http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/

Answer 6

Answer by Nitin Singh

iTextSharp can convert an HTML file into a PDF.

The required namespaces for the conversion are:

using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

And to convert the HTML and send the file as a download:

// Create a byte array that will eventually hold our final PDF
Byte[] bytes;

// Boilerplate iTextSharp setup here

// Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream())
{
    // Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document())
    {
        // Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            // Open the document for writing
            doc.Open();

            string finalHtml = string.Empty;

            // Read your html by database or file here and store it into finalHtml e.g. a string
            // XMLWorker also reads from a TextReader and not directly from a string
            using (var srHtml = new StringReader(finalHtml))
            {
                // Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            doc.Close();
        }
    }

    // After all of the PDF "stuff" above is done and closed but **before** we
    // close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

// Clear the response
Response.Clear();
MemoryStream mstream = new MemoryStream(bytes);

// Define response content type
Response.ContentType = "application/pdf";

// Give the name of file of pdf and add in to header
Response.AddHeader("content-disposition", "attachment;filename=invoice.pdf");
Response.Buffer = true;
mstream.WriteTo(Response.OutputStream);
Response.End();

Answer 7

Answer by Soan

The code above will certainly help in converting HTML to PDF, but it will fail if the HTML contains IMG tags with relative paths: the iTextSharp library does not automatically convert relative paths to absolute ones.

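One way to deal with relative image paths before parsing is to rewrite each relative src to an absolute URL. This is a rough sketch, not the code from the link below: it assumes the page's base URL is known and that a regex is good enough for your markup (it is not a full HTML parser and assumes no '>' inside attribute values). ImageSrcResolver and MakeImageSourcesAbsolute are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites relative <img src="..."> values to absolute
// URLs before the HTML is handed to iTextSharp. Regex-based rough sketch.
public static class ImageSrcResolver
{
    // Group 1 = "<img ... src=", group 2 = the quote character, group 3 = the URL.
    // The lookbehind skips attributes that merely end in "src", such as data-src.
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?(?<![\w-])src\s*=\s*)([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string MakeImageSourcesAbsolute(string html, Uri baseUri)
    {
        return ImgSrc.Replace(html, m =>
        {
            // new Uri(baseUri, value) resolves a relative value against baseUri;
            // an already-absolute value is used as-is.
            string absolute = new Uri(baseUri, m.Groups[3].Value).AbsoluteUri;
            string quote = m.Groups[2].Value;
            return m.Groups[1].Value + quote + absolute + quote;
        });
    }
}
```

For example, MakeImageSourcesAbsolute(html, new Uri("http://localhost:59500/")) could be applied to the downloaded HTML before it is parsed.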

I tried the code above and added code to handle IMG tags as well.

You can find the code here for reference: http://www.am22tech.com/html-to-pdf/

Answer 8

Answer by meJustAndrew

If you are converting HTML to PDF on the server side, you can use Rotativa:

Install-Package Rotativa

It is based on wkhtmltopdf, has better CSS support than iTextSharp, and is very simple to integrate with MVC (the most common setup), since you can simply return the view as a PDF:

using System.Web.Mvc;
using Rotativa;

public ActionResult GetPdf()
{
    //...
    return new ViewAsPdf(model); // and you are done!
}
