How to strip HTML tags in C#

Question

Asked by Mr.CSharp

Possible Duplicate:
How to clean HTML tags using C#

What is the best way to strip HTML tags in C#?

Answer 1

Answered by George Stocker

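To guarantee that no HTML tags get through, use HttpServerUtility.HtmlEncode(string). Note that this encodes the tags (for example, < becomes &lt;) so they are displayed as literal text; it does not remove them.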
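
HttpServerUtility.HtmlEncode is an instance method (reached through Server in ASP.NET pages and controllers), so code that runs outside a web request can use the static System.Net.WebUtility.HtmlEncode instead. A minimal sketch, assuming that encoding rather than removal is what you want; the class and method names are illustrative:

```csharp
using System.Net;

public static class EncodeDemo
{
    // Escapes <, >, & and quotes so that any markup in the input
    // is shown as literal text instead of being interpreted as HTML.
    public static string EncodeForHtml(string input) =>
        WebUtility.HtmlEncode(input);
}
```

For example, EncodeForHtml("<b>hi</b>") returns "&lt;b&gt;hi&lt;/b&gt;", which a browser renders as the visible text <b>hi</b>.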

If you want some tags to get through, you can use this "whitelist" approach.

Update: Some vulnerabilities have been found in that code, as a developer from Fog Creek tells us.

(Second link includes code).
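
Given the vulnerability note above, any whitelist code deserves scrutiny. The following is a minimal sketch, not the code from the linked answer and not security-audited. It finds tags with a regular expression, re-emits only tags from a hypothetical AllowedTags set with all attributes stripped, and HTML-encodes everything else. Its safety rests on that encoding step rather than on the regex understanding HTML: leftover angle brackets cannot recombine into markup, so "<<script>script>" becomes "&lt;script&gt;" rather than "<script>".

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlWhitelist
{
    // Hypothetical allow-list; adjust to the tags your application actually needs.
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b", "i", "em", "strong", "p", "br" };

    // Matches an opening or closing tag: "<", optional "/", a tag name, then anything up to ">".
    private static readonly Regex TagPattern =
        new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>", RegexOptions.Compiled);

    public static string Sanitize(string html)
    {
        var output = new StringBuilder();
        int last = 0;

        foreach (Match m in TagPattern.Matches(html))
        {
            // Text between tags is encoded, so a stray "<" or ">" can never form new markup.
            output.Append(WebUtility.HtmlEncode(html.Substring(last, m.Index - last)));

            string name = m.Groups[2].Value;
            if (AllowedTags.Contains(name))
            {
                // Allowed tags are re-emitted in normalized form with every attribute dropped,
                // so onclick=, style=, href= and similar never survive.
                output.Append('<').Append(m.Groups[1].Value).Append(name.ToLowerInvariant()).Append('>');
            }
            // Tags not on the list are dropped; the text around them is kept (encoded).

            last = m.Index + m.Length;
        }

        output.Append(WebUtility.HtmlEncode(html.Substring(last)));
        return output.ToString();
    }
}
```

For example, Sanitize("<b onclick=\"x()\">Hi</b><script>alert(1)</script>") returns "<b>Hi</b>alert(1)". The contents of dropped elements such as <script> stay behind as encoded plain text; removing them entirely calls for a real parser.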

Answer 2

Answered by Ivan G.

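  using System.Text.RegularExpressions;

  public static class HtmlUtil // wrapper class name is illustrative
  {
      // Removes every "<...>" run. "<[^>]*>" matches the same text as the
      // original pattern <(.|\n)*?> but avoids the per-character alternation.
      // Entities such as &amp; are left untouched.
      public static string StripHTML(string htmlString)
      {
          const string pattern = "<[^>]*>";
          return Regex.Replace(htmlString, pattern, string.Empty);
      }
  }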
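
The function above leaves entities such as &amp; in place. If the goal is readable plain text, one option is to decode them after stripping with System.Net.WebUtility.HtmlDecode. A minimal sketch; the class and method names are illustrative:

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Removes every "<...>" run, then decodes entities such as &amp; and &lt;
    // so the result is plain text.
    public static string StripAndDecode(string html)
    {
        string withoutTags = Regex.Replace(html, "<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

For example, StripAndDecode("<p>1 &lt; 2</p>") returns "1 < 2". Because decoding can reintroduce < characters, treat the result strictly as plain text and encode it again before writing it into an HTML page.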

Answer 3

Answered by Lachlan Roche

Take your HTML string or document and parse it with HTML Agility Pack. This gives you an HtmlDocument object that is very similar to an XmlDocument.

You can then use XPath methods such as SelectNodes (called on the document's DocumentNode) to access the portions of the document that you are interested in.
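
A minimal sketch of this approach, assuming the HtmlAgilityPack NuGet package is referenced. It parses with HtmlDocument.LoadHtml, uses an XPath query with SelectNodes to find <script> and <style> elements (whose text is not visible content) and removes them, then returns the remaining text with entities decoded by HtmlEntity.DeEntitize. The class and method names are illustrative.

```csharp
using HtmlAgilityPack;

public static class HtmlAgilityText
{
    // Parses the markup, drops <script> and <style> elements, and returns
    // the remaining text content with entities decoded.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes takes an XPath expression and returns null when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style");
        if (unwanted != null)
        {
            foreach (var node in unwanted)
            {
                node.ParentNode.RemoveChild(node);
            }
        }

        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

For example, ToPlainText("<p>Hello <b>world</b></p><script>var x = 1;</script>") returns "Hello world". Because the parser knows where elements begin and end, the script body is removed along with its tags instead of being left behind as text.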

If you choose to use another approach, be aware that parsing HTML (which is not a regular language) with regular expressions is widely regarded as a bad idea.
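
To make the warning concrete, here is a small sketch using the common tag-stripping pattern <[^>]*> (equivalent in effect to the pattern in Answer 2), applied to two inputs that a regular expression misreads. The class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // A typical tag-stripping pattern: "<", anything except ">", then ">".
    private const string TagPattern = "<[^>]*>";

    public static string Strip(string input) =>
        Regex.Replace(input, TagPattern, string.Empty);
}
```

Strip("1 < 2 and 3 > 2") returns "1  2", because everything from the less-than sign to the greater-than sign is treated as a tag. Strip("<img alt=\"a>b\" src=\"x.png\">") returns "b\" src=\"x.png\">", because the pattern stops at the > inside the quoted attribute value and leaks the rest of the tag into the output.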

And regardless of the approach, if you are keeping some markup, use a whitelist approach: remove everything that is not explicitly allowed.