C#: How to remove all HTML tags from a string without knowing which tags it contains
Original source: http://stackoverflow.com/questions/18153998/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I remove all HTML tags from a string without knowing which tags are in it?
Asked by RJ.
Is there any easy way to remove all HTML tags or ANYTHING HTML related from a string?
For example:
string title = "<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )";
The above should really be:
"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)"
Accepted answer by Bidou
You can use a simple regex like this:
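// Requires: using System.Text.RegularExpressions;
public static string StripHTML(string input)
{
    // Lazily match everything from a '<' to the next '>' and remove it.
    return Regex.Replace(input, "<.*?>", String.Empty);
}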
Be aware that this solution has its own flaws. See "Remove HTML tags in String" for more information (especially the comments by @mehaase).
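To make the caveat concrete, here is a minimal sketch of two inputs on which this pattern misbehaves. These cases are illustrations chosen for this note and are not necessarily the ones raised in the linked comments; the class name RegexStripDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexStripDemo
{
    // Same pattern as in the answer above.
    public static string StripHTML(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }

    public static void Main()
    {
        // Well-formed, single-line markup: the tags are removed as intended.
        Console.WriteLine(StripHTML("<b>Hulk</b> Hogan"));   // Hulk Hogan

        // Plain text containing '<' and '>' is mistaken for a tag,
        // so "< b and c >" is deleted along with the comparison it expressed.
        Console.WriteLine(StripHTML("if a < b and c > d"));  // if a  d

        // '.' does not match '\n' by default, so the opening tag that is
        // split across two lines is left in place; only "</a>" is removed.
        Console.WriteLine(StripHTML("<a\nhref=\"x\">link</a>"));
    }
}
```

Passing RegexOptions.Singleline would let the pattern cross line breaks, but it would not help with the second case, which is one reason a real HTML parser is often suggested instead.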
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: "HTML agility pack - removing unwanted tags without removing content?"
Answer by ssilas777
You can parse the string using the HTML Agility Pack and read the InnerText.
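// Requires the HtmlAgilityPack NuGet package and: using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
// A regular (non-verbatim) string literal is used so that \" is a valid escape.
htmlDoc.LoadHtml("<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )");
string result = htmlDoc.DocumentNode.InnerText;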
Answer by Vinay
You can use the code below on your string to get the complete string without the HTML parts.
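// Remove "&nbsp;" entities first (assumed intent; replacing " " would delete every space), then strip the tags.
string title = "<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )".Replace("&nbsp;", string.Empty);
string s = Regex.Replace(title, "<.*?>", String.Empty);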
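Hard-coding a single entity only covers that one case. As a rough sketch of a more general variant, and not something any of the answers above propose, the tags can be stripped first and the remaining entities decoded with System.Net.WebUtility.HtmlDecode, followed by collapsing runs of whitespace. The class and method names (HtmlTextCleaner, ToPlainText) are hypothetical, and the tag regex carries the same limitations noted in the accepted answer.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextCleaner
{
    public static string ToPlainText(string html)
    {
        // 1. Remove tags first, so that escaped markup such as "&lt;b&gt;"
        //    is not decoded into a tag and then stripped by mistake.
        string noTags = Regex.Replace(html, "<.*?>", String.Empty);

        // 2. Decode entities such as &amp; and &nbsp; into their characters.
        string decoded = WebUtility.HtmlDecode(noTags);

        // 3. Collapse whitespace runs (\s also matches the non-breaking
        //    space produced by &nbsp;) and trim both ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(ToPlainText("<p>Tom &amp; Jerry&nbsp;&nbsp;rock</p>")); // Tom & Jerry rock
    }
}
```

The order of the steps is a design choice: decoding before stripping would turn escaped text like "&lt;b&gt;" into something the tag regex then deletes.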