VB.NET: stripping HTML tags out of a string
Disclaimer: this page reproduces a popular StackOverflow question. It is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17665582/
Stripping out html tags in string
Asked by y--
I have a program I'm writing that is supposed to strip HTML tags out of a string. I've been trying to replace all substrings that start with "<" and end with ">". This (obviously, because I'm here asking) has not worked so far. Here's what I've tried:
StrippedContent = Regex.Replace(StrippedContent, "\<.*\>", "")
That just returns what seems like a random part of the original string. I've also tried:
For Each StringMatch As Match In Regex.Matches(StrippedContent, "\<.*\>")
    StrippedContent = StrippedContent.Replace(StringMatch.Value, "")
Next
Which did the same thing (returns what seems like a random part of the original string). Is there a better way to do this? By better I mean a way that works.
Answered by Ro Yo Mi
Description
This expression will:
- find and replace all tags with nothing
- avoid problematic edge cases, such as a ">" character inside a quoted attribute value
Regex: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
Replace with: nothing


Example
Sample Text
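Note the difficult edge case in the onmouseover attribute: its quoted value contains both a ">" character (in "6/a>3") and a decoy href="NotYourHref", neither of which should end or confuse the tag match.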
these are <a onmouseover=' href="NotYourHref" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for.
Code
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your source string"
        Dim replacementstring As String = ""
        Dim matchpattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
        Console.WriteLine(Regex.Replace(sourcestring, matchpattern, replacementstring, RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline Or RegexOptions.Singleline))
    End Sub
End Module
String after replacement
these are the droids you are looking for.
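The code above uses a placeholder source string, so here is a minimal sketch that feeds the sample text itself through the pattern. The module name EdgeCaseDemo and the helper StripTagsAttributeAware are hypothetical names made up for this illustration, and the RegexOptions flags are left out on the assumption that they do not change how this particular pattern matches.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EdgeCaseDemo
    ' The attribute-aware pattern from the answer above.
    Const AttributeAwarePattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"

    ' Hypothetical helper name, introduced only for this sketch.
    Function StripTagsAttributeAware(ByVal source As String) As String
        Return Regex.Replace(source, AttributeAwarePattern, "")
    End Function

    Sub Main()
        ' The sample text; embedded double quotes are doubled for VB.NET.
        Dim sample As String = "these are <a onmouseover=' href=""NotYourHref"" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for."
        Console.WriteLine(StripTagsAttributeAware(sample))
        ' prints: these are the droids you are looking for.
    End Sub
End Module
```

The quoted onmouseover value is consumed as a single unit by the ='[^']*' alternative, which is why the ">" inside it does not end the match early.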
Answered by y--
Well, this proves that you should always search Google for an answer. Here's a method I got from http://www.dotnetperls.com/remove-html-tags-vbnet
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim html As String = "<p>There was a <b>.NET</b> programmer " &
            "and he stripped the <i>HTML</i> tags.</p>"
        Dim tagless As String = StripTags(html)
        Console.WriteLine(tagless)
    End Sub
    Function StripTags(ByVal html As String) As String
        Return Regex.Replace(html, "<.*?>", "")
    End Function
End Module
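The only real difference between this pattern and the one in the question is the "?" after ".*", which makes the quantifier lazy instead of greedy. Here is a minimal sketch contrasting the two. The module and function names are hypothetical, and the greedy pattern is written as "<.*>" on the assumption that "\<" and "\>" in the question's pattern are just escaped literals in .NET.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVsLazyDemo
    ' Greedy: ".*" expands as far as it can, so a single match runs from the
    ' first "<" to the last ">" on the line and swallows the text in between.
    Function StripGreedy(ByVal html As String) As String
        Return Regex.Replace(html, "<.*>", "")
    End Function

    ' Lazy: ".*?" stops at the first ">" it reaches, so each tag is matched on its own.
    Function StripLazy(ByVal html As String) As String
        Return Regex.Replace(html, "<.*?>", "")
    End Function

    Sub Main()
        Dim html As String = "a <b>bold</b> word"
        Console.WriteLine(StripGreedy(html)) ' prints "a  word" (the word "bold" is lost)
        Console.WriteLine(StripLazy(html))   ' prints "a bold word"
    End Sub
End Module
```

This would explain the "random part of the original string" in the question: everything between the first "<" and the last ">" on a line was removed in one go. Note that the lazy pattern still stops at a ">" inside a quoted attribute value, which is the edge case the longer pattern from Ro Yo Mi is meant to handle.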
Answered by mikro
Here's a simple function using the regex pattern that Ro Yo Mi posted.
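' Requires Imports System.Runtime.CompilerServices and System.Text.RegularExpressions;
' extension methods must be declared inside a Module.
<Extension()> Public Function RemoveHtmlTags(value As String) As String
    Return Regex.Replace(value, "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>", "")
End Function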
Demonstration:
Dim html As String = "This <i>is</i> just a <b>demo</b>.".RemoveHtmlTags()
Console.WriteLine(html)
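For anyone who wants to paste this into a fresh console project, here is a sketch of the function and the demonstration put together as one complete program. The module names HtmlStringExtensions and Program are hypothetical, chosen only for this example.

```vbnet
Imports System
Imports System.Runtime.CompilerServices
Imports System.Text.RegularExpressions

' Extension methods in VB.NET have to live in a Module.
Module HtmlStringExtensions
    <Extension()>
    Public Function RemoveHtmlTags(ByVal value As String) As String
        Return Regex.Replace(value, "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>", "")
    End Function
End Module

Module Program
    Sub Main()
        ' The extension method can be called directly on a string literal.
        Dim stripped As String = "This <i>is</i> just a <b>demo</b>.".RemoveHtmlTags()
        Console.WriteLine(stripped) ' prints "This is just a demo."
    End Sub
End Module
```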

