vb.net: remove HTML tags from a string

Question

Asked by y--

I have a program I'm writing that is supposed to strip HTML tags out of a string. I've been trying to replace every substring that starts with "<" and ends with ">". This (obviously, since I'm here asking) has not worked so far. Here's what I've tried:

StrippedContent = Regex.Replace(StrippedContent, "\<.*\>", "")

That just returns what seems like a random part of the original string. I've also tried:

For Each StringMatch As Match In Regex.Matches(StrippedContent, "\<.*\>")
    StrippedContent = StrippedContent.Replace(StringMatch.Value, "")
Next

That did the same thing (it returns what seems like a random part of the original string). Is there a better way to do this? By better I mean a way that works.

Answer 1

Answered by Ro Yo Mi

Description

This expression will:

find all tags and replace them with nothing
avoid problematic edge cases, such as a ">" inside a quoted attribute value

Regex: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>

Replace with: an empty string

Example

Sample Text

Note the difficult edge case in the onmouseover attribute: its quoted value contains a ">" character, as well as text that looks like an href attribute.

these are <a onmouseover=' href="NotYourHref" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for.

Code

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourceString As String = "replace with your source string"
    Dim replacementString As String = ""
    ' Double quotes inside the pattern are doubled to escape them in a VB.NET string literal.
    Dim matchPattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
    Console.WriteLine(Regex.Replace(sourceString, matchPattern, replacementString, RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline Or RegexOptions.Singleline))
  End Sub
End Module

String after replacement

these are the droids you are looking for.
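
To check this result, the sample text can be run through the pattern directly. The following is a minimal sketch; the module name EdgeCaseDemo and the variable names are placeholders chosen for illustration. For comparison it also runs a simple lazy pattern, <.*?>, over the same input. That pattern ends its first match at the first ">" it finds, which here is the one inside the onmouseover value.

```vbnet
Imports System.Text.RegularExpressions

Module EdgeCaseDemo
    Sub Main()
        ' Sample text from above; inner double quotes are doubled for VB.NET.
        Dim sample As String = "these are <a onmouseover=' href=""NotYourHref"" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for."

        ' Attribute-aware pattern from this answer.
        Dim attributeAware As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
        ' Simple lazy pattern, included for comparison only.
        Dim simpleLazy As String = "<.*?>"

        ' Prints: these are the droids you are looking for.
        Console.WriteLine(Regex.Replace(sample, attributeAware, ""))

        ' The lazy pattern ends its first match at the ">" in "6/a>3",
        ' so part of the attribute text is left in the output.
        Console.WriteLine(Regex.Replace(sample, simpleLazy, ""))
    End Sub
End Module
```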

Answer 2

Answered by y--

Well, this proves that you should always search Google for an answer first. Here's a method I got from http://www.dotnetperls.com/remove-html-tags-vbnet:

Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim html As String = "<p>There was a <b>.NET</b> programmer " +
          "and he stripped the <i>HTML</i> tags.</p>"
        Dim tagless As String = StripTags(html)
        Console.WriteLine(tagless)
    End Sub
    Function StripTags(ByVal html As String) As String
        Return Regex.Replace(html, "<.*?>", "")
    End Function
End Module
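
The only difference from the pattern in the question is the "?" after ".*", which makes the quantifier lazy. A greedy <.*> matches from the first "<" to the last ">" it can reach, so it swallows the text between tags as well; the lazy <.*?> stops at the nearest ">". Here is a small sketch comparing the two on the same input (the module name GreedyVsLazy and the brackets around the output are just for illustration):

```vbnet
Imports System.Text.RegularExpressions

Module GreedyVsLazy
    Sub Main()
        Dim html As String = "<p>There was a <b>.NET</b> programmer.</p>"

        ' Greedy: a single match spans the first "<" through the last ">",
        ' so the whole string is removed. Prints: []
        Console.WriteLine("[" & Regex.Replace(html, "<.*>", "") & "]")

        ' Lazy: each tag is matched separately.
        ' Prints: [There was a .NET programmer.]
        Console.WriteLine("[" & Regex.Replace(html, "<.*?>", "") & "]")
    End Sub
End Module
```

Note that "." does not match a newline by default, so a tag that spans several lines is only matched if RegexOptions.Singleline is passed.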

Answer 3

Answered by mikro

Here's a simple extension method using the regex pattern that Ro Yo Mi posted. It has to be declared inside a Module, with System.Runtime.CompilerServices and System.Text.RegularExpressions imported.

<Extension()> Public Function RemoveHtmlTags(value As String) As String
    Return Regex.Replace(value, "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>", "")
End Function

Demonstration:

Dim html As String = "This <i>is</i> just a <b>demo</b>.".RemoveHtmlTags()
Console.WriteLine(html)
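
Put together, a complete version of the function and the demonstration might look like the sketch below. The module names StringExtensions and Program are placeholders, not part of the original answer.

```vbnet
Imports System.Runtime.CompilerServices
Imports System.Text.RegularExpressions

' Extension methods in VB.NET must be declared inside a Module.
Module StringExtensions
    <Extension()>
    Public Function RemoveHtmlTags(value As String) As String
        Return Regex.Replace(value, "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>", "")
    End Function
End Module

Module Program
    Sub Main()
        Dim html As String = "This <i>is</i> just a <b>demo</b>.".RemoveHtmlTags()
        Console.WriteLine(html) ' Prints: This is just a demo.
    End Sub
End Module
```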
