What is the best way to search HTML in a C# string for specific text and mark that text?

Question

Asked by Yttrium

What would be the best way to search through HTML inside a C# string variable to find a specific word/phrase and mark (or wrap) that word/phrase with a highlight?


Thanks,


Jeff


Answer 1

Answered by Eddie Parker

Regular expressions would be my way. ;)


Answer 2

Answered by Greg Leaver

For searching, you'll want to look into regular expressions. As for marking, once you have the position of the substring, it should be simple enough to use it to insert markup that wraps the phrase.
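A minimal sketch of that position-based approach, assuming the input is treated as a plain string: `IndexOf` finds each occurrence, and the markup is inserted around it. The class name `PositionHighlighter`, the case-insensitive comparison, and the default `<mark>` wrapper are illustrative choices, not part of the answer. Applied to raw HTML, it will also match inside tags and attribute values.

```csharp
using System;
using System.Text;

public static class PositionHighlighter
{
    // Hypothetical helper: wraps every occurrence of `phrase` in `text` with the
    // given open/close markup, locating matches with IndexOf instead of a regex.
    public static string Wrap(string text, string phrase,
                              string open = "<mark>", string close = "</mark>")
    {
        if (string.IsNullOrEmpty(phrase)) return text;

        var sb = new StringBuilder(text.Length);
        int start = 0;
        int pos;
        // Case-insensitive search; each hit's position says where the markup goes.
        while ((pos = text.IndexOf(phrase, start, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            sb.Append(text, start, pos - start);  // text before the match
            sb.Append(open);
            sb.Append(text, pos, phrase.Length);  // the match, original casing kept
            sb.Append(close);
            start = pos + phrase.Length;          // resume after the match
        }
        sb.Append(text, start, text.Length - start); // remainder after the last match
        return sb.ToString();
    }
}
```

For example, `PositionHighlighter.Wrap("Go go GO", "go")` returns `<mark>Go</mark> <mark>go</mark> <mark>GO</mark>`.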


Answer 3

Answered by MrTelly

If the HTML you're using is XHTML-compliant, you could load it as an XML document and then use XPath/XSL: long-winded, but kind of elegant?


An approach I used in the past is to use HTMLTidy to convert messy HTML to XHTML, and then use XSL/XPath for screen scraping content into a database, to create a reverse content management system.
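A rough sketch of the XML route, assuming the markup is already well-formed XHTML (for example, after running it through HTMLTidy). It uses LINQ to XML with the `XPathEvaluate` extension to select text nodes, rather than an XSL transform, and wraps matches in a `<mark>` element. The class name and the wrapper element are hypothetical. Because only text nodes are visited, tag names and attribute values cannot produce false positives.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XhtmlHighlighter
{
    // Wraps case-insensitive matches of `phrase` in <mark> elements, visiting only
    // text nodes so tag names and attribute values are never altered.
    // Throws System.Xml.XmlException if the input is not well-formed XML.
    public static string Highlight(string xhtml, string phrase)
    {
        XDocument doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);
        var pattern = new Regex(Regex.Escape(phrase), RegexOptions.IgnoreCase);

        // XPath selects every text node; ToList() snapshots them because the
        // tree is modified inside the loop.
        List<XText> textNodes = ((IEnumerable)doc.XPathEvaluate("//text()"))
            .Cast<XText>()
            .ToList();

        foreach (XText node in textNodes)
        {
            string value = node.Value;
            if (!pattern.IsMatch(value)) continue;

            // Reuse the parent's namespace so <mark> stays in the XHTML namespace.
            XNamespace ns = node.Parent?.Name.Namespace ?? XNamespace.None;
            var parts = new List<XNode>();
            int last = 0;
            foreach (Match m in pattern.Matches(value))
            {
                if (m.Index > last) parts.Add(new XText(value.Substring(last, m.Index - last)));
                parts.Add(new XElement(ns + "mark", m.Value));
                last = m.Index + m.Length;
            }
            if (last < value.Length) parts.Add(new XText(value.Substring(last)));

            node.ReplaceWith(parts); // text node -> plain text + <mark> pieces
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, `XhtmlHighlighter.Highlight("<p title=\"go\">Go <b>go</b>!</p>", "go")` returns `<p title="go"><mark>Go</mark> <b><mark>go</mark></b>!</p>`. The `title` attribute is left untouched.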


Regular expressions would do it, but they can get complicated once you try to strip out tags, image names, etc. to eliminate false positives.
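One common middle ground is a single regex with two alternatives. The first alternative consumes an entire tag, and the second matches the phrase, so occurrences inside tags or attributes (image names included) are copied through untouched. This is a hedged sketch: it assumes no `>` appears inside attribute values, and it does not skip `<script>`/`<style>` content. The class name and the `<mark>` wrapper are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TagAwareHighlighter
{
    // Group 1 matches a whole tag (attributes included); group 2 matches the phrase.
    // The engine scans left to right and consumes each tag as one unit, so a phrase
    // that only appears inside a tag is never reported as group 2.
    public static string Highlight(string html, string phrase)
    {
        var regex = new Regex(
            "(<[^>]*>)|(" + Regex.Escape(phrase) + ")",
            RegexOptions.IgnoreCase);

        return regex.Replace(html, m =>
            m.Groups[1].Success
                ? m.Value                           // a tag: copy it through unchanged
                : "<mark>" + m.Value + "</mark>");  // text match: wrap it
    }
}
```

For example, `TagAwareHighlighter.Highlight("<img alt=\"go\">Go here", "go")` returns `<img alt="go"><mark>Go</mark> here`.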


Answer 4

Answered by Gorkem Pacaci

In simple cases, regular expressions will do.


using System.Text.RegularExpressions;
string input = "ttttttgottttttt";
string output = Regex.Replace(input, "go", "<strong>$0</strong>"); // $0 is the whole match

will yield: "tttttt<strong>go</strong>ttttttt"
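Here is a slightly more general form of the same call, assuming the phrase comes from user input. `Regex.Escape` stops characters like `.` or `(` from being treated as regex syntax, and `\b` limits matches to whole words. The helper name is hypothetical, and `\b` only behaves as intended when the phrase starts and ends with a word character.

```csharp
using System.Text.RegularExpressions;

public static class TextHighlighter
{
    // Wraps whole-word, case-insensitive matches of a user-supplied phrase in <strong>.
    public static string Highlight(string input, string phrase)
    {
        // Escape the phrase so it is matched literally, then anchor it at word boundaries.
        string pattern = @"\b" + Regex.Escape(phrase) + @"\b";
        return Regex.Replace(input, pattern, "<strong>$0</strong>", RegexOptions.IgnoreCase);
    }
}
```

For example, `TextHighlighter.Highlight("Go going go.", "go")` returns `<strong>Go</strong> going <strong>go</strong>.`. The "go" inside "going" is not wrapped.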


But when you say HTML, if you're referring to final text rendered, that's a bit of a mess. Say you've got this HTML:


Book


To highlight the word 'Book', you would need the help of a proper HTML renderer. To simplify, one can first remove all tags and leave only contents, and then do the usual replace, but it doesn't feel right.
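The following is a sketch of that simplification, for illustration only. It strips every tag, decodes entities, and then wraps matches while re-encoding the plain text. A word split across markup (for instance, the hypothetical input `<b>B</b>ook`) becomes findable, but all original formatting is discarded. The class name and the `<mark>` wrapper are assumptions, and the tag-stripping regex is naive.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class StripThenHighlight
{
    // Removes all tags, highlights matches in the remaining text, and returns
    // HTML-encoded plain text with <mark> around each match. Original markup is lost.
    public static string Highlight(string html, string phrase)
    {
        // Naive tag removal, then decode entities such as &amp; back to characters.
        string text = WebUtility.HtmlDecode(Regex.Replace(html, "<[^>]*>", string.Empty));
        var regex = new Regex(Regex.Escape(phrase), RegexOptions.IgnoreCase);

        var sb = new StringBuilder();
        int last = 0;
        foreach (Match m in regex.Matches(text))
        {
            // Re-encode the plain text so characters like '<' or '&' stay safe in HTML.
            sb.Append(WebUtility.HtmlEncode(text.Substring(last, m.Index - last)));
            sb.Append("<mark>").Append(WebUtility.HtmlEncode(m.Value)).Append("</mark>");
            last = m.Index + m.Length;
        }
        sb.Append(WebUtility.HtmlEncode(text.Substring(last)));
        return sb.ToString();
    }
}
```

For example, `StripThenHighlight.Highlight("<p><b>B</b>ook</p>", "book")` returns `<mark>Book</mark>`.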


Answer 5

Answered by Matthew Dresser

You could look at using Html DOM, an open source project on SourceForge.net. This way you could manipulate your text programmatically instead of relying on regular expressions.


Answer 6

Answered by Zen

I like using Html Agility Pack; it's very easy to use, and although there haven't been many updates lately, it is still usable. For example, grabbing all the links:


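using System;
using HtmlAgilityPack;

HtmlWeb client = new HtmlWeb();
HtmlDocument doc = client.Load("http://yoururl.com");
// SelectNodes returns null, not an empty collection, when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");

if (nodes != null)
{
    foreach (HtmlNode link in nodes)
    {
        Console.WriteLine(link.Attributes["href"].Value);
    }
}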
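The same library can also handle the highlighting asked about in the question. This hedged sketch selects only text nodes (skipping `<script>` and `<style>`) and rewrites each node's `Text` with the matches wrapped. It assumes the HtmlAgilityPack NuGet package is referenced. The class name and the `<mark>` wrapper are hypothetical. Because `HtmlTextNode.Text` holds raw, still-encoded HTML, a phrase that also occurs inside an entity name (for example, `amp` in `&amp;`) could be matched.

```csharp
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class AgilityHighlighter
{
    // Highlights `phrase` only inside text nodes, leaving tags, attributes and
    // <script>/<style> contents alone.
    public static string Highlight(string html, string phrase)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection textNodes = doc.DocumentNode.SelectNodes(
            "//text()[not(ancestor::script) and not(ancestor::style)]");
        if (textNodes == null) return html;

        var regex = new Regex(Regex.Escape(phrase), RegexOptions.IgnoreCase);
        foreach (HtmlNode node in textNodes)
        {
            // HtmlTextNode.Text is the node's raw HTML; writing markup into it makes
            // that markup appear when the document is serialized.
            var textNode = (HtmlTextNode)node;
            textNode.Text = regex.Replace(textNode.Text, "<mark>$0</mark>");
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```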
