C#: Get the value of an HTML element

Question

Asked by disasterkid

I have the HTML code of a webpage in a text file. I'd like my program to return the text inside a tag. For example, I want to get "Julius" out of

<span class="hidden first">Julius</span>

Do I need a regular expression for this? If not, which string function can do it?

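The "string function" route the question asks about can be sketched with plain string.IndexOf and Substring. This is a minimal sketch that assumes the opening tag appears exactly as in the sample (same attribute order and quoting) and that the span contains no nested span. SpanTextByIndexOf and Extract are hypothetical names chosen for illustration.

```csharp
using System;

public static class SpanTextByIndexOf
{
    // Hypothetical helper: returns the text between the first occurrence of
    // openTag and the next "</span>", or null if either marker is missing.
    // Assumes the exact opening tag is known and spans are not nested.
    public static string Extract(string html, string openTag)
    {
        int start = html.IndexOf(openTag, StringComparison.Ordinal);
        if (start < 0) return null;
        start += openTag.Length; // move past the opening tag

        int end = html.IndexOf("</span>", start, StringComparison.Ordinal);
        if (end < 0) return null;

        return html.Substring(start, end - start); // text inside the span
    }
}
```

Calling Extract on the sample line with the opening tag <span class="hidden first"> returns "Julius". The approach breaks as soon as the attributes are reordered or spans are nested, which is the fragility the answers below warn about.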

Answer 1

Accepted answer by Anirudha

You should be using an HTML parser like HtmlAgilityPack. Regex is not a good choice for parsing HTML files, because HTML is not a regular language and real-world pages are often not strictly well-formed.

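Here is a small illustration of the problem. It assumes a nested span, which does not appear in the question's sample. A lazy regex stops at the first </span> it finds, and that tag closes the inner element. NestedSpanPitfall is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class NestedSpanPitfall
{
    // Hypothetical helper: captures the inner text of the first <span>
    // using the lazy pattern <span[^>]*>(.*?)</span>.
    public static string FirstSpanByRegex(string html)
    {
        Match m = Regex.Match(html, "<span[^>]*>(.*?)</span>");
        // (.*?) stops at the FIRST "</span>", even if it closes a nested span
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For the input <span class="hidden first">Julius <span>Caesar</span></span>, this returns "Julius <span>Caesar". The capture is cut off at the inner closing tag and still contains markup.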

You can use the code below to retrieve it with HtmlAgilityPack:

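using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // yourStream: a Stream over the saved HTML file

// XPath: select every span whose class attribute is exactly "hidden first"
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//span[@class='hidden first']");

// SelectNodes returns null (not an empty collection) when nothing matches
List<string> itemList = nodes == null
    ? new List<string>()
    : nodes.Select(p => p.InnerText).ToList();

// itemList now contains the text of every matching span, e.g. "Julius"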
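The XPath above compares the whole class attribute as a single string, so class="first hidden" or class="hidden first extra" would not match. A common XPath 1.0 idiom matches individual class tokens instead. HtmlAgilityPack's SelectNodes should accept the same expression. To keep this sketch runnable without a NuGet package, however, it uses System.Xml.XmlDocument, which only works when the markup is well-formed XML. ClassXPathDemo and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ClassXPathDemo
{
    // Exact string comparison: only class="hidden first" matches.
    public const string ExactClass = "//span[@class='hidden first']";

    // Token match: the class list must contain both "hidden" and "first",
    // in any order, possibly alongside other classes or extra whitespace.
    public const string TokenClass =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' hidden ')" +
        " and contains(concat(' ', normalize-space(@class), ' '), ' first ')]";

    // Returns the inner text of every node matched by xpath, in document order.
    public static List<string> Texts(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // requires well-formed markup, unlike an HTML parser
        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
            result.Add(node.InnerText);
        return result;
    }
}
```

Given <body><span class="hidden first">Julius</span><span class="first hidden">Gaius</span></body>, ExactClass finds only "Julius", while TokenClass finds both "Julius" and "Gaius".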

Answer 2

Answered by Pablo Santa Cruz

I would use the Html Agility Pack to parse the HTML in C#.

Answer 3

Answered by KingCronus

I'd strongly recommend looking into something like the HTML Agility Pack.

Answer 4

Answered by user1570048

I asked the same question a few days ago and ended up using the HTML Agility Pack, but here are the regular expressions you want.

This one ignores the attributes:

<span[^>]*>(.*?)</span>

This one also requires the class attribute to be "hidden first":

<span class="hidden first"[^>]*>(.*?)</span>
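If you do go the regex route, here is a minimal C# sketch that applies the attribute-checking pattern with System.Text.RegularExpressions. RegexOptions.Singleline is an added assumption so that "." also matches line breaks inside the span. SpanRegex is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpanRegex
{
    // The attribute-checking pattern from the answer, as a C# verbatim string
    // (inside @"...", a double quote is written as "").
    // Singleline lets '.' also match newlines inside the span.
    private static readonly Regex HiddenFirst = new Regex(
        @"<span class=""hidden first""[^>]*>(.*?)</span>",
        RegexOptions.Singleline);

    // Returns the captured text of every matching span, in document order.
    public static List<string> Values(string html)
    {
        var values = new List<string>();
        foreach (Match m in HiddenFirst.Matches(html))
            values.Add(m.Groups[1].Value); // group 1 is the (.*?) capture
        return values;
    }
}
```

On the question's sample line, Values returns a single entry, "Julius". The capture still ends at the first </span>, so nested spans are cut short, as the accepted answer warns.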