Disclaimer: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow original URL: http://stackoverflow.com/questions/13234394/

Date: 2020-08-10 07:51:15  Source: igfitidea

Get the value of an HTML element

Tags: c#, regex

Asked by disasterkid

I have the HTML code of a web page in a text file, and I'd like my program to return the value inside a tag. For example, I want to get "Julius" out of:

<span class="hidden first">Julius</span>

Do I need a regular expression for this? If not, which string function can do it?
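
For comparison, here is a minimal sketch of the plain-string approach the question asks about, using only String.IndexOf and String.Substring. The class name SpanText, the helper ExtractBetween, and the sample input are hypothetical. It assumes the exact opening-tag text is known in advance and that the span contains no nested span, so it is much more brittle than an HTML parser.

```csharp
using System;

public static class SpanText
{
    // Hypothetical helper: returns the text between the first occurrence of
    // openTag and the next closeTag after it, or null if either is missing.
    // Assumes the opening tag is written exactly as given and is not nested.
    public static string ExtractBetween(string html, string openTag, string closeTag)
    {
        int start = html.IndexOf(openTag, StringComparison.Ordinal);
        if (start < 0) return null;

        start += openTag.Length; // skip past the opening tag itself
        int end = html.IndexOf(closeTag, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return html.Substring(start, end - start);
    }

    public static void Main()
    {
        string html = "<div><span class=\"hidden first\">Julius</span></div>";
        string value = ExtractBetween(html, "<span class=\"hidden first\">", "</span>");
        Console.WriteLine(value); // prints Julius
    }
}
```

Any change in attribute order, spacing, or quoting breaks the exact-match search, which is one reason the answers below recommend a parser.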

Accepted answer by Anirudha

You should use an HTML parser such as HtmlAgilityPack. Regex is not a good choice for parsing HTML files, because HTML is neither strict nor regular in its format.

You can use the code below to retrieve it with HtmlAgilityPack:

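using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.Load(yourStream); // or doc.Load(filePath) to read the text file by path

// This XPath selects every span tag whose class attribute is exactly "hidden first".
// SelectNodes returns null when nothing matches, so fall back to an empty sequence.
var itemList = (doc.DocumentNode.SelectNodes("//span[@class='hidden first']")
                ?? Enumerable.Empty<HtmlNode>())
               .Select(p => p.InnerText)
               .ToList();

// itemList now contains the text of every span tag whose class is "hidden first"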

Answer by Pablo Santa Cruz

I would use the Html Agility Pack to parse the HTML in C#.

Answer by KingCronus

I'd strongly recommend you look into something like the HTML Agility Pack.

Answer by user1570048

I asked the same question a few days ago and ended up using the HTML Agility Pack, but here are the regular expressions you want:

This one ignores the attributes and matches any span:

<span[^>]*>(.*?)</span>

This one takes the attributes into account, so only spans with class="hidden first" match:

<span class="hidden first"[^>]*>(.*?)</span>
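
A minimal sketch of applying these two patterns with .NET's System.Text.RegularExpressions.Regex. The class name SpanRegex, the helpers AllSpanContents and HiddenFirstContent, and the sample inputs are hypothetical. The last call illustrates the weakness the accepted answer warns about: with a nested span, the lazy (.*?) stops at the first closing tag.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpanRegex
{
    // Pattern from the answer that ignores attributes: matches any span.
    const string AnySpan = "<span[^>]*>(.*?)</span>";

    // Pattern from the answer that requires class="hidden first".
    const string HiddenFirstSpan = "<span class=\"hidden first\"[^>]*>(.*?)</span>";

    // Hypothetical helper: group 1 (the span content) of every AnySpan match.
    public static string[] AllSpanContents(string html) =>
        Regex.Matches(html, AnySpan)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    // Hypothetical helper: group 1 of the first HiddenFirstSpan match, or null.
    public static string HiddenFirstContent(string html)
    {
        Match m = Regex.Match(html, HiddenFirstSpan);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html = "<span class=\"hidden first\">Julius</span> <span>Caesar</span>";
        Console.WriteLine(string.Join(", ", AllSpanContents(html))); // prints Julius, Caesar
        Console.WriteLine(HiddenFirstContent(html));                  // prints Julius

        // Limitation: with a nested span, (.*?) stops at the first </span>.
        string nested = "<span class=\"hidden first\"><span>Ju</span>lius</span>";
        Console.WriteLine(HiddenFirstContent(nested));                // prints <span>Ju
    }
}
```

Note also that . does not match a newline by default, so content spanning several lines needs RegexOptions.Singleline.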