C#: Get the value of an HTML element
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/13234394/
Question by disasterkid
I have the HTML code of a webpage in a text file. I'd like my program to return the value that is in a tag. For example, I want to get "Julius" out of:
<span class="hidden first">Julius</span>
Do I need a regular expression for this? If not, which string function can do it?
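If the markup is as predictable as in this example, the plain string functions the question mentions are enough. Below is a minimal sketch, not taken from the question or any answer, assuming the opening tag appears exactly as written (same attribute order, quotes and spacing) and the span has no other span nested inside it. `TextBetween` is a hypothetical helper name, not a library API.

```csharp
using System;

// Hypothetical helper: returns the text between the first occurrence of openTag
// and the next closeTag, or null if either is missing. This is a plain string
// search; it does not parse HTML at all.
static string? TextBetween(string html, string openTag, string closeTag)
{
    int start = html.IndexOf(openTag, StringComparison.Ordinal);
    if (start < 0) return null;
    start += openTag.Length; // skip past the opening tag itself

    int end = html.IndexOf(closeTag, start, StringComparison.Ordinal);
    if (end < 0) return null;

    return html.Substring(start, end - start);
}

// Made-up sample input containing the span from the question.
string html = "<p>Name: <span class=\"hidden first\">Julius</span></p>";
Console.WriteLine(TextBetween(html, "<span class=\"hidden first\">", "</span>")); // Julius
```

To read the page from the text file instead of a literal, `string html = File.ReadAllText(path);` (from `System.IO`) would replace the sample string. Any change in the markup, such as reordered attributes or single quotes, breaks the search, which is why the answers recommend a real parser.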
Accepted answer by Anirudha
You should be using an HTML parser such as HtmlAgilityPack. Regex is not a good choice for parsing HTML files, because HTML is not strict (real-world pages are often malformed) and it is not regular in its format: a regular expression cannot reliably match elements that nest inside one another.
You can use the code below to retrieve it with HtmlAgilityPack:
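using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // doc.Load("page.html") also works when the HTML is in a file
// This XPath selects every span tag whose class attribute is exactly "hidden first".
// SelectNodes returns null when nothing matches, hence the ?. below.
var itemList = doc.DocumentNode.SelectNodes("//span[@class='hidden first']")
    ?.Select(p => p.InnerText)
    .ToList();
// itemList now contains the text of every matching span (e.g. "Julius"), or null if there were none.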
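To make the point about regular expressions concrete, here is a small sketch (not part of the accepted answer) that runs the attribute-aware pattern quoted in the last answer against two made-up inputs. The lazy `(.*?)` stops at the first `</span>` it reaches, so once another span is nested inside the target, the capture is cut short and contains markup, whereas a parser keeps track of which `</span>` closes which element.

```csharp
using System;
using System.Text.RegularExpressions;

// Attribute-aware pattern from the last answer; (.*?) is lazy, so it ends at the FIRST "</span>".
var pattern = new Regex(@"<span class=""hidden first""[^>]*>(.*?)</span>");

// Made-up inputs: the second one has a span nested inside the target span.
string simple = "<span class=\"hidden first\">Julius</span>";
string nested = "<span class=\"hidden first\">Jul<span>ius</span></span>";

Console.WriteLine(pattern.Match(simple).Groups[1].Value); // Julius
Console.WriteLine(pattern.Match(nested).Groups[1].Value); // Jul<span>ius  (cut short, includes markup)
```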
Answer by Pablo Santa Cruz
I would use the Html Agility Pack to parse the HTML in C#.
Answer by KingCronus
I'd strongly recommend looking into something like the HTML Agility Pack.
Answer by user1570048
I asked the same question a few days ago and ended up using HTML Agility Pack, but here are the regular expressions you want.
This one ignores the attributes, so it matches the content of any span:
<span[^>]*>(.*?)</span>
This one considers the attributes, so it only matches spans whose opening tag begins with class="hidden first":
<span class="hidden first"[^>]*>(.*?)</span>
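A minimal sketch of how these two patterns might be used from C# with `System.Text.RegularExpressions`, on a made-up snippet containing three spans. `Captures` is a hypothetical helper that returns group 1 (the `(.*?)` part) of every match; in a C# verbatim string, `""` stands for one double quote.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns capture group 1 of every match, in order.
static List<string> Captures(string input, string pattern) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToList();

// The two patterns from the answer, written as C# verbatim strings.
string ignoreAttributes = @"<span[^>]*>(.*?)</span>";
string withClass = @"<span class=""hidden first""[^>]*>(.*?)</span>";

// Made-up sample markup with one span that has a different class.
string html =
    "<span class=\"hidden first\">Julius</span>" +
    "<span class=\"visible\">Caesar</span>" +
    "<span class=\"hidden first\">Gaius</span>";

Console.WriteLine(string.Join(", ", Captures(html, ignoreAttributes))); // Julius, Caesar, Gaius
Console.WriteLine(string.Join(", ", Captures(html, withClass)));        // Julius, Gaius
```

By default `.` does not match a newline, so span content that runs over several lines would need `RegexOptions.Singleline`; the nesting limitation described in the accepted answer still applies.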

