C#: Regex to get the src value from an img tag
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/1058852/
Regex to get src value from an img tag
Asked by Tanmoy
I am using the following regex to get the src value of the first img tag in an HTML document.
string match = "src=(?:\"|\')?(?<imgSrc>[^>]*[^/].(?:jpg|png))(?:\"|\')?";
Now it captures the whole src attribute, which I don't need. I just need the URL inside the src attribute. How can I do that?
Accepted answer by Welbog
Parse your HTML with something else. HTML is not regular, and thus regular expressions aren't at all suited to parsing it.
Use an HTML parser, or an XML parser if the HTML is strict. It's a lot easier to get the src attribute's value using XPath:
//img/@src
XML parsing is built into the System.Xml namespace. It's incredibly powerful. HTML parsing is a bit more difficult if the HTML isn't strict, but there are lots of libraries around that will do it for you.
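As a rough illustration of the XPath approach, here is a minimal sketch using XmlDocument from System.Xml. The class and method names (ImgSrcXPath, FirstImgSrc) are made up for this example. It assumes the markup is well-formed XML with no DOCTYPE, no HTML-only entities, and no default namespace; real-world HTML will often fail those assumptions and need an HTML parsing library instead.

```csharp
using System;
using System.Xml;

public static class ImgSrcXPath
{
    // Hypothetical helper: returns the src of the first img element in
    // document order, or null if the document has no img with a src.
    // Assumes strict, namespace-free XHTML; LoadXml throws XmlException
    // on markup that is not well-formed XML.
    public static string FirstImgSrc(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        // The XPath from the answer selects the src attribute node itself.
        XmlNode srcAttribute = doc.SelectSingleNode("//img/@src");
        return srcAttribute?.Value;
    }
}
```

Calling ImgSrcXPath.FirstImgSrc on a strict document yields only the attribute's value, with no quotes or attribute name attached, which is what the question asks for.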
Answer by Edward Q. Bridges
In plain English, your regex should match, on the src attribute inside a tag, any characters after a quote that are not themselves a quote.
In Perl regex, it would look like this:
/src=[\"\']([^\"\']+)/
The URL will be in $1 after running this.
Of course, this assumes that the URLs in your src attributes are quoted. If they are not, you can modify the values in the [] brackets accordingly.
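Since the question is about C#, the Perl pattern above might translate roughly as follows. This is only a sketch: the class and method names (ImgSrcRegex, FirstSrc) are invented for illustration, and the named group imgSrc is borrowed from the question's own pattern. Like the Perl version, it assumes quoted values and matches src= on any tag, not just img.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRegex
{
    // Same idea as the Perl pattern: "src=", an opening quote, then one or
    // more characters that are not quotes, captured in the imgSrc group.
    // The closing quote is deliberately left outside the group.
    private static readonly Regex SrcPattern =
        new Regex("src=[\"'](?<imgSrc>[^\"']+)", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the first quoted src value found in
    // the input, or null when nothing matches (e.g. unquoted values).
    public static string FirstSrc(string html)
    {
        Match m = SrcPattern.Match(html);
        return m.Success ? m.Groups["imgSrc"].Value : null;
    }
}
```

Reading m.Groups["imgSrc"].Value instead of m.Value is the C# counterpart of using $1 in Perl: it returns only the captured URL rather than the whole matched attribute.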
Answer by Ian Ringrose
See "When not to use Regex in C# (or Java, C++ etc)" and "Looking for C# HTML parser".
PS: how can I put a link to a StackOverflow question in a comment?