C# regex to get the src value from an img tag

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1058852/

Time: 2020-08-06 07:13:16  Source: igfitidea

Regex to get src value from an img tag

Tags: c#, html, regex

Asked by Tanmoy

I am using the following regex to get the src value of the first img tag in an HTML document.

string match = "src=(?:\"|\')?(?<imgSrc>[^>]*[^/].(?:jpg|png))(?:\"|\')?";

Now it captures the whole src attribute, which I don't need. I just need the URL inside the src attribute. How can I do that?

Accepted answer by Welbog

Parse your HTML with something else. HTML is not a regular language, and thus regular expressions aren't at all suited to parsing it.

Use an HTML parser, or an XML parser if the HTML is strict. It's a lot easier to get the src attribute's value using XPath:

//img/@src

XML parsing is built into the System.Xml namespace. It's incredibly powerful. HTML parsing is a bit more difficult if the HTML isn't strict, but there are lots of libraries around that will do it for you.
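As a rough illustration of the XPath approach, here is a minimal sketch using System.Xml. The class and method names (ImgSrcXPath, FirstImgSrc) are made up for this example. It assumes the input is well-formed XML with no default namespace; XHTML that declares xmlns on the html element would need an XmlNamespaceManager and a prefixed XPath, and malformed markup makes LoadXml throw an XmlException.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class ImgSrcXPath
{
    // Returns the src value of the first <img> element in document order,
    // or null when the document contains no <img> with a src attribute.
    // Assumption: the markup is well-formed XML without a default namespace.
    public static string FirstImgSrc(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException on malformed input

        // The XPath from the answer: select the src attribute of img elements.
        XmlNode attr = doc.SelectSingleNode("//img/@src");
        return attr?.Value;
    }
}
```

Calling ImgSrcXPath.FirstImgSrc on a string that holds the page markup yields just the URL, with no quotes or attribute name to strip.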

Answer by Edward Q. Bridges

In plain English, your regex should match, inside a tag, every character that follows the opening quote of the src attribute and is not itself a quote.

In a Perl regex, it would look like this:

/src=[\"\']([^\"\']+)/

The URL will be in $1 after running this.

Of course, this assumes that the URLs in your src attributes are quoted. If they are not, you can modify the values in the [] brackets accordingly.
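Since the question is about C#, the same pattern might be written roughly as follows. This is only a sketch: the names ImgSrcRegex and FirstSrc are invented for illustration, and the pattern keeps the Perl version's assumptions, so it requires a quoted value and matches the first src= anywhere in the text (for example on a script tag), not only on img tags.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ImgSrcRegex
{
    // Same idea as the Perl pattern: after src= and an opening quote,
    // capture every character up to (but not including) the next quote.
    private static readonly Regex SrcPattern =
        new Regex("src=[\"']([^\"']+)", RegexOptions.IgnoreCase);

    // Returns the first captured URL, or null when nothing matches.
    public static string FirstSrc(string html)
    {
        Match m = SrcPattern.Match(html);
        // Groups[1] plays the role of $1 in Perl.
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The capture group holds only the URL, which is what the question asks for; the full match (Groups[0]) would still include the src= prefix and the opening quote.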

Answer by Ian Ringrose

See "When not to use Regex in C# (or Java, C++ etc)" and "Looking for C# HTML parser".

P.S. How can I put a link to a StackOverflow question in a comment?