PHP preg_match_all <a href
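Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow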
Original URL: http://stackoverflow.com/questions/1519696/
Question by streetparade
Hello, I want to extract links like
<a href="/portal/clients/show/entityId/2121" > and I want a regex that gives me /portal/clients/show/entityId/2121
The number at the end (2121) is different in other links.
Any ideas?
Accepted answer by Yacoby
A regex for parsing links looks something like this:
'/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'
Given how horrible that is, I would recommend at least using Simple HTML DOM for getting the links. You could then check each link's href with a very basic regex.
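If you do want to try the regex route, here is a minimal sketch of how the pattern above could be combined with preg_match_all and then with a very basic regex on each href. The function name extractEntityLinks and the sample HTML are hypothetical; the /portal/clients/show/entityId/ prefix is taken from the question. Capture group 1 keeps any surrounding quotes, so they are trimmed before the check.

```php
<?php
// Sketch: pull every href with the link pattern, then apply a simple
// second regex to each href (path prefix taken from the question).
function extractEntityLinks(string $html): array
{
    // The link pattern, with its single quotes escaped for a PHP string.
    $linkPattern = '/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i';
    preg_match_all($linkPattern, $html, $matches);

    $result = [];
    foreach ($matches[1] as $href) {
        // Group 1 keeps any surrounding quotes, so strip them first.
        $href = trim($href, '"\'');
        // The "very basic regex": keep only .../entityId/<digits> links.
        if (preg_match('#^/portal/clients/show/entityId/(\d+)$#', $href, $m)) {
            $result[] = ['path' => $href, 'id' => $m[1]];
        }
    }
    return $result;
}

$html = '<p><a href="/portal/clients/show/entityId/2121" >A</a>'
      . ' <a class="x" href=\'/portal/clients/show/entityId/37\'>B</a>'
      . ' <a href="/about">C</a></p>';
print_r(extractEntityLinks($html));
```

With this sample input it should report two links (ids 2121 and 37); /about is skipped because it fails the path check.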
Answer by karim79
PHP Simple HTML DOM Parser example:
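include 'simple_html_dom.php'; // the PHP Simple HTML DOM Parser library file

// Create DOM from a string of HTML ($links)
$html = str_get_html($links);
//or from a URL (include the scheme, otherwise the argument is read as a local file path)
$html = file_get_html('http://www.example.com/');
foreach ($html->find('a') as $link) {
    echo $link->href . '<br />';
}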
Answer by soulmerge
Don't use regular expressions for processing XML/HTML. This can be done very easily using the built-in DOM parser:
libxml_use_internal_errors(true); // don't emit warnings for malformed real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
    # An XPath query for attributes gives a DOMNodeList containing DOMAttr objects.
    # http://php.net/manual/en/class.domattr.php
    echo $nodeList->item($i)->value . "<br/>\n";
}
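The same DOMXPath approach can also do the filtering the question asks for. A minimal sketch, assuming the links of interest all start with /portal/clients/show/entityId/ (taken from the question); findEntityHrefs is a hypothetical helper name, and the predicate uses XPath 1.0's starts-with() on each href attribute node.

```php
<?php
// Sketch: let the XPath expression keep only hrefs that start with the
// path prefix from the question, so the PHP side just collects values.
function findEntityHrefs(string $htmlAsString): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($htmlAsString);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // The predicate runs on each href attribute node; "." is its value.
    $nodeList = $xpath->query(
        "//a/@href[starts-with(., '/portal/clients/show/entityId/')]"
    );

    $hrefs = [];
    foreach ($nodeList as $attr) { // each item is a DOMAttr
        $hrefs[] = $attr->value;
    }
    return $hrefs;
}

$html = '<ul><li><a href="/portal/clients/show/entityId/2121" >A</a>'
      . '<li><a href="/contact">B</a></ul>';
print_r(findEntityHrefs($html));
```

Pushing the filter into the XPath expression keeps the PHP loop trivial; the trailing id can then be split off with plain string functions.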
Answer by streetparade
This is my solution:
<?php
// get links
$website = file_get_contents("http://www.example.com"); // download contents of www.example.com
// '/' delimits the pattern; group 1 captures each href value without its quotes
preg_match_all('/<a href="(.+?)"/', $website, $matches);
// output all matches; $matches[1] already holds just the link targets
print_r($matches[1]);
?>
I recommend avoiding XML-based parsers, because you will not always know whether the document/website is well-formed.
Best regards
Answer by Max
When "parsing" HTML, I mostly rely on PHPQuery (http://code.google.com/p/phpquery/) rather than regex.
Answer by Bart Kiers
Parsing links from HTML can be done using an HTML parser.
Once you have all the links, simply get the index of the last forward slash; everything after it is your number. No regex needed.
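A minimal sketch of that idea using strrpos() and substr(); lastPathSegment is a hypothetical helper name, and it returns null when the href contains no slash at all.

```php
<?php
// Sketch of the "last forward slash" idea: no regex, just string functions.
function lastPathSegment(string $href): ?string
{
    $pos = strrpos($href, '/'); // index of the last "/", or false if none
    if ($pos === false) {
        return null;
    }
    return substr($href, $pos + 1); // everything after that slash
}

echo lastPathSegment('/portal/clients/show/entityId/2121'), "\n";
```

For /portal/clients/show/entityId/2121 this returns "2121"; substr($href, 0, $pos) would give the path without the number, if that is needed too.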

