php Preg_match_all <a href

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/1519696/

Time: 2020-08-25 02:54:52  Source: igfitidea

php preg-match hyperlink

Question by streetparade

Hello, I want to extract links like <a href="/portal/clients/show/entityId/2121" > and I want a regex that gives me /portal/clients/show/entityId/2121. The number at the end (2121) is different in other links. Any idea?

Accepted answer by Yacoby

A regex for parsing links looks something like this:

'/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'

Given how horrible that is, I would recommend using Simple HTML DOM, at least for getting the links. You could then check the links using a very basic regex on the href.
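
A minimal sketch of that two-step idea, assuming the input is an HTML string and that the wanted links all share the /portal/clients/show/entityId/ prefix from the question. It uses the regex above (written as a valid single-quoted PHP string) to collect hrefs, then a very basic regex on each href to keep the client links and capture the trailing number. The function name extractEntityIds is hypothetical; swap the first step for a real parser if you follow the advice above.

```php
<?php
// Hypothetical helper: collect hrefs with the link regex, then run a very
// basic regex on each href to pull out the trailing entity ID.
function extractEntityIds(string $html): array
{
    $linkPattern = '/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i';
    preg_match_all($linkPattern, $html, $matches);

    $ids = [];
    foreach ($matches[1] as $rawHref) {
        // Group 1 keeps the surrounding quotes, so strip them first
        $href = trim($rawHref, '"\'');
        // Basic check on the href itself: the path prefix plus a numeric ID
        if (preg_match('#^/portal/clients/show/entityId/(\d+)$#', $href, $m)) {
            $ids[] = $m[1];
        }
    }
    return $ids;
}

$html = '<a href="/portal/clients/show/entityId/2121" >Client</a> <a class="x" href="/about">About</a>';
print_r(extractEntityIds($html));
```

The first pattern captures the quotes around the attribute value, which is why they are trimmed before the second, simpler check.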

Answer by karim79

Simple PHP HTML DOM Parser example:

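// Load the Simple HTML DOM library (simple_html_dom.php must be in your include path)
include 'simple_html_dom.php';

// Create DOM from a string of HTML ($links holds your markup)
$html = str_get_html($links);

// or load it from a URL (include the scheme)
$html = file_get_html('http://www.example.com/');

// Print the href of every <a> element
foreach ($html->find('a') as $link) {
    echo $link->href . '<br />';
}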

Answer by soulmerge

Don't use regular expressions for processing XML/HTML. This can be done very easily using the built-in DOM parser:

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
foreach ($nodeList as $attr) {
    # An XPath query for attributes gives a DOMNodeList containing DOMAttr objects.
    # http://php.net/manual/en/class.domattr.php
    echo $attr->value . "<br/>\n";
}
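
Building on the DOMDocument approach, here is a sketch that lets XPath do the filtering for the question's case. It assumes every wanted link starts with /portal/clients/show/entityId/ and that the ID is whatever follows that prefix; the function name entityIdsFromHtml is hypothetical. starts-with() is a standard XPath 1.0 function, which DOMXPath supports.

```php
<?php
// Hypothetical helper: return the entity IDs of all <a> links whose href
// starts with the prefix from the question.
function entityIdsFromHtml(string $html): array
{
    $prefix = '/portal/clients/show/entityId/';

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them,
    // then restore the previous setting
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath 1.0 starts-with() keeps only the client links' href attributes
    $attrs = $xpath->query('//a[starts-with(@href, "' . $prefix . '")]/@href');

    $ids = [];
    foreach ($attrs as $attr) {
        // Everything after the prefix is the ID
        $ids[] = substr($attr->value, strlen($prefix));
    }
    return $ids;
}

$html = '<p><a href="/portal/clients/show/entityId/2121" >Client</a> <a href="/about">About</a></p>';
print_r(entityIdsFromHtml($html));
```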

Answer by streetparade

This is my solution:

<?php
// Download the contents of www.example.com
$website = file_get_contents("http://www.example.com");

// Find every <a href="..."> link. Explicit / delimiters make "<a" part of the
// pattern (with "<...>" PCRE would treat < and > as the delimiters instead)
preg_match_all('/<a\s+href="([^"]+)"/i', $website, $matches);

// Group 1 already holds just the URLs, so no "a href=" or quotes need stripping

// Output all matches
print_r($matches[1]);
?>

I recommend avoiding XML-based parsers, because you cannot always know whether the document/website is well-formed.

Best regards

Answer by Max

When "parsing" HTML I mostly rely on PHPQuery (http://code.google.com/p/phpquery/) rather than regex.

Answer by Bart Kiers

Parsing links from HTML can be done using an HTML parser.

Once you have all the links, simply find the index of the last forward slash and take everything after it, and you have your number. No regex needed.
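
A minimal sketch of that approach in PHP, assuming the href has already been extracted by a parser: strrpos() finds the last slash and substr() takes everything after it. The helper name idFromHref is hypothetical.

```php
<?php
// Hypothetical helper: return the part of an href after its last forward slash.
function idFromHref(string $href): string
{
    $lastSlash = strrpos($href, '/');
    // strrpos() returns false when there is no slash; then return the href unchanged
    return $lastSlash === false ? $href : substr($href, $lastSlash + 1);
}

echo idFromHref('/portal/clients/show/entityId/2121'); // prints 2121
```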
