Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/2392393/

Time: 2020-08-25 06:17:24  Source: igfitidea

PHP Xpath : get all href values that contain needle

Tags: php, xpath, href

Question by MattW

I'm working with PHP XPath, trying to quickly pull certain links out of an HTML page.

The following finds all links with an href attribute on mypage.html:

$nodes = $x->query("//a[@href]");

Whereas the following finds all links where the description matches my needle:

$nodes = $x->query("//a[contains(@href,'click me')]");

What I am trying to achieve is matching on the href itself; more specifically, finding URLs that contain certain parameters. Is that possible within an XPath query, or should I just start manipulating the output of the first XPath query?
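
For the second route the question raises (running `//a[@href]` first and filtering the results in PHP), a minimal sketch might look like the following. It assumes DOMDocument/DOMXPath, which the `$x->query(...)` calls suggest; `filterLinksByParam()` and the sample markup are hypothetical. It uses the standard `parse_url()` and `parse_str()` functions to check for a query parameter by name rather than by substring.

```php
<?php
// Hypothetical helper: return every href whose query string contains $param.
// Step 1 is the plain XPath query from the question; step 2 filters in PHP.
function filterLinksByParam(string $html, string $param): array
{
    libxml_use_internal_errors(true);   // collect HTML parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $x = new DOMXPath($doc);

    $matches = [];
    foreach ($x->query('//a[@href]') as $a) {
        $href  = $a->getAttribute('href');
        $query = parse_url($href, PHP_URL_QUERY);   // "foo=bar", or null if absent
        if (!is_string($query)) {
            continue;                               // no query string to inspect
        }
        parse_str($query, $params);                 // "foo=bar" => ['foo' => 'bar']
        if (array_key_exists($param, $params)) {
            $matches[] = $href;
        }
    }
    return $matches;
}

// Hypothetical sample markup: the second link has "foo" only in its path.
$html = '<ul>'
      . '<li><a href="http://example.com/page?foo=bar">Description</a></li>'
      . '<li><a href="http://example.com/food?lang=de">Description</a></li>'
      . '</ul>';

print_r(filterLinksByParam($html, 'foo'));
```

Here only the first URL is returned; a substring test such as `contains(@href,'foo')` would also match the second link because of `/food`. The trade-off is that every `a[@href]` node is pulled into PHP before filtering.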

Answer by Gordon

I'm not sure I understand the question correctly, but the second XPath expression already does what you are describing. It matches against the href attribute, not against the text node of the A element:
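
Since the question's `$x->query(...)` calls look like DOMXPath rather than SimpleXML, here is a rough DOMXPath sketch of the same point (the link text "click me" and the sample URLs are made up for illustration). It contrasts a predicate on `@href` with one on the element's text, which is the distinction drawn above.

```php
<?php
// Made-up sample: both links have the same text, only one href contains "foo".
$html = '<ul>'
      . '<li><a href="http://example.com/page?foo=bar">click me</a></li>'
      . '<li><a href="http://example.com/page?lang=de">click me</a></li>'
      . '</ul>';

libxml_use_internal_errors(true);   // collect HTML parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$x = new DOMXPath($doc);

// Predicate on the attribute value: only the first link matches.
$byHref = $x->query("//a[contains(@href,'foo')]");

// Predicate on the link text instead: both links match.
$byText = $x->query("//a[contains(., 'click me')]");

foreach ($byHref as $a) {
    echo $a->getAttribute('href'), "\n";   // http://example.com/page?foo=bar
}
echo $byText->length, "\n";                // 2
```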

$html = <<< HTML
<ul>
    <li>
        <a href="http://example.com/page?foo=bar">Description</a>
    </li>
    <li>
        <a href="http://example.com/page?lang=de">Description</a>
    </li>
</ul>
HTML;

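// Parse the (well-formed) fragment and select every <a> whose href contains 'foo'
$xml  = simplexml_load_string($html);
$list = $xml->xpath("//a[contains(@href,'foo')]");
var_dump($list);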

Outputs:

array(1) {
  [0]=>
  object(SimpleXMLElement)#2 (2) {
    ["@attributes"]=>
    array(1) {
      ["href"]=>
      string(31) "http://example.com/page?foo=bar"
    }
    [0]=>
    string(11) "Description"
  }
}

As you can see, the returned array contains only the A element whose href contains foo (which I understand is what you are looking for). It contains the entire element, because the XPath translates to "fetch all A elements with an href attribute containing foo". You would then access the attribute with:

echo $list[0]['href']; // gives "http://example.com/page?foo=bar"
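
`$list[0]` only covers the first match. To collect every matching URL as a plain string, one option is to loop over the array and cast each attribute with `(string)`, since `$a['href']` is itself a SimpleXMLElement. A short sketch, with a made-up extra "foo" link added to the markup:

```php
<?php
// Sample markup extended with a second "foo" link (made up for illustration).
$html = <<<HTML
<ul>
    <li><a href="http://example.com/page?foo=bar">Description</a></li>
    <li><a href="http://example.com/page?foo=baz">Description</a></li>
    <li><a href="http://example.com/page?lang=de">Description</a></li>
</ul>
HTML;

$xml  = simplexml_load_string($html);
$list = $xml->xpath("//a[contains(@href,'foo')]");

$urls = [];
foreach ($list as $a) {
    // $a['href'] is a SimpleXMLElement; (string) yields the attribute value
    $urls[] = (string) $a['href'];
}
print_r($urls);
```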

If you only want to return the attribute itself, use:

//a[contains(@href,'foo')]/@href

Note that in SimpleXML this still returns SimpleXMLElement objects, though:

array(1) {
  [0]=>
  object(SimpleXMLElement)#3 (1) {
    ["@attributes"]=>
    array(1) {
      ["href"]=>
      string(31) "http://example.com/page?foo=bar"
    }
  }
}

but you can now output the URL with:

$list = $xml->xpath("//a[contains(@href,'foo')]/@href");
echo $list[0]; // gives "http://example.com/page?foo=bar"
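
With DOMXPath (as in the question's `$x->query(...)`), the same `/@href` step returns DOMAttr nodes whose `value` property holds the URL. The sketch below also narrows the predicate to `?foo=` or `&foo=`, assuming parameters appear in the usual `?name=value&name=value` form, so that `foo` elsewhere in the URL (such as a `/food` path) does not match. The markup is made up.

```php
<?php
// Made-up sample: a plain match, a "foo" that is only in the path, and a
// "foo" parameter that is not the first one (&amp; decodes to & when parsed).
$html = '<ul>'
      . '<li><a href="http://example.com/page?foo=bar">Description</a></li>'
      . '<li><a href="http://example.com/food?lang=de">Description</a></li>'
      . '<li><a href="http://example.com/page?lang=de&amp;foo=1">Description</a></li>'
      . '</ul>';

libxml_use_internal_errors(true);   // collect HTML parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$x = new DOMXPath($doc);

// Assumes "foo" is a parameter name only when preceded by "?" or "&".
$attrs = $x->query("//a[contains(@href,'?foo=') or contains(@href,'&foo=')]/@href");

$urls = [];
foreach ($attrs as $attr) {
    $urls[] = $attr->value;   // DOMAttr::$value is the attribute's string value
}
print_r($urls);
```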