Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/12547356/
Retrieve elements with xpath and DOMDocument
Asked by user1691355
I have a list of ads in the HTML code below. What I need is a PHP loop to get the following elements for each ad:
- ad URL (href attribute of the <a> tag)
- ad image URL (src attribute of the <img> tag)
- ad title (HTML content of the <div class="title"> tag)
<div class="ads">
    <a href="http://path/to/ad/1">
        <div class="ad">
            <div class="image">
                <div class="wrapper">
                    <img src="http://path/to/ad/1/image.jpg">
                </div>
            </div>
            <div class="detail">
                <div class="title">Ad #1</div>
            </div>
        </div>
    </a>
    <a href="http://path/to/ad/2">
        <div class="ad">
            <div class="image">
                <div class="wrapper">
                    <img src="http://path/to/ad/2/image.jpg">
                </div>
            </div>
            <div class="detail">
                <div class="title">Ad #2</div>
            </div>
        </div>
    </a>
</div>
I managed to get the ad URL with the PHP code below.
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
    $ad_url = $ad->getAttribute('href');
    print("AD URL : $ad_url");
}
But I didn't manage to get the other two elements (image URL and title). Any ideas?
Answered by user1691355
I managed to get what I need with this code (based on Khue Vu's code):
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
    // get ad url
    $ad_url = $ad->getAttribute('href');
    // set current ad object as new DOMDocument object so we can parse it
    $ad_Doc = new DOMDocument();
    $cloned = $ad->cloneNode(true);
    $ad_Doc->appendChild($ad_Doc->importNode($cloned, true));
    $ad_xpath = new DOMXPath($ad_Doc);
    // get ad title
    $ad_title_tag = $ad_xpath->query("//div[@class='title']");
    $ad_title = trim($ad_title_tag->item(0)->nodeValue);
    // get ad image
    $ad_image_tag = $ad_xpath->query("//img/@src");
    $ad_image = $ad_image_tag->item(0)->nodeValue;
}
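The clone-and-import step above is not strictly required. As a minimal sketch of an alternative (not the method used in the answer), DOMXPath::evaluate() and DOMXPath::query() accept a context node as their second argument, so a relative expression starting with "." can search inside the current <a> element directly. The $results array, the use of string() in the expressions, and the libxml_use_internal_errors() call are assumptions added for illustration; the sample markup is the one from the question.

```php
<?php
// Sample markup copied from the question (two ads).
$ads = <<<HTML
<div class="ads">
  <a href="http://path/to/ad/1">
    <div class="ad">
      <div class="image"><div class="wrapper">
        <img src="http://path/to/ad/1/image.jpg">
      </div></div>
      <div class="detail"><div class="title">Ad #1</div></div>
    </div>
  </a>
  <a href="http://path/to/ad/2">
    <div class="ad">
      <div class="image"><div class="wrapper">
        <img src="http://path/to/ad/2/image.jpg">
      </div></div>
      <div class="detail"><div class="title">Ad #2</div></div>
    </div>
  </a>
</div>
HTML;

// Assumption: collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$d = new DOMDocument();
$d->loadHTML($ads);
libxml_clear_errors();

$xpath = new DOMXPath($d);
$results = []; // hypothetical container: one entry per ad

foreach ($xpath->query('//a') as $ad) {
    // Passing $ad as the second argument makes it the context node. The
    // leading "." keeps each expression relative to that node instead of
    // searching the whole document.
    $title = trim($xpath->evaluate("string(.//div[@class='title'])", $ad));
    $image = $xpath->evaluate('string(.//img/@src)', $ad);

    $results[] = [
        'url'   => $ad->getAttribute('href'),
        'image' => $image,
        'title' => $title,
    ];
}

print_r($results);
```

With string(), a missing title or image yields an empty string rather than a null node, which avoids calling a method on null when an ad is incomplete.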
Answered by Khue Vu
For the other elements, you just do the same:
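foreach ($ls_ads as $ad) {
    $ad_url = $ad->getAttribute('href');
    print("AD URL : $ad_url");
    // copy the current ad into its own document so it can be queried in isolation
    $ad_Doc = new DOMDocument();
    // a new DOMDocument has no documentElement yet, so append to the document itself;
    // pass true to importNode() so the ad's child nodes are copied as well
    $ad_Doc->appendChild($ad_Doc->importNode($ad, true));
    $xpath = new DOMXPath($ad_Doc);
    $img_src = $xpath->query("//img[@src]"); // DOMNodeList of <img> elements
    $title = $xpath->query("//div[@class='title']"); // DOMNodeList of title <div> elements
}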
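The two query() calls above return DOMNodeList objects, not strings, so the values still have to be read from the matched nodes. The following is a small sketch of that last step, assuming a hypothetical single-ad fragment shaped like the markup in the question; the variable names $image_url, $ad_title and $missing, and the "price" class, are illustrative only.

```php
<?php
// Hypothetical single-ad fragment, shaped like the markup in the question.
$html = '<a href="http://path/to/ad/1"><div class="ad">'
      . '<img src="http://path/to/ad/1/image.jpg">'
      . '<div class="title"> Ad #1 </div></div></a>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList for a valid expression.
$img_src = $xpath->query('//img[@src]');
$title   = $xpath->query("//div[@class='title']");

// item(0) is null when nothing matched, so check length before dereferencing.
$image_url = $img_src->length > 0 ? $img_src->item(0)->getAttribute('src') : null;
$ad_title  = $title->length > 0 ? trim($title->item(0)->textContent) : null;

// An expression that matches nothing yields an empty list, not an error.
$missing = $xpath->query("//div[@class='price']");

var_dump($image_url, $ad_title, $missing->length);
```

Checking length first keeps the loop from failing on an ad that lacks an image or a title.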

