Parse HTML with PHP's HTML DOMDocument
Original URL: http://stackoverflow.com/questions/2571232/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Mint
I was trying to do it with getElementsByTagName, but it wasn't working. I'm new to using DOMDocument to parse HTML: until yesterday I used regex, but some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)
I googled around for a while looking for explanations but didn't find anything that helped (not with this class, anyway).
So I want to capture "Capture this text 1" and "Capture this text 2" and so on.
Doesn't look too hard, but I can't figure it out :(
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
Answer by Pascal MARTIN
If you want to get:
- the text
- that's inside a <div> tag with class="text"
- that's, itself, inside a <div> with class="main"
I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns all tags with a given name (while you only want some of them).
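For comparison, the getElementsByTagName approach from the question can still work, but only if you filter the returned <div> elements yourself by class and by parent. This is a minimal sketch, assuming the class attributes are exactly "text" and "main" as in the question (an element with several classes, such as class="text big", would fail the equality test); the $captured variable is just an illustrative name.

```php
<?php
// Sample markup from the question.
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$captured = [];
// getElementsByTagName('div') returns every <div>, including the outer
// class="main" ones, so each candidate has to be filtered by hand.
foreach ($dom->getElementsByTagName('div') as $div) {
    $parent = $div->parentNode;
    // Keep only div.text elements whose direct parent is a div.main.
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->tagName === 'div'
        && $parent->getAttribute('class') === 'main') {
        // nodeValue includes the surrounding newlines, so trim them off.
        $captured[] = trim($div->nodeValue);
    }
}

print_r($captured);
```

Every structural condition has to be spelled out in PHP here, which is why an XPath expression that states the same conditions in one line tends to be easier to maintain.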
Instead, I would use an XPath query on your document, using the DOMXPath class.
For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;
// Load the HTML string into a DOM document
$dom = new DOMDocument();
$dom->loadHTML($html);
// Create an XPath evaluator bound to that document
$xpath = new DOMXPath($dom);
Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:
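// Select every div.text that is a direct child of a div.main
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    // nodeValue includes the surrounding newlines, so trim them off
    var_dump(trim($tag->nodeValue));
}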
Executing this gives me the following output:
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
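The @class="text" test in the query above is an exact string comparison, so it stops matching as soon as an element carries more than one class (for example class="text highlight"). A common XPath 1.0 workaround is the contains(concat(' ', normalize-space(@class), ' '), ' text ') idiom. The sketch below, assuming your real markup may contain extra classes, combines that idiom with the optional second argument of DOMXPath::query (a context node) so that the inner query runs relative to each outer div. The hasClass() helper is hypothetical, not part of PHP.

```php
<?php
// Same markup as above, except that the second inner div has two classes.
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text highlight">
Capture this text 2
</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: builds an XPath predicate that is true when the
// class attribute contains $name as a whole word, so class="text highlight"
// matches 'text' but class="textual" does not.
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$captured = [];
// First select each outer div.main ...
foreach ($xpath->query('//div[' . hasClass('main') . ']') as $main) {
    // ... then query relative to it: the second argument is the context node,
    // so './div[...]' only looks at direct children of this particular $main.
    foreach ($xpath->query('./div[' . hasClass('text') . ']', $main) as $text) {
        $captured[] = trim($text->nodeValue);
    }
}

print_r($captured);
```

With this markup, the exact-match query from the answer finds only the first div, while the word-based predicate finds both. If your class attributes are always a single value, the simpler @class="..." form is fine.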
Answer by lokeshsk
You can use http://simplehtmldom.sourceforge.net/
It is a very simple, easy-to-use DOM parser written in PHP, with which you can easily fetch the content of a div tag.
Something like this:
// $html must be a simple_html_dom object, e.g. $html = str_get_html($string);
// Find all <div> elements whose class attribute is "text"
$ret = $html->find('div[class=text]');
See its documentation for more help.

