Parsing HTML with PHP's HTML DOMDocument

Question

Asked by Mint

I was trying to do it with "getElementsByTagName", but it wasn't working. I'm new to using DOMDocument to parse HTML; I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)

I googled around for a while looking for explanations but didn't find anything that helped (not for selecting by class, anyway).

So I want to capture "Capture this text 1" and "Capture this text 2" and so on.

It doesn't look too hard, but I can't figure it out :(

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

Answer 1

Answered by Pascal MARTIN

If you want to get:

The text
that's inside a <div> tag with class="text"
that is itself inside a <div> with class="main"

I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns all tags that have a specific name (while you only want some of them).
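To make that point concrete, here is a minimal sketch of what getElementsByTagName returns for markup shaped like the question's, and the manual filtering it forces on you. The sample markup is copied from the question; the variable names ($divs, $texts) are illustrative choices, not part of the original answer.

```php
<?php
// Sample markup mirroring the structure shown in the question.
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// Matches every <div>: the outer "main" ones as well as the inner "text" ones.
$divs = $dom->getElementsByTagName('div');
echo $divs->length, " divs found\n";

// Manual filtering: keep only <div class="text"> whose parent is <div class="main">.
$texts = [];
foreach ($divs as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->nodeValue);
    }
}
print_r($texts);
```

This works, but the filtering logic has to be written by hand for every condition, which is the kind of bookkeeping an XPath expression can state in one line.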

Instead, I would use an XPath query on your document, using the DOMXPath class.

For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:

$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:

$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}

Executing this gives me the following output:

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
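One caveat with the query above: @class="text" compares the whole attribute value, so it would not match an element such as <div class="text highlighted">. If that matters for your pages, a more tolerant query can be sketched with the standard XPath 1.0 functions concat, normalize-space and contains. The extra "highlighted" class and the $hasText variable below are hypothetical, added only for illustration.

```php
<?php
// Hypothetical markup: the second inner <div> carries an extra class.
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text highlighted">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pad the class list with spaces so that " text " only matches as a whole class name.
$hasText = 'contains(concat(" ", normalize-space(@class), " "), " text ")';
$tags = $xpath->query('//div[@class="main"]/div[' . $hasText . ']');

$texts = [];
foreach ($tags as $tag) {
    $texts[] = trim($tag->nodeValue);
}
print_r($texts);
```

The space padding is what keeps a class like "textual" from matching by accident, which a plain contains(@class, "text") would allow.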

Answer 2

Answered by lokeshsk

You can use http://simplehtmldom.sourceforge.net/

It is a very simple, easy-to-use DOM parser written in PHP, with which you can easily fetch the content of a div tag.

Something like this:

// Find all <div> which have attribute id=text
$ret = $html->find('div[id=text]');

See its documentation for more help.
