PHP: get the content inside an HTML tag using PHP and replace it after processing

Question

Asked by esafwan

I have an HTML file (sample.html) like this:

<html>
<head>
</head>
<body>
<div id="content">
<!--content-->

<p>some content</p>

<!--content-->
</div>
</body>
</html>

How do I get the content between the two HTML comments (<!--content-->) using PHP? I want to extract it, do some processing, and put it back, so I need both to get and to put. Is that possible?

Answer 1

Answered by jim tollan

esafwan, you could use a regular expression to extract the content of the div with a given id.

I've done this for image tags before, so the same rules apply. I'll dig out the code and update this answer in a bit.

[update] Try this:

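<?php
    // Return the inner HTML of the first <div> whose $attr equals $value,
    // or null when nothing matches. The match stops at the first </div>,
    // so nested divs are not handled.
    function get_tag( $attr, $value, $xml ) {

        // Escape regex metacharacters, including the "/" delimiter.
        $attr = preg_quote($attr, '/');
        $value = preg_quote($value, '/');

        // Allow other attributes before and after the one being matched.
        $tag_regex = '/<div[^>]*'.$attr.'="'.$value.'"[^>]*>(.*?)<\/div>/si';

        if (preg_match($tag_regex, $xml, $matches)) {
            return $matches[1];
        }
        return null;
    }

    $yourentirehtml = file_get_contents("test.html");
    $extract = get_tag('id', 'content', $yourentirehtml);
    echo $extract;
?>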

or more simply:

preg_match("/<div[^>]*id=\"content\">(.*?)<\/div>/si", $text, $match);
$content = $match[1];

jim

Answer 2

Answered by Gordon

If this is a simple replacement that does not involve parsing the actual HTML document, you may use a regular expression or even just str_replace for this. But generally, it is not advisable to use regex for HTML, because HTML is not a regular language and coming up with reliable patterns can quickly become a nightmare.

The right way to parse HTML in PHP is to use a parsing library that actually knows how to make sense of HTML documents. Your best native bet would be DOM, but PHP has a number of other native XML extensions you can use, and there are also a number of third-party libraries like phpQuery, Zend_Dom, QueryPath and FluentDom.

If you use the search function, you will see that this topic has been covered extensively, and you should have no problem finding examples that show how to solve your question.
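
As a rough illustration of the DOM route, here is a minimal sketch that finds the two `<!--content-->` comment nodes and runs a callback on every element between them. The function name `process_between_comments`, its parameters, and the uppercase step are assumptions made for this example, not part of the original answer; it also assumes both markers are siblings inside the same parent element.

```php
<?php
// Sketch only: runs $process on each element between two comment markers.
// The function name, marker text and callback are illustrative assumptions.
function process_between_comments(string $html, string $marker, callable $process): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Find comment nodes whose text equals the marker
    // (the marker must not contain a double quote).
    $xpath = new DOMXPath($doc);
    $comments = $xpath->query('//comment()[normalize-space(.) = "' . $marker . '"]');
    if ($comments->length < 2) {
        return $html;                   // markers missing: return the input unchanged
    }
    $start = $comments->item(0);
    $end = $comments->item(1);
    if (!$start->parentNode->isSameNode($end->parentNode)) {
        return $html;                   // this sketch only handles sibling markers
    }

    // Collect the nodes first, then modify them, so that changes
    // do not disturb the sibling walk.
    $between = [];
    for ($node = $start->nextSibling; $node !== null && !$node->isSameNode($end); $node = $node->nextSibling) {
        $between[] = $node;
    }
    foreach ($between as $node) {
        if ($node instanceof DOMElement) {
            $process($node);
        }
    }
    return $doc->saveHTML();
}

$html = <<<HTML
<html>
<head>
</head>
<body>
<div id="content">
<!--content-->

<p>some content</p>

<!--content-->
</div>
</body>
</html>
HTML;

// Example processing step: uppercase the text of each element between the markers.
$result = process_between_comments($html, 'content', function (DOMElement $el) {
    $el->textContent = strtoupper($el->textContent);
});
echo $result;
```

Note that `saveHTML()` serializes the whole document, so the output may include a DOCTYPE that was not in the input. Working on nodes rather than strings avoids having to re-parse the processed markup.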

Answer 3

Answered by Ankur Mukherjee

<?php

    $content = file_get_contents("sample.html");
    // With two markers, explode() yields three parts; index 1 is the text between them.
    $parts = explode("<!--content-->", $content);
    // strip_tags() drops the markup and leaves only the text.
    var_dump(strip_tags($parts[1]));
?>

Check this; it should work for you.
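
The question also asks about putting the processed content back. A minimal sketch of that step using the same explode() idea could look like the following; the function name `process_marked_section` and the str_replace() step are assumptions made for illustration, and the sketch expects exactly two markers in the input.

```php
<?php
// Sketch only: split on the marker, process the middle part, join again.
// The function name and the processing callback are illustrative assumptions.
function process_marked_section(string $html, string $marker, callable $process): string
{
    $parts = explode($marker, $html);
    if (count($parts) !== 3) {
        return $html;                   // expect exactly two markers; otherwise do nothing
    }
    $parts[1] = $process($parts[1]);    // index 1 is the text between the markers
    return implode($marker, $parts);    // implode() puts the markers back
}

$html = "<div id=\"content\">\n<!--content-->\n<p>some content</p>\n<!--content-->\n</div>";

$result = process_marked_section($html, '<!--content-->', function (string $inner): string {
    return str_replace('some content', 'processed content', $inner);
});
echo $result;
```

To update the file itself, the result could be written back with file_put_contents("sample.html", $result).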

Answer 4

Answered by piernik

The problem is with nested divs. I found a solution here:

<?php // File: MatchAllDivMain.php
// Read html file to be processed into $data variable
$data = file_get_contents('test.html');
// Commented regex to extract contents from <div class="main">contents</div>
//  where "contents" may contain nested <div>s.
//  Regex uses PCRE's recursive (?1) sub expression syntax to recurs group 1
$pattern_long = '{           # recursive regex to capture contents of "main" DIV
<div\s+class="main"\s*>              # match the "main" class DIV opening tag
  (                                   # capture "main" DIV contents into 
    (?:                               # non-cap group for nesting * quantifier
      (?: (?!<div[^>]*>|</div>). )++  # possessively match all non-DIV tag chars
    |                                 # or 
      <div[^>]*>(?1)</div>            # recursively match nested <div>xyz</div>
    )*                                # loop however deep as necessary
  )                                   # end group 1 capture
</div>                               # match the "main" class DIV closing tag
}six';  // single-line (dot matches all), ignore case and free spacing modes ON

// short version of same regex
$pattern_short = '{<div\s+class="main"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';

$matchcount = preg_match_all($pattern_long, $data, $matches);
// $matchcount = preg_match_all($pattern_short, $data, $matches);
echo("<pre>\n");
if ($matchcount > 0) {
    echo("$matchcount matches found.\n");
//  print_r($matches);
    for($i = 0; $i < $matchcount; $i++) {
        echo("\nMatch #" . ($i + 1) . ":\n");
        echo($matches[1][$i]); // print 1st capture group for match number i
    }
} else {
    echo('No matches');
}
echo("\n</pre>");
?>

Answer 5

Answered by Jake

Have a look here for a code example that shows how to load an HTML document into SimpleXML: http://blog.charlvn.com/2009/03/html-in-php-simplexml.html

You can then treat it as a normal SimpleXML object.

EDIT: This will only work if you want the content of a tag (e.g. between <div> and </div>).
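
One way this could look, as a sketch only: parse the HTML with DOMDocument::loadHTML() and hand the result to simplexml_import_dom(). This is an assumption about the approach and may differ from the code in the linked post; the sample markup and variable names are made up for the example.

```php
<?php
// Sketch only: load HTML through DOM, then query it as SimpleXML.
// The sample markup and variable names are illustrative assumptions.
$html = '<html><body><div id="content"><p>some content</p></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);       // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xml = simplexml_import_dom($doc);

// Look up the div by id and serialize its child elements.
$divs = $xml->xpath('//div[@id="content"]');
$inner = '';
if ($divs) {
    foreach ($divs[0]->children() as $child) {
        $inner .= $child->asXML();
    }
}
echo $inner;
```

As the edit above says, this reaches the content of a tag; children() only returns child elements, so comment markers and bare text between them are not visible this way.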
