PHP: remove all attributes from HTML tags

Question

Asked by Andres SK

I have this HTML code:

<p style="padding:0px;">
<strong style="padding:0;margin:0;">hello</strong>
</p>

But it should become (for all possible HTML tags):

<p>
<strong>hello</strong>
</p>

Answer 1

Answered by gnarf

Adapted from my answer to a similar question:

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si",'<$1$2>', $text);

// <p><strong>hello</strong></p>

The RegExp broken down:

/              # Start Pattern
 <             # Match '<' at beginning of tags
 (             # Start Capture Group  - Tag Name
  [a-z]        # Match 'a' through 'z'
  [a-z0-9]*    # Match 'a' through 'z' or '0' through '9' zero or more times
 )             # End Capture Group
 [^>]*?        # Match anything other than '>', zero or more times, non-greedy (won't eat the /)
 (\/?)         # Capture Group  - '/' if it is there
 >             # Match '>'
/si            # End Pattern - Case Insensitive; 's' lets '.' match newlines (no effect here, the pattern has no '.')

Add some quoting and use the replacement text <$1$2>: it should strip any text after the tag name up to the end of the tag, whether that is /> or just >.
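
Putting the breakdown and the replacement text together, a commented version of the pattern might look like the sketch below. It assumes the x (extended) modifier so the pattern can carry its own comments, and it swaps the / delimiter for ~ so the slash needs no escaping. The function name stripAttributesRegex and the sample input are made up for illustration.

```php
<?php
// Hypothetical helper; the name is illustrative, not part of the answer above.
function stripAttributesRegex(string $html): string
{
    $pattern = '~
        <              # opening angle bracket of a tag
        (              # group 1: the tag name
          [a-z]        #   a letter first
          [a-z0-9]*    #   then letters or digits
        )
        [^>]*?         # everything up to the closing bracket, lazily
        (/?)           # group 2: the slash of a self-closing tag, if any
        >              # closing angle bracket
    ~xsi';

    // $1 is the tag name, $2 the optional trailing slash.
    return preg_replace($pattern, '<$1$2>', $html);
}

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong><br /></p>';
echo stripAttributesRegex($text), PHP_EOL; // <p><strong>hello</strong><br/></p>
```

Closing tags such as </p> are left alone because the character after < must be a letter.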

Please note: this isn't necessarily going to work on ALL input, as the Anti-HTML + RegExp will tell you. There are a few pitfalls, most notably <p style=">"> would end up as <p>">, plus a few other broken cases. I would recommend looking at Zend_Filter_StripTags as a more foolproof tag/attribute filter in PHP.
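
The attribute-value pitfall mentioned above is easy to reproduce. The snippet below is only a sketch; the variable names and sample input are made up for illustration.

```php
<?php
// The pattern and replacement text discussed in this answer.
$pattern = '/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si';

// A ">" inside a quoted attribute value ends the match too early.
$tricky = '<p style=">">hello</p>';
$result = preg_replace($pattern, '<$1$2>', $tricky);

echo $result, PHP_EOL; // <p>">hello</p>
```

The leftover "> ends up in the output as text, which is why a real parser is the safer choice for untrusted input.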

Answer 2

Answered by Gordon

Here is how to do it with native DOM:

$dom = new DOMDocument;                 // init new DOMDocument
$dom->loadHTML($html);                  // load HTML into it
$xpath = new DOMXPath($dom);            // create a new XPath
$nodes = $xpath->query('//*[@style]');  // Find elements with a style attribute
foreach ($nodes as $node) {              // Iterate over found elements
    $node->removeAttribute('style');    // Remove style attribute
}
echo $dom->saveHTML();                  // output cleaned HTML

If you want to remove all possible attributes from all possible tags, do:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    $node->parentNode->removeAttribute($node->nodeName);
}
echo $dom->saveHTML();
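
Since loadHTML() wraps a fragment in html and body elements, saveHTML() without arguments prints a whole document. One possible way to get only the fragment back is sketched below; the function name stripAttributesDom is hypothetical, and the sketch assumes the input is a non-empty body fragment in an encoding DOMDocument detects correctly.

```php
<?php
// Hypothetical wrapper around the DOM approach; the name is illustrative.
function stripAttributesDom(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove every attribute node from its owning element.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//@*') as $attribute) {
        $attribute->parentNode->removeAttribute($attribute->nodeName);
    }

    // Serialize only the children of <body>, not the wrapper document.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$clean = stripAttributesDom('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>');
echo $clean, PHP_EOL;
```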

Answer 3

Answered by Yacoby

I would avoid using regex, as HTML is not a regular language, and instead use an HTML parser like Simple HTML DOM.

You can get the list of an element's attributes through its attr property. For example:

$html = str_get_html('<div id="hello">World</div>');
var_dump($html->find("div", 0)->attr);
/*
array(1) {
  ["id"]=>
  string(5) "hello"
}
*/

foreach ( $html->find("div", 0)->attr as &$value ){
    $value = null;
}

print $html;
// <div>World</div>

Answer 4

Answered by TobiasDeVil

$html_text = '<p>Hello <b onclick="alert(123)" style="color: red">world</b>. <i>Its beautiful day.</i></p>';
$strip_text = strip_tags($html_text, '<b>');
$result = preg_replace('/<(\w+)[^>]*>/', '<$1>', $strip_text);
echo $result;

// Result:
// Hello <b>world</b>. Its beautiful day.

Answer 5

Answered by Sp4cecat

To do SPECIFICALLY what andufo wants, it's simply:

// Single quotes matter: in a double-quoted string "\1" is an octal escape, not a backreference.
$html = preg_replace( '#(<[a-zA-Z0-9]+)[^\>]+>#', '\1>', $html );

That is, he wants to strip anything but the tag name out of the opening tag. It won't work for self-closing tags, of course, because the trailing slash is stripped along with the attributes.
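
To see what the self-closing caveat means in practice, here is a small sketch; the sample markup and variable names are made up for illustration.

```php
<?php
// Pattern from the answer above (the backslash before ">" is optional).
$pattern = '#(<[a-zA-Z0-9]+)[^>]+>#';

$html = '<p class="intro">line one<br />line two</p>';
$result = preg_replace($pattern, '\1>', $html);

echo $result, PHP_EOL; // <p>line one<br>line two</p>
```

The attributes are gone, but <br /> has become <br>, which matters if the output has to stay valid XHTML.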

Answer 6

Answered by Greg K

Regexes are too fragile for HTML parsing. In your example, the following would strip out your attributes:

echo preg_replace(
    "|<(\w+)([^>/]+)?|",
    "<$1",
    "<p style=\"padding:0px;\">\n<strong style=\"padding:0;margin:0;\">hello</strong>\n</p>\n"
);

Update

Make the second capture optional and do not strip '/' from closing tags:

|<(\w+)([^>]+)| to |<(\w+)([^>/]+)?|

A demonstration that this regular expression works:

$ phpsh
Starting php
type 'h' or 'help' to see instructions & features
php> $html = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello<br/></strong></p>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<p><strong>hello<br/></strong></p>
php> $html = '<strong>hello</strong>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<strong>hello</strong>
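
One caveat worth checking before relying on this pattern: because the second group stops at the first '/', an attribute value that contains a slash, such as a URL, is cut off in the middle of the tag. A small sketch (the sample link and variable names are made up):

```php
<?php
// Pattern and replacement from the answer above.
$pattern = '|<(\w+)([^>/]+)?|';

$link = '<a href="http://example.com/">home</a>';
$result = preg_replace($pattern, '<$1', $link);

echo $result, PHP_EOL; // <a//example.com/">home</a>
```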

Answer 7

Answered by Brandon Orth

Hope this helps. It may not be the fastest way to do it, especially for large blocks of HTML. If anyone has suggestions for making this faster, let me know.

function StringEx($str, $start, $end)
{ 
    $str_low = strtolower($str);
    $pos_start = strpos($str_low, $start);
    $pos_end = strpos($str_low, $end, ($pos_start + strlen($start)));
    if($pos_end==0) return false;
    if ( ($pos_start !== false) && ($pos_end !== false) )
    {  
        $pos1 = $pos_start + strlen($start);
        $pos2 = $pos_end - $pos1;
        $RData = substr($str, $pos1, $pos2);
        if($RData=='') { return true; }
        return $RData;
    } 
    return false;
}

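// $DATA is expected to hold the input HTML.
$S = '<';
$E = '>';
while ($RData = StringEx($DATA, $S, $E)) {
    // StringEx() returns true for an empty tag "<>". The comparison must be
    // strict, because any non-empty string is loosely equal to true.
    if ($RData === true) { $RData = ''; }
    // Swap the whole tag for a placeholder so it is not matched again.
    $DATA = str_ireplace($S.$RData.$E, '||||||', $DATA);
}
// Turn every placeholder back into an empty "<>" pair.
$DATA = str_ireplace('||||||', $S.$E, $DATA);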

Answer 8

Answered by Tizón

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

Answer 9

Answered by Greg Randall

Here's an easy way to get rid of attributes. It handles malformed HTML pretty well.

<?php
  $string = '<p style="padding:0px;">
    <strong style="padding:0;margin:0;">hello</strong>
    </p>';

  //get all html elements on a line by themselves
  $string_html_on_lines = str_replace (array("<",">"),array("\n<",">\n"),$string); 

  //find lines starting with a '<' and any letters or numbers up to the first space. throw everything after the space away.
  $string_attribute_free = preg_replace("/\n(<[\w123456]+)\s.+/i","\n$1>",$string_html_on_lines);

  echo $string_attribute_free;
?>
