PHP: Remove all attributes from an HTML tag
Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/3026096/
Asked by Andres SK
I have this HTML code:
<p style="padding:0px;">
<strong style="padding:0;margin:0;">hello</strong>
</p>
But it should become (for all possible HTML tags):
<p>
<strong>hello</strong>
</p>
Answered by gnarf
Adapted from my answer on a similar question:
$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';
echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si",'<$1$2>', $text);
// <p><strong>hello</strong></p>
The RegExp broken down:
/ # Start Pattern
< # Match '<' at beginning of tags
( # Start Capture Group - Tag Name
[a-z] # Match 'a' through 'z'
[a-z0-9]* # Match 'a' through 'z' or '0' through '9' zero or more times
) # End Capture Group
[^>]*? # Match anything other than '>', zero or more times, not greedy (won't eat the /)
(\/?) # Capture Group - '/' if it is there
> # Match '>'
/is # End Pattern - Case Insensitive; 's' lets '.' match newlines (no '.' is used here, and [^>] already spans lines)
Add some quoting, and use the replacement text <$1$2>; it should strip any text after the tag name until the end of the tag (/> or just >).
Please note: this isn't necessarily going to work on ALL input, as the Anti-HTML + RegExp camp will tell you. There are a few pitfalls, most notably <p style=">"> would end up as <p>">, plus a few other broken cases... I would recommend looking at Zend_Filter_StripTags as a more foolproof tags/attributes filter in PHP.
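As a rough sketch of the points above, the pattern can be wrapped in a small function and tried on a few inputs. The function name stripAttributesRegex and the <br class="clear"/> sample are assumptions made for illustration; the other two inputs come from the question and from the note above.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function stripAttributesRegex(string $html): string
{
    // $1 = tag name, $2 = the optional "/" of a self-closing tag
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si', '<$1$2>', $html);
}

echo stripAttributesRegex('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>'), "\n";
// <p><strong>hello</strong></p>

echo stripAttributesRegex('<br class="clear"/>'), "\n";
// <br/>

// Known failure: a ">" inside an attribute value ends the match early.
echo stripAttributesRegex('<p style=">">text</p>'), "\n";
// <p>">text</p>
```

The last call shows why the answer hedges: the regex has no notion of quoted attribute values, so it treats the first ">" it sees as the end of the tag.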
Answered by Gordon
Here is how to do it with native DOM:
$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html); // load HTML into it
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//*[@style]'); // Find elements with a style attribute
foreach ($nodes as $node) { // Iterate over found elements
$node->removeAttribute('style'); // Remove style attribute
}
echo $dom->saveHTML(); // output cleaned HTML
If you want to remove all possible attributes from all possible tags, do:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
$node->parentNode->removeAttribute($node->nodeName);
}
echo $dom->saveHTML();
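Note that saveHTML() on the whole document also emits the doctype and the html/body wrappers that loadHTML() adds around a fragment. One possible way to get only the fragment back is sketched below. The function name stripAllAttributes is hypothetical, and using ownerElement instead of parentNode is a stylistic choice, not something taken from the answer.

```php
<?php
// Hypothetical helper; a sketch assuming the input is an HTML fragment.
function stripAllAttributes(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every attribute node in the document and remove it from its element.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//@*') as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // Serialize only the children of <body>, dropping the wrappers loadHTML() added.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

echo stripAllAttributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>');
```

Because a real parser does the tokenizing, a ">" inside an attribute value does not confuse this approach.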
Answered by Yacoby
I would avoid using regex, as HTML is not a regular language, and instead use an HTML parser like Simple HTML DOM.
You can get a list of the attributes an object has by using attr. For example:
$html = str_get_html('<div id="hello">World</div>');
var_dump($html->find("div", 0)->attr);
/*
array(1) {
["id"]=>
string(5) "hello"
}
*/
foreach ( $html->find("div", 0)->attr as &$value ){
$value = null;
}
print $html;
//<div>World</div>
Answered by TobiasDeVil
$html_text = '<p>Hello <b onclick="alert(123)" style="color: red">world</b>. <i>Its beautiful day.</i></p>';
$strip_text = strip_tags($html_text, '<b>');
$result = preg_replace('/<(\w+)[^>]*>/', '<$1>', $strip_text);
echo $result;
// Result
string 'Hello <b>world</b>. Its beautiful day.'
Answered by Sp4cecat
To do SPECIFICALLY what andufo wants, it's simply:
$html = preg_replace( '#(<[a-zA-Z0-9]+)[^>]+>#', '$1>', $html );
That is, he wants to strip everything but the tag name out of the opening tag. It won't work for self-closing tags, of course.
Answered by Greg K
Regexes are too fragile for HTML parsing. In your example, the following would strip out your attributes:
echo preg_replace(
"|<(\w+)([^>/]+)?|",
"<$1",
"<p style=\"padding:0px;\">\n<strong style=\"padding:0;margin:0;\">hello</strong>\n</p>\n"
);
Update
Make the second capture optional and do not strip '/' from closing tags:
|<(\w+)([^>]+)| to |<(\w+)([^>/]+)?|
Demonstration that this regular expression works:
$ phpsh
Starting php
type 'h' or 'help' to see instructions & features
php> $html = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello<br/></strong></p>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<p><strong>hello<br/></strong></p>
php> $html = '<strong>hello</strong>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<strong>hello</strong>
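To illustrate the fragility mentioned at the top of this answer, here is a small sketch that runs the updated pattern on one more input. The function name stripAfterTagName and the href sample are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical wrapper around the updated pattern from this answer.
function stripAfterTagName(string $html): string
{
    // Keep "<" plus the tag name; drop what follows up to the first ">" or "/".
    return preg_replace('|<(\w+)([^>/]+)?|', '<$1', $html);
}

echo stripAfterTagName('<p style="padding:0px;">hello<br/></p>'), "\n";
// <p>hello<br/></p>

// Assumed extra input: the attribute value contains "/", which is where [^>/]+ stops.
echo stripAfterTagName('<a href="http://example.com/">link</a>'), "\n";
// <a//example.com/">link</a>
```

Excluding "/" protects closing and self-closing tags, but it also means any attribute value containing a slash, such as a URL, is only partly removed.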
Answered by Brandon Orth
Hope this helps. It may not be the fastest way to do it, especially for large blocks of HTML. If anyone has suggestions for making this faster, let me know.
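function StringEx($str, $start, $end)
{
    // Returns the text between the first $start and the next $end,
    // true if that text is empty, or false if no such pair exists.
    $str_low = strtolower($str);
    $pos_start = strpos($str_low, $start);
    if ($pos_start === false) return false;
    $pos_end = strpos($str_low, $end, ($pos_start + strlen($start)));
    if ($pos_end == 0) return false;
    if (($pos_start !== false) && ($pos_end !== false))
    {
        $pos1 = $pos_start + strlen($start);
        $pos2 = $pos_end - $pos1;
        $RData = substr($str, $pos1, $pos2);
        if ($RData == '') { return true; }
        return $RData;
    }
    return false;
}

// $DATA is assumed to hold the HTML to process.
// Note: this replaces everything between < and >, tag name included.
$S = '<';
$E = '>';
while ($RData = StringEx($DATA, $S, $E)) {
    // Strict comparison: a loose == true would match every non-empty string.
    if ($RData === true) { $RData = ''; }
    $DATA = str_ireplace($S.$RData.$E, '||||||', $DATA);
}
$DATA = str_ireplace('||||||', $S.$E, $DATA);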
Answered by Tizón
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
Answered by Greg Randall
Here's an easy way to get rid of attributes. It handles malformed HTML pretty well.
<?php
$string = '<p style="padding:0px;">
<strong style="padding:0;margin:0;">hello</strong>
</p>';
//get all html elements on a line by themselves
$string_html_on_lines = str_replace (array("<",">"),array("\n<",">\n"),$string);
//find lines starting with a '<' and any letters or numbers up to the first space. throw everything after the space away.
$string_attribute_free = preg_replace("/\n(<[\w123456]+)\s.+/i","\n$1>",$string_html_on_lines);
echo $string_attribute_free;
?>

