PHP: Remove the style attribute from HTML tags
Original question: http://stackoverflow.com/questions/5517255/
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute the content to the original authors on Stack Overflow.
Remove style attribute from HTML tags
Asked by Martin Bean
I'm not too good with regular expressions, but with PHP I want to remove the style attribute from HTML tags in a string that's coming back from TinyMCE.
So change <p style="...">Text</p> to just vanilla <p>Text</p>.
How would I achieve this with something like the preg_replace() function?
Answered by Staffan Nöteberg
The pragmatic regex (<[^>]+) style=".*?" will solve this problem in all reasonable cases. The part of the match that is not the first captured group should be removed, like this:
$output = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $input);
Match a < followed by one or more "not >" characters, until we come to a space and the style="..." part. The /i flag makes it work even with STYLE="...". Replace this match with $1, which is the captured group. The tag is left as is if it doesn't include style="...".
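As a minimal sketch of the approach above, the call can be wrapped in a small helper. The function name strip_style_attr() and the sample markup are assumptions made for illustration; only the pattern and the $1 replacement come from the answer.

```php
<?php
// Hypothetical helper: the name is an assumption, not part of the answer.
function strip_style_attr(string $html): string
{
    // Group 1 captures everything from "<" up to the space before style="...".
    // Replacing the whole match with $1 keeps that part and drops the attribute.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('/(<[^>]+) style=".*?"/i', '$1', $html) ?? $html;
}

$input = '<p style="color: red" class="intro">Text</p><span>plain</span>';
echo strip_style_attr($input), "\n";
```

Note that this pattern only handles double-quoted values and removes at most one style attribute per tag, which is usually enough for editor output.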
Answered by Maerlyn
Something like this should work (untested code warning):
<?php
$html = '<p style="asd">qwe</p><br /><p class="qwe">qweqweqwe</p>';
$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($html);
libxml_use_internal_errors(false);
$domx = new DOMXPath($domd);
$items = $domx->query("//p[@style]");
foreach ($items as $item) {
    $item->removeAttribute("style");
}
echo $domd->saveHTML();
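The snippet above only touches <p> elements and prints a whole document. A possible variant, sketched under a few assumptions, removes style from every element and returns just the fragment. The helper name strip_style_dom(), the wrapper <div> with its id, and the XML-declaration prefix (a commonly used hint to make libxml read the input as UTF-8) are not from the original answer.

```php
<?php
// Hypothetical helper: name, wrapper <div> and encoding hint are assumptions.
function strip_style_dom(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so its nodes can be found again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrapper-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select every element that carries a style attribute, not only <p>.
    foreach ($xpath->query('//*[@style]') as $node) {
        $node->removeAttribute('style');
    }

    // Serialize only the wrapper's children, so no <html> or <body> is added.
    $root = $xpath->query('//div[@id="wrapper-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_style_dom('<p style="asd">qwe</p><br /><p class="qwe">qweqweqwe</p>'), "\n";
```

The serializer may normalize markup (for example <br /> becomes <br>) and may insert line breaks between block elements, so the result should not be compared byte for byte with the input.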
Answered by JaseC
I commented on @Maerlyn's function. It does work, but DOMDocument really stuffs up the encoding. Here's my simplehtmldom version:
function stripAttributes($html, $attribs) {
    $dom = new simple_html_dom();
    $dom->load($html);
    foreach ($attribs as $attrib) {
        // setting an attribute to null removes it from the element
        foreach ($dom->find("*[$attrib]") as $e) {
            $e->$attrib = null;
        }
    }
    $dom->load($dom->save());
    return $dom->save();
}
Answered by DreschF
I use this:
function strip_word_html($text, $allowed_tags = '<a><ul><li><b><i><sup><sub><em><strong><u><br><br/><br /><p><h2><h3><h4><h5><h6>')
{
    mb_regex_encoding('UTF-8');
    //replace MS special characters first
    $search = array('/‘/u', '/’/u', '/“/u', '/”/u', '/—/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);
    //make sure _all_ html entities are converted to the plain ascii equivalents - it appears
    //in some MS headers, some html entities are encoded and some aren't
    //$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    //try to strip out any C style comments first, since these, embedded in html comments, seem to
    //prevent strip_tags from removing html comments (MS Word introduced combination)
    if (mb_stripos($text, '/*') !== FALSE) {
        //preg_replace is used here because mb_ereg patterns take no delimiters or trailing flags
        $text = preg_replace('#/\*.*?\*/#su', '', $text);
    }
    //introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be
    //'<1' becomes '< 1' (note: somewhat application specific)
    $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
    $text = strip_tags($text, $allowed_tags);
    //eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
    //strip out inline css and simplify style tags
    $search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $text = preg_replace($search, $replace, $text);
    //on some of the newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
    //that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
    //some MS Style Definitions - this last bit gets rid of any leftover comments
    $num_matches = preg_match_all("/\<!--/u", $text, $matches);
    if ($num_matches) {
        //lazy match, so text between two separate comments is kept
        $text = preg_replace('/\<!--.*?--\>/isu', '', $text);
    }
    //finally drop style="..." attributes, keeping the rest of each tag via $1
    $text = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $text);
    return $text;
}
Answered by Lorenzo Marcon
Here you go:
<?php
$html = '<p style="border: 1px solid red;">Test</p>';
echo preg_replace('/<p style="(.+?)">(.+?)<\/p>/i', '<p>$2</p>', $html);
?>
By the way, as others have pointed out, regular expressions are not recommended for this.
Answered by RafaSashi
In addition to Lorenzo Marcon's answer:
Using preg_replace to select everything except the style attribute:
$html = preg_replace('/(<p.+?)style=".+?"(>.+?)/i', '$1$2', $html);
Answered by Ashguard
At the moment I'm using something like this to strip the style='...' part out of tags while keeping the other attributes.
$output = preg_replace('/<([^>]+)(\sstyle=(?P<stq>["\'])(.*)\k<stq>)([^<]*)>/iUs', '<$1$5>', $input);
Answered by Zmael
$html = preg_replace('/\sstyle=("|\').*?("|\')/i', '', $html);
This replaces every style="..." attribute with an empty string.
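One caveat with the pattern above is that the opening and closing quotes are matched independently, so a value such as style="font-family: 'Arial'" is cut off at the first inner quote. A possible refinement, sketched here with a backreference, requires the closing quote to be the same character as the opening one. The helper name strip_style_any_quote() is an assumption for illustration.

```php
<?php
// Hypothetical helper: the name is an assumption, not part of the answer.
function strip_style_any_quote(string $html): string
{
    // (["\']) captures the opening quote; \1 demands the same quote at the end.
    // The s flag lets the value span several lines.
    return preg_replace('/\sstyle=(["\'])(.*?)\1/is', '', $html) ?? $html;
}

echo strip_style_any_quote('<p style="font-family: \'Arial\'" class="x">T</p>'), "\n";
```

Like the original one-liner, this is not tag-aware: it would also remove text that merely looks like a style attribute outside of a tag.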
Answered by Daniel
You could handle it client-side; the easiest way would be with jQuery. Something like:
$("#tinyMce p").removeAttr("style");