PHP: Remove the style attribute from HTML tags

Question

Asked by Martin Bean

I'm not too good with regular expressions, but with PHP I want to remove the style attribute from HTML tags in a string that's coming back from TinyMCE.

So change <p style="...">Text</p> to just vanilla <p>Text</p>.

How would I achieve this with something like the preg_replace() function?

Answer 1

Answered by Staffan Nöteberg

The pragmatic regex (<[^>]+) style=".*?" will solve this problem in all reasonable cases. The part of the match that is not the first captured group should be removed, like this:

$output = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $input);

Match a < followed by one or more "not >" characters until we come to a space and the style="..." part. The /i makes it work even with STYLE="...". Replace this match with $1, which is the captured group. The tag is left as is if it doesn't include style="...".
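
To make the behaviour described above concrete, here is a minimal sketch that runs the same pattern over a few sample strings. The wrapper name stripStyle and the sample inputs are assumptions made for illustration; only the pattern and the $1 replacement come from the answer.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
// '$1' re-inserts the captured start of the tag, so only the
// ' style="..."' part of the match is dropped.
function stripStyle(string $input): string
{
    return preg_replace('/(<[^>]+) style=".*?"/i', '$1', $input);
}

$samples = [
    '<p style="color: red;">Test</p>',            // lower-case attribute
    '<P CLASS="note" STYLE="margin: 0">Test</P>', // upper-case, matched thanks to /i
    '<p class="note">Test</p>',                   // no style attribute, left untouched
];

foreach ($samples as $sample) {
    echo stripStyle($sample), "\n";
}
// Prints:
// <p>Test</p>
// <P CLASS="note">Test</P>
// <p class="note">Test</p>
```

Note that this pattern only recognises double-quoted values; a style attribute written with single quotes would not be matched.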

Answer 2

Answered by Maerlyn

Something like this should work (untested code warning):

<?php

$html = '<p style="asd">qwe</p><br /><p class="qwe">qweqweqwe</p>';

$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($html);
libxml_use_internal_errors(false);

$domx = new DOMXPath($domd);
$items = $domx->query("//p[@style]");

foreach($items as $item) {
  $item->removeAttribute("style");
}

echo $domd->saveHTML();
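
The query above only selects <p> elements, and saveHTML() without arguments returns a whole document including the doctype, <html> and <body> wrappers. The following is a rough sketch of a variant that covers every element and returns only the fragment. The function name removeStyleWithDom, the assumption that the input is UTF-8, and the XML-prefix encoding hint are not part of the answer; the prefix is a commonly used workaround whose effect may depend on the libxml version.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer).
function removeStyleWithDom(string $html): string
{
    $doc = new DOMDocument();

    // Silence parser warnings about imperfect markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // Assumption: $html is UTF-8. The prefix hints the encoding to the parser.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every element that carries a style attribute, not only <p>.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $element) {
        $element->removeAttribute('style');
    }

    // Serialize only the children of <body> to get the fragment back.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $output = '';
    foreach ($body->childNodes as $child) {
        $output .= $doc->saveHTML($child);
    }
    return $output;
}

echo removeStyleWithDom(
    '<p style="asd">qwe</p><br /><p class="qwe">qweqweqwe</p>'
    . '<div style="x"><span style="y">z</span></div>'
);
```

The serializer may normalise the markup (for example <br /> is written as <br>, and line breaks can be added between block elements), so the result is not guaranteed to be byte-for-byte identical to the input minus the style attributes.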

Answer 3

Answered by JaseC

I commented on @Maerlyn's function. It does work, but DOMDocument really messes with the encoding. Here's my simplehtmldom version:

function stripAttributes($html,$attribs) {
    $dom = new simple_html_dom();
    $dom->load($html);
    foreach($attribs as $attrib)
        foreach($dom->find("*[$attrib]") as $e)
            $e->$attrib = null; 
    $dom->load($dom->save());
    return $dom->save();
}

Answer 4

Answered by DreschF

I use this:

function strip_word_html($text, $allowed_tags = '<a><ul><li><b><i><sup><sub><em><strong><u><br><br/><br /><p><h2><h3><h4><h5><h6>')
{
    mb_regex_encoding('UTF-8');
    //replace MS special characters first
    $search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);
    //make sure _all_ html entities are converted to the plain ascii equivalents - it appears
    //in some MS headers, some html entities are encoded and some aren't
    //$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    //try to strip out any C style comments first, since these, embedded in html comments, seem to
    //prevent strip_tags from removing html comments (MS Word introduced combination)
    if(mb_stripos($text, '/*') !== FALSE){
        $text = mb_eregi_replace('#/\*.*?\*/#s', '', $text, 'm');
    }
    //introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be
    //'<1' becomes '< 1'(note: somewhat application specific)
    $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
    $text = strip_tags($text, $allowed_tags);
    //eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
    //strip out inline css and simplify style tags
    $search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $text = preg_replace($search, $replace, $text);
    //on some of the newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
    //that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
    //some MS Style Definitions - this last bit gets rid of any leftover comments */
    $num_matches = preg_match_all("/\<!--/u", $text, $matches);
    if($num_matches){
        $text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
    }
    //finally remove inline style attributes, keeping the rest of each tag ($1)
    $text = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $text);
    return $text;
}

Answer 5

Answered by Lorenzo Marcon

Here you go:

<?php

$html = '<p style="border: 1px solid red;">Test</p>';
echo preg_replace('/<p style="(.+?)">(.+?)<\/p>/i', "<p>$2</p>", $html);

?>

By the way, as others have pointed out, regular expressions are not recommended for this.

Answer 6

Answered by RafaSashi

In addition to Lorenzo Marcon's answer:

Using preg_replace to select everything except the style attribute:

$html = preg_replace('/(<p.+?)style=".+?"(>.+?)/i', "$1$2", $html);

Answer 7

Answered by Ashguard

At the moment I'm using something like this to strip the style='...' section out of tags while keeping the other attributes.

$output = preg_replace('/<([^>]+)(\sstyle=(?P<stq>["\'])(.*)\k<stq>)([^<]*)>/iUs', '<$1$5>', $input);
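
As an illustration of how this pattern treats the surrounding attributes, here is a small sketch. It assumes the replacement string is '<$1$5>', where group 1 holds the part of the tag before the style attribute and group 5 the part after it; the function name stripStyleAnyQuote and the sample inputs are made up for the example.

```php
<?php
// Hypothetical wrapper; the pattern is the one from the answer.
// (?P<stq>["\']) remembers the opening quote and \k<stq> requires the same
// quote to close the value, so both style="..." and style='...' are handled.
// The U modifier makes every quantifier lazy by default.
function stripStyleAnyQuote(string $input): string
{
    return preg_replace(
        '/<([^>]+)(\sstyle=(?P<stq>["\'])(.*)\k<stq>)([^<]*)>/iUs',
        '<$1$5>', // assumed replacement: tag start + attributes after style
        $input
    );
}

echo stripStyleAnyQuote('<p class="a" style=\'color: red\' id="x">Test</p>'), "\n";
// Prints: <p class="a" id="x">Test</p>

echo stripStyleAnyQuote('<td style="width: 10px" align="left">Cell</td>'), "\n";
// Prints: <td align="left">Cell</td>
```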

Answer 8

Answered by Zmael

$html = preg_replace('/\sstyle=("|\').*?("|\')/i', '', $html);

This replaces every style="..." attribute with an empty string.
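
One caveat with the pattern above is that the opening and closing quotes are matched independently, so a value such as style="font-family: 'Arial'" would be cut off at the first inner quote. A possible variant, sketched here as an assumption rather than as part of the answer, uses a backreference so that the closing quote must be the same character as the opening one. The function name removeStyleAttributes is hypothetical.

```php
<?php
// Hypothetical variant of the pattern above: (["\']) captures the opening
// quote and \1 requires that same quote character to end the value.
function removeStyleAttributes(string $html): string
{
    return preg_replace('/\sstyle=(["\'])(.*?)\1/is', '', $html);
}

$html = '<p class="x" style="font-family: \'Arial\'; color: red">Test</p>';
echo removeStyleAttributes($html), "\n";
// Prints: <p class="x">Test</p>
```

Like the other regex-based answers, this is still a heuristic: it would also match the text style="..." if it appeared after whitespace outside a tag.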

Answer 9

Answered by Daniel

You could handle it client-side; the easiest way would be with jQuery. Something like:

$("#tinyMce p").removeAttr("style");
