htmlentities in PHP but preserving HTML tags

Question

Asked by fidoboy

I want to convert all text in a string into HTML entities while preserving the HTML tags. For example, this:

<p><font style="color:#FF0000">Camión español</font></p>

should be translated into this:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

Any ideas?

Answer 1

Answered by Pascal MARTIN

You can get the character => entity correspondences used by htmlentities with the function get_html_translation_table; consider this code:

$list = get_html_translation_table(HTML_ENTITIES);
var_dump($list);

(You might want to check the second parameter of that function in the manual; you may need to set it to a value different from the default.)

It will give you something like this:

array
  ' ' => string '&nbsp;' (length=6)
  '¡' => string '&iexcl;' (length=7)
  '¢' => string '&cent;' (length=6)
  '£' => string '&pound;' (length=7)
  '¤' => string '&curren;' (length=8)
  ....
  ....
  ....
  'ÿ' => string '&yuml;' (length=6)
  '"' => string '&quot;' (length=6)
  '<' => string '&lt;' (length=4)
  '>' => string '&gt;' (length=4)
  '&' => string '&amp;' (length=5)
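A quick check of the entries that matter here, as a sketch assuming PHP 5.4 or later. It passes the flags and the charset explicitly (the second and third parameters), so the result does not depend on version-specific defaults; ENT_COMPAT is an assumption chosen to mirror the older default of encoding double quotes but not single quotes:

```php
<?php
// Build the htmlentities() lookup table with explicit flags and charset.
// ENT_COMPAT: double quotes are in the table, single quotes are not.
$list = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// Keys are UTF-8 characters, values are the named entities.
echo $list['ó'], "\n"; // &oacute;
echo $list['ñ'], "\n"; // &ntilde;
echo $list['<'], "\n"; // &lt;
echo $list['"'], "\n"; // &quot;
var_dump(isset($list["'"])); // bool(false) with ENT_COMPAT
```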

Now, remove the correspondences you don't want:

unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

Your list now has all the character => entity correspondences used by htmlentities, except the few characters you don't want to encode.

And now, you just have to extract the list of keys and values:

$search = array_keys($list);
$values = array_values($list);

And, finally, you can use str_replace to do the replacement:

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_out);

And you get:

string '<p><font style="color:#FF0000">Cami&Atilde;&sup3;n espa&Atilde;&plusmn;ol</font></p>' (length=84)

Which looks like what you wanted ;-)

Edit: well, except for the encoding problem (damn UTF-8, I suppose; I'm trying to find a solution for that, and will edit again)

Second edit, a couple of minutes later: it seems you'll have to use utf8_encode on the $search list before calling str_replace :-(

Which means using something like this:

$search = array_map('utf8_encode', $search);

Between the call to array_keys and the call to str_replace.

And, this time, you should really get what you wanted:

string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)

And here is the full code:

$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$search = array_keys($list);
$values = array_values($list);
$search = array_map('utf8_encode', $search);

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_in, $str_out);

And the full output:

string '<p><font style="color:#FF0000">Camión español</font></p>' (length=58)
string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)

This time, it should be OK ^^
It doesn't really fit in one line, and it might not be the most optimized solution; but it should work fine, and it has the advantage of letting you add or remove any character => entity correspondence you need.

Have fun!
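A note on newer PHP versions: since PHP 5.4, get_html_translation_table() returns UTF-8 keys by default, so the utf8_encode step in the answer above would re-encode keys that are already UTF-8 and they would no longer match UTF-8 input (utf8_encode() is also deprecated as of PHP 8.2). A minimal sketch of the same table-based approach under that assumption, asking for the UTF-8 table explicitly instead; the function name keep_tags_htmlentities is hypothetical:

```php
<?php
// Hypothetical helper: encode characters as named entities while leaving
// the markup characters (<, >, &, ") untouched, as in the answer above.
function keep_tags_htmlentities(string $html): string
{
    // Request the UTF-8 table explicitly instead of calling utf8_encode().
    $list = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'UTF-8');

    // Drop the characters that make up the HTML markup itself.
    unset($list['"'], $list['<'], $list['>'], $list['&']);

    return str_replace(array_keys($list), array_values($list), $html);
}

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
var_dump(keep_tags_htmlentities($str_in));
```

Because '&' has been removed from the table, the sequential str_replace() cannot re-encode the ampersands of entities it has just produced.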

Answer 2

Answered by Peter Bailey

Might not be terribly efficient, but it works:

$sample = '<p><font style="color:#FF0000">Camión español</font></p>';

echo htmlspecialchars_decode(
    htmlentities($sample, ENT_NOQUOTES, 'UTF-8', false)
  , ENT_NOQUOTES
);
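How this works: htmlentities() turns everything, including the markup, into entities, and htmlspecialchars_decode() with ENT_NOQUOTES then turns back only &amp;, &lt; and &gt;, so the tags reappear while named entities such as &oacute; survive. A small sketch wrapping the two calls in a hypothetical function, plus a caveat that follows from the same logic: a literal < or & in the text is restored unescaped as well.

```php
<?php
// Hypothetical wrapper around the two-call approach above.
function entities_then_restore_tags(string $html): string
{
    // 1. Encode everything; double_encode = false keeps existing entities as-is.
    $encoded = htmlentities($html, ENT_NOQUOTES, 'UTF-8', false);
    // 2. Decode only &amp;, &lt; and &gt;, which brings the markup back.
    return htmlspecialchars_decode($encoded, ENT_NOQUOTES);
}

echo entities_then_restore_tags('<p><font style="color:#FF0000">Camión español</font></p>'), "\n";
// Caveat: text characters that look like markup are restored too.
echo entities_then_restore_tags('<b>Is 1 < 4 & true?</b>'), "\n";
```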

Answer 3

Answered by SileNT

This is an optimized version of the accepted answer.

$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$string = strtr($string, $list);
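One practical difference between strtr() with an array and str_replace() with arrays: str_replace() applies each pair in turn over the whole string, so a later pair can rewrite the output of an earlier one, while strtr() makes a single pass and never touches text it has already replaced. A toy map (illustrative only, not the real translation table) shows why '&' has to be dropped before using str_replace():

```php
<?php
// Toy map: 'é' first, '&' second (order matters for str_replace()).
$map = ['é' => '&eacute;', '&' => '&amp;'];

// str_replace(): 'é' becomes '&eacute;', then its '&' is replaced again.
$viaStrReplace = str_replace(array_keys($map), array_values($map), 'é & co');

// strtr(): one pass, each original substring is replaced at most once.
$viaStrtr = strtr('é & co', $map);

echo $viaStrReplace, "\n"; // &amp;eacute; &amp; co
echo $viaStrtr, "\n";      // &eacute; &amp; co
```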

Answer 4

Answered by ndp

No solution short of a parser is going to be correct for all cases. Yours is a good case:

<p><font style="color:#FF0000">Camión español</font></p>

but do you also want to support:

<p><font>true if 5 < a && name == "joe"</font></p>

where you want it to come out as:

<p><font>true if 5 &lt; a &amp;&amp; name == &quot;joe&quot;</font></p>

Question: can you do the encoding BEFORE you build the HTML? In other words, can you do something like:

"<p><font>" . htmlentities($inner) . "</font></p>"

You'll save yourself lots of grief if you can do that. If you can't, you'll need some way to skip encoding <, >, and " (as described above), or simply encode it all and then undo it (e.g. replace('&lt;', '<')).
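A minimal sketch of the "encode first" approach in PHP, using the example above as the inner text; the variable names are illustrative. Only the text is escaped, and the markup is added around it afterwards:

```php
<?php
// The text is escaped on its own, before it is wrapped in markup.
$inner = 'true if 5 < a && name == "joe"';
$html  = '<p><font>' . htmlentities($inner, ENT_QUOTES | ENT_HTML401, 'UTF-8') . '</font></p>';

echo $html, "\n";
```

This produces exactly the desired output shown above, because the tags never pass through htmlentities().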

Answer 5

Answered by bflesch

This is a function I've just written which solves this problem in a very elegant way:

First of all, the HTML tags are extracted from the string; then htmlentities() is applied to every remaining substring; finally, the original HTML tags are inserted back at their old positions, so the HTML tags are not altered. :-)

Have fun:

function htmlentitiesOutsideHTMLTags ($htmlText)
{
    $matches = Array();
    $sep = '###HTMLTAG###';

    preg_match_all("@<[^>]*>@", $htmlText, $matches);   
    $tmp = preg_replace("@(<[^>]*>)@", $sep, $htmlText);
    $tmp = explode($sep, $tmp);

    for ($i=0; $i<count($tmp); $i++)
        $tmp[$i] = htmlentities($tmp[$i]);

    $tmp = join($sep, $tmp);

    for ($i=0; $i<count($matches[0]); $i++)
        $tmp = preg_replace("@$sep@", $matches[0][$i], $tmp, 1);

    return $tmp;
}
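The same split, encode, and rejoin idea can be written without the placeholder string. That avoids two edge cases of the function above: text that happens to contain ###HTMLTAG###, and tags containing sequences like $1, which preg_replace() interprets as backreferences in its replacement string. A sketch using preg_split() with PREG_SPLIT_DELIM_CAPTURE instead (a swapped-in technique, not bflesch's code; the function name is hypothetical and the tag pattern is the same simple <[^>]*>):

```php
<?php
// Hypothetical alternative: split on tags, encode only the text between them.
function encode_text_between_tags(string $html): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured tags stay in the array,
    // so even indexes hold text (possibly empty) and odd indexes hold tags.
    $parts = preg_split('@(<[^>]*>)@', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = htmlentities($part, ENT_QUOTES | ENT_HTML401, 'UTF-8', false);
        }
    }

    return implode('', $parts);
}

echo encode_text_between_tags('<p><font style="color:#FF0000">Camión español</font></p>'), "\n";
echo encode_text_between_tags('<a href="$1">x & y</a>'), "\n";
```

Tags are copied back verbatim by implode(), so nothing in them is ever treated as a regex replacement.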

Answer 6

Answered by Luca Borrione

Based on bflesch's answer, I made some changes to handle strings containing less-than signs, greater-than signs, and single or double quotes.

function htmlentitiesOutsideHTMLTags ($htmlText, $ent)
{
    $matches = Array();
    $sep = '###HTMLTAG###';

    preg_match_all(":</{0,1}[a-z]+[^>]*>:i", $htmlText, $matches);

    $tmp = preg_replace(":</{0,1}[a-z]+[^>]*>:i", $sep, $htmlText);
    $tmp = explode($sep, $tmp);

    for ($i=0; $i<count($tmp); $i++)
        $tmp[$i] = htmlentities($tmp[$i], $ent, 'UTF-8', false);

    $tmp = join($sep, $tmp);

    for ($i=0; $i<count($matches[0]); $i++)
        $tmp = preg_replace(":$sep:", $matches[0][$i], $tmp, 1);

    return $tmp;
}

Example of use:

$string = '<b>Is 1 < 4?</b>è<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>€</strong><img src="/some/path" /></p></div>';
$string_entities = htmlentitiesOutsideHTMLTags($string, ENT_QUOTES | ENT_HTML401);
var_dump( $string_entities );

Output is:

string '<b>Is 1 &lt; 4?</b>&egrave;<br><i>&quot;then&quot;</i> <div style="some:style;"><p>gain some <strong>&euro;</strong><img src="/some/path" /></p></div>' (length=150)

You can pass as $ent any flags accepted by htmlentities, as described in the manual.
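Why the stricter pattern matters: bflesch's <[^>]*> treats a bare < in the text as the start of a tag and swallows everything up to the next >, while the pattern above requires a letter right after < or </. A quick check on a fragment of the example string:

```php
<?php
$text = '<b>Is 1 < 4?</b>';

// bflesch's pattern: the "< 4?</b>" run is mistaken for a tag.
preg_match_all('@<[^>]*>@', $text, $loose);
// Luca's pattern: only real tags (a letter after "<" or "</") match.
preg_match_all(':</{0,1}[a-z]+[^>]*>:i', $text, $strict);

print_r($loose[0]);  // matches "<b>" and "< 4?</b>"
print_r($strict[0]); // matches "<b>" and "</b>"
```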

Answer 7

Answered by aequalsb

One-line solution, with NO translation table or custom function required:

I know this is an old question, but I recently had to import a static site into a WordPress site and had to overcome this issue.

Here is my solution, which does not require fiddling with translation tables:

htmlspecialchars_decode( htmlentities( html_entity_decode( $string ) ) );

When applied to the OP's string:

<p><font style="color:#FF0000">Camión español</font></p>

Output:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

When applied to Luca's string:

<b>Is 1 < 4?</b>è<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>€</strong><img src="/some/path" /></p></div>

Output:

<b>Is 1 < 4?</b>&egrave;<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>&euro;</strong><img src="/some/path" /></p></div>
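How the one-liner works: html_entity_decode() first turns existing entities back into characters (so nothing gets double-encoded), htmlentities() encodes everything, and htmlspecialchars_decode() restores only &, <, > and the quotes. PHP 8.1 changed the default flags of these functions, so a sketch that pins the flags and charset explicitly may behave more predictably across versions. It also shows the side effect visible in the output above: text that was already escaped, such as &lt;, ends up as a raw <. The function name is hypothetical:

```php
<?php
// Hypothetical wrapper with the flags and charset spelled out.
function reencode_keep_tags(string $s): string
{
    $flags   = ENT_QUOTES | ENT_HTML401;
    $decoded = html_entity_decode($s, $flags, 'UTF-8'); // &eacute; -> é, &lt; -> <
    $encoded = htmlentities($decoded, $flags, 'UTF-8'); // everything -> entities
    return htmlspecialchars_decode($encoded, $flags);   // &, <, >, quotes back
}

echo reencode_keep_tags('<p><font style="color:#FF0000">Camión español</font></p>'), "\n";
// Already-encoded text is normalised too, including escaped markup:
echo reencode_keep_tags('<i>caf&eacute; &lt;3</i>'), "\n";
```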
