PHP: utf-8 and htmlentities in RSS feeds

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/307623/

Time: 2020-08-24 22:19:24  Source: igfitidea

utf-8 and htmlentities in RSS feeds

Tags: php, utf-8, rss

Asked by Doug Kaye

I'm writing some RSS feeds in PHP and struggling with character-encoding issues. Should I call utf8_encode() before or after htmlentities()? For example, I've got both ampersands and Chinese characters in a description element, and I'm not sure which of these is correct:

$output = utf8_encode(htmlentities($source));
or
$output = htmlentities(utf8_encode($source));

And why?

Answered by Eran Galperin

It's important to pass the character set to the htmlentities function, as the default is ISO-8859-1:

utf8_encode(htmlentities($source,ENT_COMPAT,'utf-8'));

You should apply htmlentities first, so that utf8_encode can encode the entities properly.

(EDIT: Based on the comments, I changed my earlier opinion that the order didn't matter. This code is tested and works well.)

Answered by Gumbo

First: The utf8_encode function converts from ISO 8859-1 to UTF-8. So you only need this function if your input encoding/charset is ISO 8859-1. But why don't you use UTF-8 in the first place?

Second: You don't need htmlentities. You just need htmlspecialchars to replace the special characters with character references. htmlentities would replace too many characters, including ones that can be encoded directly in UTF-8. It is important to use the ENT_QUOTES quote style so that single quotes are replaced as well.

So my proposal:

// if your input encoding is ISO 8859-1
htmlspecialchars(utf8_encode($string), ENT_QUOTES)

// if your input encoding is UTF-8
htmlspecialchars($string, ENT_QUOTES, 'UTF-8')
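
As a minimal sketch of this proposal, the two cases can be folded into one helper. The name rss_escape and its $inputCharset parameter are hypothetical, not part of the answer. The sketch also assumes the mbstring extension is available and uses mb_convert_encoding() in place of utf8_encode(), since utf8_encode() is deprecated as of PHP 8.2. The ENT_XML1 flag is an addition here, chosen because the output is XML rather than HTML.

```php
<?php
// Hypothetical helper (not from the answer): escape text for use inside an XML element.
function rss_escape(string $text, string $inputCharset = 'UTF-8'): string
{
    // Convert legacy input (for example ISO-8859-1) to UTF-8 first.
    // For ISO-8859-1 this does the same job as utf8_encode().
    if (strcasecmp($inputCharset, 'UTF-8') !== 0) {
        $text = mb_convert_encoding($text, 'UTF-8', $inputCharset);
    }
    // Escape only &, <, >, " and '; all other characters stay as plain UTF-8.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// prints: <description>Tom &amp; Jerry &lt;中文&gt;</description>
echo '<description>' . rss_escape('Tom & Jerry <中文>') . '</description>', "\n";
```

With this shape the Chinese characters pass through untouched and only the XML-significant characters become references, which is the behaviour the answer argues for.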

Answered by Kornel

Don't use htmlentities()!

Simply use UTF-8 characters. Just make sure you declare the encoding of the feed in the HTTP headers (Content-Type: application/xml; charset=UTF-8) or, failing that, in the feed itself using <?xml version="1.0" encoding="UTF-8"?> on the first line.
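
A rough sketch of what that declaration could look like in PHP follows. The function name render_feed and the stripped-down channel are placeholders for illustration only; a real feed needs more elements. The title is still passed through htmlspecialchars, which is an assumption added here so that a literal & does not break the XML.

```php
<?php
// Hypothetical sketch: declare UTF-8 in the XML prolog and emit UTF-8 text as-is.
function render_feed(string $title): string
{
    // Only the XML special characters are escaped; other characters stay UTF-8.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<rss version="2.0"><channel><title>' . $safeTitle
        . '</title></channel></rss>' . "\n";
}

// header() only works before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: application/xml; charset=UTF-8');
}
echo render_feed('新闻 & Updates');
```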

Answered by Kornel

It might be easier to forget htmlentities and use a CDATA section. It works for the title element, which doesn't seem to support encoded HTML characters in Firefox's RSS viewer:

<title><![CDATA[News & Updates  " > ? ? ? ? ?  Test!]]></title>
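
One caveat with this approach is that a literal ]]> inside the text would close the section early. A small sketch of a wrapper that guards against this is shown below. The function name cdata is hypothetical, and the input is assumed to already be UTF-8.

```php
<?php
// Hypothetical helper: wrap UTF-8 text in a CDATA section.
// A literal "]]>" in the text is split across two adjacent CDATA sections
// so that it cannot terminate the section prematurely.
function cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

// prints: <title><![CDATA[News & Updates "quoted" <b>Test!</b>]]></title>
echo '<title>' . cdata('News & Updates "quoted" <b>Test!</b>') . '</title>', "\n";
```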

Answered by SoapBox

You want to do $output = htmlentities(utf8_encode($source));. This is because you want to convert your international characters into proper UTF-8 first, and then have ampersands (and possibly some of the UTF-8 characters as well) turned into HTML entities. If you do the entities first, then some of the international characters may not be handled properly.

If none of your international characters are going to be changed by utf8_encode, then it doesn't matter which order you call them in.
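
As a hedged illustration of the ordering question, the sketch below uses htmlspecialchars (as proposed in another answer) rather than htmlentities, and a hypothetical latin1_to_utf8 wrapper standing in for utf8_encode(), which is deprecated as of PHP 8.2. Under those assumptions, both orders give the same bytes, provided each htmlspecialchars call is told which encoding the string has at that point.

```php
<?php
// Hypothetical stand-in for utf8_encode(); requires the mbstring extension.
function latin1_to_utf8(string $text): string
{
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$source = "caf\xE9 & co"; // "café & co" stored as ISO-8859-1 bytes

// Order 1: convert to UTF-8 first, then escape as UTF-8.
$convertFirst = htmlspecialchars(latin1_to_utf8($source), ENT_QUOTES, 'UTF-8');

// Order 2: escape first (declared as ISO-8859-1), then convert to UTF-8.
$escapeFirst = latin1_to_utf8(htmlspecialchars($source, ENT_QUOTES, 'ISO-8859-1'));

var_dump($convertFirst === $escapeFirst); // bool(true)
echo $convertFirst, "\n";                 // café &amp; co
```

The escaped characters are all ASCII, which ISO-8859-1 and UTF-8 encode identically, so the conversion step leaves them alone in either order.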

Answered by katy lavallee

After much trial & error, I finally found a way to properly display a string from a utf8-encoded database value, through an xml file, to an html page:

$output = '<![CDATA['.utf8_encode(htmlentities($string)).']]>';

I hope this helps someone.
