Print Unicode characters PHP
Original question: http://stackoverflow.com/questions/17539412/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Cameron Tinker
I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.
For instance, when I print all games with a name like Uncharted, I get this:
Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢
but it should display this:
Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™
I ran a quick JavaScript escape function to see which Unicode character the ™ is and found that it's \u2122.
I don't have a problem fully escaping every character in the string if I can get the ™ character to display correctly. My guess is to somehow find the hex representation of each character in the string and have PHP render the Unicode characters like this:
print "&#x2122;";
Please guide me through the best approach for Unicode-escaping a string so that it is HTML friendly. I did something similar for JavaScript a while back, but JavaScript has built-in functions for escape and unescape.
I'm not aware of any PHP functions with similar functionality, however. I have read about the ord function, but it just returns the ASCII character code for a given character, hence the improper display of the ™. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.
Answer by Alex Shesterov
It looks like you have UTF-8 encoded strings internally, PHP outputs them properly, but your browser fails to auto-detect the encoding (it decides for ISO 8859-1 or some other encoding).
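The garbled output in the question is consistent with that diagnosis. As a rough sketch (assuming the mbstring extension is loaded, and assuming the browser falls back to Windows-1252, which is how browsers commonly treat ISO 8859-1), the three UTF-8 bytes of ™ can be reinterpreted as three separate single-byte characters:

```php
<?php
// U+2122 TRADE MARK SIGN, written with the PHP 7.0+ code point escape.
$tm = "\u{2122}";

// The character is stored as three bytes in UTF-8.
echo bin2hex($tm), "\n"; // e284a2

// Treat those same bytes as Windows-1252 text and convert the result to UTF-8,
// mimicking a browser that guessed the wrong charset.
$misread = mb_convert_encoding($tm, 'UTF-8', 'Windows-1252');
echo $misread, "\n";
echo mb_strlen($misread, 'UTF-8'), " characters instead of 1\n";
```

The data itself is intact; only the interpretation of the bytes differs, which is why declaring the charset is enough.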
The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:
header("content-type: text/html; charset=UTF-8");
Then, you can leave the rest of your code as-is and don't have to html-encode entities or create other mess.
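A minimal sketch of how this might look in a page script. The function name render_games and the sample data are hypothetical, and the names are assumed to arrive from the database already encoded as UTF-8. The call to htmlspecialchars is an addition for escaping markup characters only; it is not needed for the ™ to display:

```php
<?php
// Hypothetical helper: builds an HTML list from UTF-8 encoded game names.
function render_games(array $names): string
{
    $html = "<ul>\n";
    foreach ($names as $name) {
        // htmlspecialchars() escapes only &, <, > and quotes. Non-ASCII
        // characters such as the trademark sign pass through unchanged.
        $html .= '<li>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Headers can only be sent before any output has started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo render_games(["Uncharted: Drake's Fortune\u{2122}"]);
```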
If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> for HTML <= 4.01
<meta charset="UTF-8"> for HTML5
The HTTP header has priority over the <meta> tag, but the latter may be useful if the HTML is saved to disk and then read locally.
Answer by sh4
I spent a lot of time trying to find a better way to just print the character for a given Unicode code point, and the methods I found either didn't work or were very complicated.
That said, JSON can represent Unicode characters using the syntax "\u[unicode_code]", so:
echo json_decode('"\u00e1"');
This prints the equivalent Unicode character, in this case: á.
P.S. Note the single and the double quotes. If you don't use both, it won't work.
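Applied to the character from the question, the same trick could look like the sketch below. It also shows what happens when the inner double quotes are left out: the input is no longer a valid JSON string, so json_decode returns null.

```php
<?php
// Outer single quotes keep PHP from interpreting the backslash; the inner
// double quotes make the payload a valid JSON string literal.
$tm = json_decode('"\u2122"');
echo "Uncharted: Drake's Fortune", $tm, "\n";

// Without the inner double quotes the input is not valid JSON.
var_dump(json_decode('\u2122')); // NULL
```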
Answer by CXJ
Try this:
echo htmlentities("Uncharted: Drakes Fortune™ \n", ENT_QUOTES, "UTF-8");
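To illustrate what this produces, here is a short sketch. The second half is an assumption beyond the answer: if numeric entities are preferred, as in the question's own guess, mb_encode_numericentity from the mbstring extension can generate them. The conversion map shown is one possible choice covering every code point from U+0080 upward.

```php
<?php
$name = "Uncharted: Drake's Fortune\u{2122}";

// Named entities where HTML 4.01 defines one; ENT_QUOTES also escapes
// the apostrophe as &#039;.
echo htmlentities($name, ENT_QUOTES, 'UTF-8'), "\n";
// Uncharted: Drake&#039;s Fortune&trade;

// Numeric entities for all non-ASCII code points (requires mbstring).
// Map format: [first code point, last code point, offset, mask].
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
echo mb_encode_numericentity($name, $map, 'UTF-8'), "\n";
// Uncharted: Drake's Fortune&#8482;
```

Either form displays correctly regardless of the charset the browser picks, at the cost of a longer and less readable page source.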
Answer by masakielastic
// PHP 7.0
var_dump(
    IntlChar::chr(0x2122),
    IntlChar::chr(0x1F638)
);
var_dump(
    utf8_chr(0x2122),
    utf8_chr(0x1F638)
);
function utf8_chr($cp) {
    if (!is_int($cp)) {
        exit("$cp is not integer\n");
    }
    // UTF-8 prohibits characters between U+D800 and U+DFFF
    // https://tools.ietf.org/html/rfc3629#section-3
    //
    // Q: Are there any 16-bit values that are invalid?
    // http://unicode.org/faq/utf_bom.html#utf16-7
    if ($cp < 0 || (0xD7FF < $cp && $cp < 0xE000) || 0x10FFFF < $cp) {
        exit("$cp is out of range\n");
    }
    if ($cp < 0x10000) {
        return json_decode('"\u'.bin2hex(pack('n', $cp)).'"');
    }
    // Q: Isn't there a simpler way to do this?
    // http://unicode.org/faq/utf_bom.html#utf16-4
    $lead = 0xD800 - (0x10000 >> 10) + ($cp >> 10);
    $trail = 0xDC00 + ($cp & 0x3FF);
    return json_decode('"\u'.bin2hex(pack('n', $lead)).'\u'.bin2hex(pack('n', $trail)).'"');
}
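As a possible alternative not mentioned in the answer, newer PHP versions offer shorter ways to get the same characters. The sketch below assumes PHP 7.2 or later with the mbstring extension for mb_chr and mb_ord; the "\u{...}" escape needs only PHP 7.0.

```php
<?php
// PHP 7.0+: code point escape inside a double-quoted string.
$a = "\u{2122}";

// PHP 7.2+ with mbstring: build a character from an integer code point.
$b = mb_chr(0x2122, 'UTF-8');
$c = mb_chr(0x1F638, 'UTF-8'); // outside the BMP, no surrogate math needed

var_dump($a === $b);           // bool(true)
echo bin2hex($c), "\n";        // f09f98b8

// mb_ord() is the multibyte counterpart of ord(): it returns the code point
// of the first character rather than the value of a single byte.
printf("U+%04X\n", mb_ord($a, 'UTF-8')); // U+2122
```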