Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address: StackOverflow, http://stackoverflow.com/questions/17539412/

Time: 2020-08-25 16:03:40  Source: igfitidea

Print Unicode characters PHP

php unicode html-escape-characters

Asked by Cameron Tinker

I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.

For instance, when I print all games with the name like Uncharted, I get this:

Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢

but it should display this:

Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™

I ran a quick JavaScript escape function to see which Unicode character the ™ is and found that it's \u2122.

I don't have a problem fully escaping every character in the string if I can get the ™ character to display correctly. My guess is to somehow find the hex representation of each character in the string and have PHP render the Unicode characters like this:

print "&#x2122;";

Please guide me through the best approach for escaping the Unicode characters in a string so that it is HTML friendly. I did something similar for JavaScript a while back, but JavaScript has built-in escape and unescape functions.

However, I'm not aware of any PHP functions with similar functionality. I have read about the ord function, but it just returns the ASCII character code for a given character, hence the improper display of the ™. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.
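
For illustration only, the per-character conversion guessed at above might look roughly like this. It is a minimal sketch, not a recommended fix: the helper name to_numeric_entities is hypothetical, and it assumes the input is valid UTF-8, that the mbstring extension is loaded, and PHP 7.2 or later (for mb_ord).

```php
<?php
// Hypothetical helper: replace every non-ASCII character in a UTF-8 string
// with a hexadecimal numeric character reference such as &#x2122;.
// Assumes mbstring and PHP 7.2+ (mb_ord). Returns null if the input is not
// valid UTF-8, because preg_replace_callback fails in that case.
function to_numeric_entities(string $utf8): ?string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u', // the u flag makes the pattern match whole code points
        function (array $m): string {
            // mb_ord returns the code point of the matched character.
            return sprintf('&#x%X;', mb_ord($m[0], 'UTF-8'));
        },
        $utf8
    );
}

// "\xE2\x84\xA2" is the UTF-8 encoding of U+2122 (the trademark sign).
echo to_numeric_entities("Uncharted: Drake's Fortune\xE2\x84\xA2"), "\n";
// prints: Uncharted: Drake's Fortune&#x2122;
```

Unlike ord, which looks at a single byte, this works on whole code points, so multi-byte characters are converted as one unit.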

Answer by Alex Shesterov

It looks like you have UTF-8 encoded strings internally and PHP outputs them properly, but your browser fails to auto-detect the encoding (it falls back to ISO 8859-1 or some other encoding).
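
A small sketch of that failure mode: the helper below (a hypothetical name) decodes UTF-8 bytes as if they were Windows-1252, which is an assumption about what the browser falls back to, and shows how the trademark sign turns into three unrelated characters. It assumes the mbstring extension is loaded.

```php
<?php
// The UTF-8 encoding of U+2122 (TRADE MARK SIGN) is the three bytes E2 84 A2.
$trademark = "\xE2\x84\xA2";

// Hypothetical helper: shows what a reader sees when UTF-8 bytes are
// decoded as Windows-1252 instead. Assumes the mbstring extension.
function misread_as_windows1252(string $utf8Bytes): string
{
    // Treat the input bytes as Windows-1252 and re-encode the result as UTF-8.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

// Each of the three bytes is shown as its own character.
echo misread_as_windows1252("Uncharted: Drake's Fortune" . $trademark), "\n";
```

The data itself is intact in this scenario; only the decoding on the receiving side is wrong.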

The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:

header("content-type: text/html; charset=UTF-8");  

Then you can leave the rest of your code as-is, with no need to HTML-encode entities or add other workarounds.

If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:

  • <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> for HTML <=4.01
  • <meta charset="UTF-8"> for HTML5

The HTTP header has priority over the <meta> tag, but the latter may be useful if the HTML is saved to disk and then read locally.
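
Put together, a page that declares its encoding both ways might look like the sketch below. The function render_games_page and its sample input are hypothetical stand-ins for the real database query, and the names are assumed to be stored as UTF-8 already.

```php
<?php
// Hypothetical helper that builds the page; $games stands in for the
// rows read from the database (assumed to be UTF-8 already).
function render_games_page(array $games): string
{
    $items = '';
    foreach ($games as $name) {
        // Escape only HTML-special characters; non-ASCII text passes through.
        $items .= '<li>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }

    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
         . "<title>Games</title>\n</head>\n<body>\n<ul>\n"
         . $items
         . "</ul>\n</body>\n</html>\n";
}

// Send the header before any output, then the page itself.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// "\xE2\x84\xA2" is the UTF-8 encoding of the trademark sign.
echo render_games_page(["Uncharted 2: Among Thieves\xE2\x84\xA2"]);
```

The header call has to come before anything is echoed, since headers cannot be changed once output has started.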

Answer by sh4

I spent a lot of time trying to find a better way to just print the character for a given Unicode code point, and the methods I found either didn't work or were very complicated.

That said, JSON can represent Unicode characters using the syntax "\u[unicode_code]", so:

echo json_decode('"\u00e1"'); 

This prints the equivalent Unicode character, in this case: á.

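P.S. Note the single and the double quotes. If you don't include both, it won't work: the single quotes delimit the PHP string, and the double quotes inside it make the content a JSON string.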
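
A short sketch of why both kinds of quotes matter, using the trademark sign from the question as the sample character (the variable names are only for illustration):

```php
<?php
// The outer single quotes are PHP's; the inner double quotes are part of
// the JSON text, so json_decode sees the JSON string "\u2122".
$fromJson = json_decode('"\u2122"');

// Without the inner double quotes the text is not valid JSON,
// so json_decode returns null.
$invalid = json_decode('\u2122');

// Compare against the UTF-8 bytes of U+2122 written out explicitly.
var_dump($fromJson === "\xE2\x84\xA2"); // bool(true)
var_dump($invalid);                      // NULL
```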

Answer by CXJ

Try this:

echo htmlentities("Uncharted: Drakes Fortune™ \n", ENT_QUOTES, "UTF-8");

From: http://php.net/htmlentities
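
As a rough illustration of what that call produces, the sketch below runs htmlentities on one of the names from the question and, for comparison, htmlspecialchars, which is not part of this answer and is shown only as an assumption about what a reader might also consider. The variable names are hypothetical.

```php
<?php
// Ends with U+2122 (trademark sign), written as explicit UTF-8 bytes.
$name = "Uncharted: Drake's Fortune\xE2\x84\xA2";

// htmlentities converts every character that has a named HTML entity;
// ENT_QUOTES also converts the apostrophe.
$entities = htmlentities($name, ENT_QUOTES, 'UTF-8');

// htmlspecialchars only escapes &, <, > and quotes; the trademark sign
// stays as raw UTF-8 and relies on the page declaring its charset.
$special = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo $entities, "\n"; // Uncharted: Drake&#039;s Fortune&trade;
echo $special, "\n";
```

The third argument must match the actual encoding of the string for the conversion to pick up multi-byte characters correctly.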

Answer by masakielastic

// PHP 7.0
var_dump(
    IntlChar::chr(0x2122),
    IntlChar::chr(0x1F638)
);

var_dump(
    utf8_chr(0x2122),
    utf8_chr(0x1F638)
);

function utf8_chr($cp) {

    if (!is_int($cp)) {
        exit("$cp is not integer\n");
    }

    // UTF-8 prohibits characters between U+D800 and U+DFFF
    // https://tools.ietf.org/html/rfc3629#section-3
    //
    // Q: Are there any 16-bit values that are invalid?
    // http://unicode.org/faq/utf_bom.html#utf16-7

    if ($cp < 0 || (0xD7FF < $cp && $cp < 0xE000) || 0x10FFFF < $cp) {
        exit("$cp is out of range\n");
    }

    if ($cp < 0x10000) {
        return json_decode('"\u'.bin2hex(pack('n', $cp)).'"');
    }

    // Q: Isn't there a simpler way to do this?
    // http://unicode.org/faq/utf_bom.html#utf16-4
    $lead = 0xD800 - (0x10000 >> 10) + ($cp >> 10);
    $trail = 0xDC00 + ($cp & 0x3FF);

    return json_decode('"\u'.bin2hex(pack('n', $lead)).'\u'.bin2hex(pack('n', $trail)).'"');
}