PHP: UTF-8 to Unicode Code Points

Notice: this page is a Chinese/English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/7106470/

Date: 2020-08-26 02:01:19  Source: igfitidea

UTF-8 to Unicode Code Points

Tags: php, unicode, utf-8

Asked by Adrien Hingert

Is there a function that will change UTF-8 to Unicode leaving non special characters as normal letters and numbers?

i.e. the German word "tchüß" would be rendered as something like "tch\20AC\21AC" (please note that I am making the Unicode codes up).

EDIT: I am experimenting with the following function, but although this one works well with ASCII 32-127, it seems to fail for double byte chars:

function strToHex ($string)
{
    $hex = '';
    for ($i = 0; $i < mb_strlen ($string, "utf-8"); $i++)
    {
        $id = ord (mb_substr ($string, $i, 1, "utf-8"));
        $hex .= ($id <= 128) ? mb_substr ($string, $i, 1, "utf-8") : "&#" . $id . ";";
    }

    return ($hex);
}

Any ideas?

EDIT 2: Found solution: The PHP ord() function does not work for double byte chars. Use instead: http://nl.php.net/manual/en/function.ord.php#78032
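
A minimal sketch of the fix described in EDIT 2, assuming PHP 7.2 or later, where mb_ord() is built in (the linked manual comment predates it). ord() only looks at the first byte, so for ü (bytes C3 BC) it returns 195 instead of 252. The function name strToNumericEntities is made up for this illustration:

```php
<?php
// Hypothetical helper (not from the question); assumes PHP 7.2+ with mbstring.
function strToNumericEntities(string $string): string
{
    $out = '';
    // Split on UTF-8 character boundaries; preg_split returns false on invalid UTF-8.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        // mb_ord() decodes the whole multibyte character, unlike ord().
        $codePoint = mb_ord($char, 'UTF-8');
        // Keep ASCII as-is, replace everything else with a decimal entity.
        $out .= ($codePoint < 128) ? $char : '&#' . $codePoint . ';';
    }
    return $out;
}

// "tchüß" written with escapes so the example does not depend on file encoding.
echo strToNumericEntities("tch\u{00FC}\u{00DF}"), "\n"; // tch&#252;&#223;
```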

Accepted answer by Luwe

Converting one character set to another can be done with iconv:

http://php.net/manual/en/function.iconv.php

Note that UTF-8 is already a Unicode encoding.

Another way is simply using htmlentities with the right character set:

http://php.net/manual/en/function.htmlentities.php
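
As a rough illustration of both suggestions, assuming the input is valid UTF-8 and that the iconv and mbstring extensions are loaded. Note that htmlentities() produces named entities rather than code points; mb_encode_numericentity() is an extra option, not mentioned in the answer, that produces numeric ones:

```php
<?php
$word = "tch\u{00FC}\u{00DF}"; // "tchüß" encoded as UTF-8

// Named HTML entities; the third argument tells PHP the input encoding.
$named = htmlentities($word, ENT_QUOTES, 'UTF-8');
echo $named, "\n"; // tch&uuml;&szlig;

// Numeric entities for everything above ASCII.
// Convmap layout: [first code point, last code point, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$numeric = mb_encode_numericentity($word, $convmap, 'UTF-8');
echo $numeric, "\n"; // tch&#252;&#223;

// iconv re-encodes the bytes; in UTF-32BE the code point is directly visible.
$utf32Hex = bin2hex(iconv('UTF-8', 'UTF-32BE', "\u{00FC}"));
echo $utf32Hex, "\n"; // 000000fc
```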

Answer by bobince

For a readable form I would go with JSON. JSON does not require non-ASCII characters to be escaped, but PHP escapes them by default:

echo json_encode("tchüß");

"tch\u00fc\u00df"

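To make the default escaping concrete, here is a small sketch, assuming PHP 7 or later since it uses the \u{...} string escape. JSON_UNESCAPED_UNICODE switches the escaping off, and characters outside the Basic Multilingual Plane are written as a UTF-16 surrogate pair:

```php
<?php
$word = "tch\u{00FC}\u{00DF}"; // "tchüß" encoded as UTF-8

$escaped = json_encode($word);
echo $escaped, "\n"; // "tch\u00fc\u00df"

$raw = json_encode($word, JSON_UNESCAPED_UNICODE);
echo $raw, "\n"; // "tchüß"

// U+1F613 lies outside the BMP, so it becomes two \u escapes.
$astral = json_encode("\u{1F613}");
echo $astral, "\n"; // "\ud83d\ude13"
```
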
Answer by bobince

For people looking to find the Unicode code point of any character, this might be useful. You can then encode the string however you want, replacing certain characters with escape codes and leaving others in their binary form (e.g. printable ASCII characters), depending on the context in which you want to use it.

From: Mapping codepoints to Unicode encoding forms

The mapping for UTF-32 is, essentially, the identity mapping: the 32-bit code unit used to encode a codepoint has the same integer value as the codepoint itself.

/**
 * Convert a string into an array of decimal Unicode code points.
 *
 * @param string $string   The string to convert to code points
 * @param string $encoding The encoding of $string
 *
 * @return array Array of decimal code points for every character of $string
 */
function toCodePoint( $string, $encoding )
{
    $utf32  = mb_convert_encoding( $string, 'UTF-32', $encoding );
    $length = mb_strlen( $utf32, 'UTF-32' );
    $result = [];

    for( $i = 0; $i < $length; ++$i ) {
        $result[] = hexdec( bin2hex( mb_substr( $utf32, $i, 1, 'UTF-32' ) ) );
    }

    return $result;
}
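
Because UTF-32 is the identity mapping, the same result can be sketched with unpack(), reading each 32-bit big-endian unit as an integer. This is an alternative to the loop above, not the answer author's code; codePointsOf is a made-up name and the mbstring extension is assumed:

```php
<?php
// Hypothetical helper; assumes the mbstring extension is available.
function codePointsOf(string $string, string $encoding = 'UTF-8'): array
{
    // UTF-32BE stores every code point as one big-endian 32-bit integer.
    $utf32 = mb_convert_encoding($string, 'UTF-32BE', $encoding);
    // 'N*' unpacks unsigned 32-bit big-endian integers; keys start at 1,
    // so array_values() re-indexes the result from 0.
    return array_values(unpack('N*', $utf32));
}

foreach (codePointsOf("tch\u{00FC}\u{00DF}") as $codePoint) {
    printf("U+%04X ", $codePoint);
}
// U+0074 U+0063 U+0068 U+00FC U+00DF
```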

Answer by François

PHP 7 adds IntlChar::ord(), which returns the Unicode code point of a given UTF-8 character:

var_dump(sprintf('U+%04X', IntlChar::ord('ß')));

# Outputs: string(6) "U+00DF"

Answer by skywise

I guess you're going to print out your strings on a website?

I'm storing all my databases in UTF-8 and calling htmlentities($string) before output.

Maybe you have to try htmlentities(utf8_encode($string));

Answer by powtac

I once created a function called _convert() which safely encodes everything to UTF-8.

Answer by Garlaro

Tested on PHP 5.6

/**
 * @param string $utf8char
 * @return string
 */
function toUnicodeCodePoint($utf8char)
{
    return 'U+' . dechex(mb_ord($utf8char));
}

// mb_ord() is built in since PHP 7.2, so the polyfill is only defined when it is missing.
if (!function_exists('mb_ord')) {
    /**
     * @see https://github.com/symfony/polyfill-mbstring
     * @param string $s
     * @return int
     */
    function mb_ord($s)
    {
        // Read up to four bytes; the first byte tells how long the sequence is.
        $code = ($s = unpack('C*', substr($s, 0, 4))) ? $s[1] : 0;
        if (0xF0 <= $code) {
            return (($code - 0xF0) << 18) + (($s[2] - 0x80) << 12) + (($s[3] - 0x80) << 6) + $s[4] - 0x80;
        }
        if (0xE0 <= $code) {
            return (($code - 0xE0) << 12) + (($s[2] - 0x80) << 6) + $s[3] - 0x80;
        }
        if (0xC0 <= $code) {
            return (($code - 0xC0) << 6) + $s[2] - 0x80;
        }

        return $code;
    }
}

echo toUnicodeCodePoint('😓');
// U+1f613
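
The bit arithmetic in mb_ord() can be checked by hand. A short sketch, using byte escapes so it does not depend on the file's encoding, that decodes a two-byte and a four-byte sequence with the same subtract-and-shift steps:

```php
<?php
// "ß" is the two bytes C3 9F in UTF-8 (pattern 110xxxxx 10yyyyyy).
$b = array_values(unpack('C*', "\xC3\x9F"));
// Drop the marker bits (0xC0 and 0x80) and join the payload bits.
$eszett = (($b[0] - 0xC0) << 6) + ($b[1] - 0x80);
printf("U+%04X\n", $eszett); // U+00DF

// U+1F613 is the four bytes F0 9F 98 93 (11110www 10xxxxxx 10yyyyyy 10zzzzzz).
$b = array_values(unpack('C*', "\xF0\x9F\x98\x93"));
$emoji = (($b[0] - 0xF0) << 18)
       + (($b[1] - 0x80) << 12)
       + (($b[2] - 0x80) << 6)
       + ($b[3] - 0x80);
printf("U+%04X\n", $emoji); // U+1F613
```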

Answer by user989840

I had a problem when I needed to convert a string (UTF-8 by default) containing Cyrillic to entities only partly: just the Cyrillic characters. In the end I needed a JSON-like result, turning this:

<li class="my_class">City - Mocsow (Москва)</li>

to this:

<li class=\"my_class\">City - Mocsow (\u041c\u043e\u0441\u043a\u0432\u0430)<\/li>

So I ended up with a composite solution (a mix of the question author's code and Nus's):

function strToHex($string)
{
    $enc = 'utf-8';
    $hex = '';
    $length = mb_strlen($string, $enc);
    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($string, $i, 1, $enc);
        // ord() only sees the first byte; anything below 128 is plain ASCII.
        $hex .= (ord($char) < 128) ? $char : toCodePoint($char, $enc);
    }
    return $hex;
}

function toCodePoint($string, $encoding)
{
    $utf32  = mb_convert_encoding($string, 'UTF-32', $encoding);
    $length = mb_strlen($utf32, 'UTF-32');
    $result = array();
    for ($i = 0; $i < $length; ++$i) {
        // Keeps the last four hex digits, so this only covers U+0000 to U+FFFF.
        $result[] = '\u' . substr(bin2hex(mb_substr($utf32, $i, 1, 'UTF-32')), 4, 8);
    }
    return implode('', $result);
}

// $text holds the input string.
$output = strToHex(
    str_replace( // make the string JSON compatible
        array("\"", "\n", "\r", "\t", "/"),
        array('\"', '\n', "", " ", "\/"),
        $text
    )
);
echo $output;

Tested on PHP 5.2.17 :)
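
For comparison, json_encode() (available since PHP 5.2) applies the same escaping by default, so a hedged alternative is to encode the string and strip the surrounding quotes. Unlike the function above, this keeps \r and \t as escapes rather than dropping or replacing them, and it writes characters beyond U+FFFF as surrogate pairs. The sketch assumes the source file is saved as UTF-8:

```php
<?php
$text = '<li class="my_class">City - Mocsow (Москва)</li>';

// json_encode() wraps the string in double quotes; substr() removes them.
$output = substr(json_encode($text), 1, -1);

echo $output, "\n";
// <li class=\"my_class\">City - Mocsow (\u041c\u043e\u0441\u043a\u0432\u0430)<\/li>
```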