Warning: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/10333098/

Date: 2020-10-26 09:30:43  Source: igfitidea

UTF-8 safe equivalent of ord or charCodeAt() in PHP

Tags: php, javascript, utf-8, character-encoding

Question by Rila

I need to be able to use ord() to get the same value as JavaScript's charCodeAt() function. The problem is that ord() doesn't support UTF-8.

How can I get Ą to translate to 260 in PHP? I've tried some uniord functions out there, but they all report 256 instead of 260.

Thanks a lot for any help!

Regards

Accepted answer by hakre

ord() works byte by byte (as do most, if not all, of PHP's standard string functions). You need to do the conversion yourself, for example with the help of the multibyte string extension:

$utf8Character = 'Ą';
list(, $ord) = unpack('N', mb_convert_encoding($utf8Character, 'UCS-4BE', 'UTF-8'));
echo $ord; # 260
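To see why ord() on its own is not enough: 'Ą' (U+0104, decimal 260) is stored in UTF-8 as the two bytes 0xC4 0x84, and ord() only reads the first of them. A small check using core PHP only (the `\u{...}` string escape assumes PHP 7.0 or later):

```php
<?php
$char = "\u{0104}"; // 'Ą', LATIN CAPITAL LETTER A WITH OGONEK

echo strlen($char), "\n";  // 2: U+0104 takes two bytes in UTF-8
echo bin2hex($char), "\n"; // c484
echo ord($char), "\n";     // 196: ord() returns only the first byte, 0xC4
```

The conversion above avoids this by re-encoding the character as a single 32-bit big-endian integer (UCS-4BE) before unpacking it.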

Answer by masakielastic

mbstring version:

function utf8_char_code_at($str, $index)
{
    $char = mb_substr($str, $index, 1, 'UTF-8');

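    // mb_substr() returns '' past the end of the string; without this check that would yield 0
    if ($char !== '' && mb_check_encoding($char, 'UTF-8')) {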
        $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
        return hexdec(bin2hex($ret));
    } else {
        return null;
    }
}
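Since PHP 7.2, mbstring also provides mb_ord() (and mb_chr()), so the byte juggling can be delegated to it. A minimal sketch, assuming PHP 7.2+ with mbstring enabled; char_code_point_at() is a hypothetical name, not a built-in:

```php
<?php
// Sketch: assumes PHP >= 7.2 with the mbstring extension enabled.
// char_code_point_at() is a hypothetical helper name.
function char_code_point_at(string $str, int $index): ?int
{
    if ($index < 0) {
        return null; // charCodeAt() gives NaN for negative indexes
    }
    $char = mb_substr($str, $index, 1, 'UTF-8');
    if ($char === '' || !mb_check_encoding($char, 'UTF-8')) {
        return null; // past the end of the string, or not valid UTF-8
    }
    return mb_ord($char, 'UTF-8'); // the character's code point
}

var_dump(char_code_point_at("x\u{0104}y", 1)); // int(260)
```

Like the function above, this returns a code point, which matches charCodeAt() only for characters up to U+FFFF.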

Without mbstring, using htmlspecialchars() and htmlspecialchars_decode() to validate the UTF-8 while extracting one character:

function utf8_char_code_at($str, $index)
{
    $char = '';
    $str_index = 0;

    $str = utf8_scrub($str);
    $len = strlen($str);

    for ($i = 0; $i < $len; $i += 1) {

        $char .= $str[$i];

        if (utf8_check_encoding($char)) {

            if ($str_index === $index) {
                return utf8_ord($char);
            }

            $char = '';
            $str_index += 1;
        }
    }

    return null;
}

function utf8_scrub($str)
{
    return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8'));
}

function utf8_check_encoding($str)
{
    return $str === utf8_scrub($str);
}

function utf8_ord($char)
{
    $lead = ord($char[0]);

    if ($lead < 0x80) {
        return $lead;
    } else if ($lead < 0xE0) {
        return (($lead & 0x1F) << 6)
             |  (ord($char[1]) & 0x3F);
    } else if ($lead < 0xF0) {
        return (($lead & 0xF) << 12)
             | ((ord($char[1]) & 0x3F) << 6)
             |  (ord($char[2]) & 0x3F);
    } else {
        return (($lead & 0x7) << 18)
             | ((ord($char[1]) & 0x3F) << 12)
             | ((ord($char[2]) & 0x3F) << 6)
             |  (ord($char[3]) & 0x3F);
    }
}
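As a worked example of the two-byte branch of utf8_ord(): 'Ą' is encoded as 0xC4 0x84, so the lead byte contributes its low five bits and the continuation byte its low six bits:

```latex
\begin{aligned}
(\mathtt{0xC4} \mathbin{\&} \mathtt{0x1F}) \ll 6 &= 4 \ll 6 = 256 \\
\mathtt{0x84} \mathbin{\&} \mathtt{0x3F} &= 4 \\
256 \mathbin{|} 4 &= 260 = \mathrm{U{+}0104}
\end{aligned}
```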

PHP extension version:

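/* Written against the PHP 5 extension API (TSRMLS_CC, int string lengths). */
#include "php.h"
#include "ext/standard/html.h"  /* declares php_next_utf8_char() */

/* Prototype so the function table below can refer to the function. */
PHP_FUNCTION(utf8_char_code_at);

const zend_function_entry utf8_string_functions[] = {
    PHP_FE(utf8_char_code_at, NULL)
    PHP_FE_END
};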

PHP_FUNCTION(utf8_char_code_at)
{
    char *str;
    int len;
    long index;

    unsigned int code_point;
    long i;
    int status;
    size_t pos = 0, old_pos = 0;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sl", &str, &len, &index) == FAILURE) {
        return;
    }

    for (i = 0; pos < len; ++i) {
        old_pos = pos;
        code_point = php_next_utf8_char((const unsigned char *) str, (size_t) len, &pos, &status);

        if (i == index) {
            if (status == SUCCESS) {
                RETURN_LONG(code_point);
            } else {
                RETURN_NULL();
            }

        }

    }

    RETURN_NULL();
}
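If compiling an extension is not an option, the same character-by-character walk can be approximated in userland with PCRE's `/u` mode, which is part of core PHP. This is a different technique from the extension, sketched under the assumption of PHP 7.1+ (nullable return type); utf8_char_at() is a hypothetical helper name. Unlike the extension, it returns null if any part of the string is invalid UTF-8, not only the character at the requested index. The returned character can then be passed to any of the code-point functions on this page.

```php
<?php
// Hypothetical helper: returns the $index-th character of a UTF-8 string,
// or null if the index is out of range or the string is not valid UTF-8.
function utf8_char_at(string $str, int $index): ?string
{
    // With the /u modifier "." matches one whole UTF-8 character ("s" lets it
    // match newlines too); preg_match_all() returns false on invalid UTF-8.
    if (preg_match_all('/./su', $str, $matches) === false) {
        return null;
    }
    return $matches[0][$index] ?? null;
}

echo bin2hex(utf8_char_at("a\u{0104}b", 1)), "\n"; // c484
```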

Answer by Sudhir Bastakoti

Try:

function uniord($c) {
    $h = ord($c[0]); // $c[0], not $c{0}: brace string offsets were removed in PHP 8
    if ($h <= 0x7F) {
        return $h;
    } else if ($h < 0xC2) {
        return false; // continuation byte (0x80-0xBF) or overlong lead byte (0xC0, 0xC1)
    } else if ($h <= 0xDF) {
        return ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
    } else if ($h <= 0xEF) {
        return ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6
                                 | (ord($c[2]) & 0x3F);
    } else if ($h <= 0xF4) {
        return ($h & 0x07) << 18 | (ord($c[1]) & 0x3F) << 12
                                 | (ord($c[2]) & 0x3F) << 6
                                 | (ord($c[3]) & 0x3F);
    } else {
        return false;
    }
}
echo uniord('Ą');
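The four-byte branch works the same way, except that the lead byte contributes only its low three bits and each of the three continuation bytes six. For example, U+1F600 is encoded as F0 9F 98 80:

```latex
\begin{aligned}
(\mathtt{0xF0} \mathbin{\&} \mathtt{0x07}) \ll 18 &= 0 \\
(\mathtt{0x9F} \mathbin{\&} \mathtt{0x3F}) \ll 12 &= 31 \ll 12 = 126976 \\
(\mathtt{0x98} \mathbin{\&} \mathtt{0x3F}) \ll 6 &= 24 \ll 6 = 1536 \\
\mathtt{0x80} \mathbin{\&} \mathtt{0x3F} &= 0 \\
0 \mathbin{|} 126976 \mathbin{|} 1536 \mathbin{|} 0 &= 128512 = \mathrm{U{+}1F600}
\end{aligned}
```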

Answer by Aeyoun

This should be equivalent to JavaScript's charCodeAt(). It is based on @hakre's work, but corrected to behave the same as JavaScript (in every way I could think of to test):

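function charCodeAt($string, $offset) {
  // Re-encode as UTF-16LE so that offsets count UTF-16 code units, as in JavaScript.
  $utf16 = mb_convert_encoding($string, 'UTF-16LE', 'UTF-8');
  $unit = substr($utf16, $offset * 2, 2);
  if ($offset < 0 || strlen($unit) < 2) {
    return null; // out of range (JavaScript returns NaN)
  }
  list(, $ret) = unpack('v', $unit); // 'v' = unsigned 16-bit, little-endian
  return $ret;
}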
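The distinction matters because charCodeAt() returns UTF-16 code units rather than code points: a character above U+FFFF occupies two positions in a JavaScript string, and each position reports one half of a surrogate pair (for U+1F600, charCodeAt(0) is 55357 and charCodeAt(1) is 56832), whereas functions that return code points give 128512 for the same character. A core-PHP sketch of the standard UTF-16 surrogate split; utf16_code_units() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: splits a Unicode code point into the UTF-16 code
// units that JavaScript's charCodeAt() would report for it.
function utf16_code_units(int $codePoint): array
{
    if ($codePoint < 0x10000) {
        return [$codePoint]; // BMP: one code unit, equal to the code point
    }
    $v = $codePoint - 0x10000;     // remaining 20-bit value
    return [
        0xD800 | ($v >> 10),       // high (lead) surrogate: top 10 bits
        0xDC00 | ($v & 0x3FF),     // low (trail) surrogate: bottom 10 bits
    ];
}

echo implode(', ', utf16_code_units(0x0104)), "\n");  // 260
echo implode(', ', utf16_code_units(0x1F600)), "\n"; // 55357, 56832
```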

Answer by Php'Regex

There is an ord_utf8 function here: https://stackoverflow.com/a/42600959/7558876

It looks like this (it accepts a string and returns an integer):

<?php

function ord_utf8($s){
    // substr() instead of $s[0].$s[1].$s[2].$s[3] avoids "Uninitialized string offset"
    // warnings when $s holds a single one- to three-byte character.
    return (int) ($s=unpack('C*',substr($s,0,4)))&&$s[1]<(1<<7)?$s[1]:
    ($s[1]>239&&$s[2]>127&&$s[3]>127&&$s[4]>127?(7&$s[1])<<18|(63&$s[2])<<12|(63&$s[3])<<6|63&$s[4]:
    ($s[1]>223&&$s[2]>127&&$s[3]>127?(15&$s[1])<<12|(63&$s[2])<<6|63&$s[3]:
    ($s[1]>193&&$s[2]>127?(31&$s[1])<<6|63&$s[2]:0)));
}

And a fast chr_utf8 function here: https://stackoverflow.com/a/42510129/7558876

It looks like this (it accepts an integer and returns a string):

<?php

function chr_utf8($n,$f='C*'){
return $n<(1<<7)?chr($n):($n<1<<11?pack($f,192|$n>>6,1<<7|191&$n):
($n<(1<<16)?pack($f,224|$n>>12,1<<7|63&$n>>6,1<<7|63&$n):
($n<(1<<20|1<<16)?pack($f,240|$n>>18,1<<7|63&$n>>12,1<<7|63&$n>>6,1<<7|63&$n):'')));
}

Please check the links if you want an example.

如果你想要一个例子,请检查链接......