Why do emoji have two different UTF-8 encodings? How do I convert emoji from UTF-8 using NSString on iOS?

Question

Asked by pinchwang

We have found that some emoji appear to have two different UTF-8 encodings, for example:

emoji   unicode    utf-8                another utf-8
😁      U+1F601    \xf0\x9f\x98\x81     \xed\xa0\xbd\xed\xb8\x81

But iOS can't decode the second form, so I get an error when decoding a string from UTF-8.

In all the documentation I found, only one UTF-8 encoding is listed for each emoji; the other one is nowhere to be found.

The documents I referenced include:

emoji code link

whole utf-8 code link

But in a web tool, bianma, both byte sequences are converted into the emoji correctly.

So, my questions are:

Why are there two UTF-8 encodings for one emoji?
Where is there a document that lists both encodings?
How do I correctly convert a string from UTF-8 using NSString on iOS?

Answer 1

Accepted answer by bobince

0xF0, 0x9F, 0x98, 0x81

Is the correct UTF-8 encoding for U+1F601.

0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81

Is not a valid UTF-8 sequence(*). It should really be rejected; iOS is correct to do so.
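
As a rough cross-check of the claim above, a strict UTF-8 decoder in another environment behaves the same way as iOS. The sketch below uses Python's built-in decoder rather than NSString; the helper name `decodes_as_strict_utf8` is made up for illustration.

```python
# The two byte sequences from the question.
valid_utf8 = b"\xf0\x9f\x98\x81"
surrogate_bytes = b"\xed\xa0\xbd\xed\xb8\x81"


def decodes_as_strict_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_strict_utf8(valid_utf8))          # True
print(decodes_as_strict_utf8(surrogate_bytes))     # False
print(valid_utf8.decode("utf-8") == "\U0001F601")  # True
```

The second sequence is rejected because it encodes surrogate code points, which are not allowed in well-formed UTF-8.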

This is a bug in the bianma tool: the convertUtf8BytesToUnicodeCodePoints function is more lenient about what input it accepts than the algorithm specified in, e.g., RFC 3629.

This happens to return a working string only because the tool is written in JavaScript. Having decoded the above byte sequence to the bogus surrogate code point sequence U+D83D, U+DE01, it then converts that into a JavaScript string using a direct code-point-to-code-unit mapping, giving \uD83D\uDE01. As this is the correct way to encode U+1F601 in a UTF-16 string, it appears to have worked.
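
The arithmetic behind that accident can be sketched as follows. This is not the bianma tool's code; `lenient_decode_3byte` and `combine_surrogates` are hypothetical helpers that only illustrate what a decoder without a surrogate check would compute, and how UTF-16 interprets the resulting pair.

```python
def lenient_decode_3byte(seq: bytes) -> int:
    """Decode one 3-byte UTF-8-style sequence WITHOUT rejecting surrogates."""
    b0, b1, b2 = seq
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


data = b"\xed\xa0\xbd\xed\xb8\x81"
high = lenient_decode_3byte(data[0:3])
low = lenient_decode_3byte(data[3:6])

print(hex(high), hex(low))                 # 0xd83d 0xde01
print(hex(combine_surrogates(high, low)))  # 0x1f601
```

A conforming decoder would stop at the first step, since any result in the range U+D800 to U+DFFF is invalid in UTF-8.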

(*: It is a valid CESU-8 sequence, but that encoding is just “bogus broken encoding for compatibility with badly-written historical tools” and should generally be avoided.)

You should not usually encounter a sequence like this; it is typically not worth catering for unless you have a specific source of this kind of malformed data which you don't have the power to get fixed.
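
If you do have such a source and cannot get it fixed, one possible repair is to convert the data to proper UTF-8 before it reaches the strict decoder. The following is a minimal sketch in Python, not an NSString solution; `decode_cesu8_like` is a name invented here, and the approach relies on Python's `surrogatepass` error handler, so it would need to be ported to whatever sits in front of your iOS client.

```python
def decode_cesu8_like(data: bytes) -> str:
    """Decode bytes that may contain surrogates encoded as 3-byte sequences."""
    # Step 1: decode, letting encoded surrogates through as lone surrogates.
    with_surrogates = data.decode("utf-8", errors="surrogatepass")
    # Step 2: round-trip through UTF-16 so each high/low pair becomes
    # a single supplementary code point.
    as_utf16 = with_surrogates.encode("utf-16-le", errors="surrogatepass")
    return as_utf16.decode("utf-16-le")


malformed = b"\xed\xa0\xbd\xed\xb8\x81"
repaired = decode_cesu8_like(malformed)

print(repaired == "\U0001F601")                       # True
print(repaired.encode("utf-8") == b"\xf0\x9f\x98\x81")  # True
```

Input that is already valid UTF-8 passes through unchanged, while an unpaired surrogate still raises `UnicodeDecodeError` in the second step, so genuinely broken data is not silently accepted.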

Answer 2

Answered by Polina

This worked for me in PHP to send a message with an emoji to a Telegram bot:

$message_text = " \xf0\x9f\x98\x81 ";
