Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5690172/

Time: 2020-09-14 20:50:46  Source: igfitidea

xcode UTF-8 literals

Tags: objective-c, xcode, unicode

Asked by the wolf

Suppose I have the MUSICAL SYMBOL G CLEF symbol, 𝄞, that I wish to have in a string literal in my Objective-C source file.

The OS X Character Viewer says that the CLEF is UTF8 F0 9D 84 9E and Unicode 1D11E (D834+DD1E) in their terms.
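
The three values shown by Character Viewer are the same code point in different encodings. As a rough sketch in plain C (which also compiles inside an Objective-C file), the arithmetic can be written out as below. The function names `utf8_encode` and `utf16_surrogates` are made up for this illustration and are not part of any Apple API.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out and returns the byte count,
   or 0 if cp is not an encodable code point. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Split a code point above U+FFFF into its UTF-16 surrogate pair.
   Returns 1 on success, 0 if cp is outside U+10000..U+10FFFF. */
int utf16_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;
    *hi = (uint16_t)(0xD800 | (cp >> 10));
    *lo = (uint16_t)(0xDC00 | (cp & 0x3FF));
    return 1;
}
```

Calling `utf8_encode(0x1D11E, buf)` fills `buf` with F0 9D 84 9E, and `utf16_surrogates(0x1D11E, &hi, &lo)` yields D834 and DD1E, matching the Character Viewer fields.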

After some futzing around, and using the ICU Unicode Demonstration page, I did get the following code to work:

NSString *uni=@"\U0001d11e";
NSString *uni2=[[NSString alloc] initWithUTF8String:"\xF0\x9D\x84\x9E"];
NSString *uni3=@"𝄞";
NSLog(@"unicode: %@ and %@ and %@",uni, uni2, uni3);

My questions:

  1. Is it possible to streamline the way I am doing UTF-8 literals? That seems kludgy to me.
  2. Is the @"\U0001d11e" part UTF-32?
  3. Why does cutting and pasting the CLEF from Character Viewer actually work? I thought Objective-C files had to be UTF-8?

Accepted answer by Anomie

  1. I would prefer the way you did it in uni3, but sadly that is not recommended. Failing that, I would prefer the method in uni to that in uni2. Another option would be [NSString stringWithFormat:@"%C", 0x1d11e], although %C formats a single 16-bit unichar, so a code point above U+FFFF such as this one would have to be passed as its surrogate pair (D834 and DD1E).
  2. It is a "universal character name", introduced in C99 (section 6.4.3) and imported into Objective-C as of OS X 10.5. Technically this doesn't have to give you UTF-8 (it's up to the compiler), but in practice UTF-8 is probably what you'll get.
  3. The encoding of the source code file is probably UTF-8, matching what the runtime expects, so everything happens to work. It's also possible the source file is UTF-16 or UTF-32 and the compiler is doing the Right Thing when compiling it. Nonetheless, Apple does not recommend this.
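
If the "it's up to the compiler" caveat in point 2 is a concern, one possible way around it, assuming a C11-capable compiler (not something the answer itself suggests), is the `u8` string prefix, which asks for UTF-8 regardless of the execution character set. A small sketch in plain C, with `u8_clef_is_utf8` as a hypothetical helper name:

```c
#include <string.h>

/* Returns 1 if the u8-prefixed literal holds the UTF-8 bytes
   F0 9D 84 9E followed by the terminating NUL, 0 otherwise. */
int u8_clef_is_utf8(void)
{
    static const unsigned char expected[] = { 0xF0, 0x9D, 0x84, 0x9E, 0x00 };
    /* The cast keeps this portable to compilers where u8 literals
       have an unsigned element type. */
    const char *clef = (const char *)u8"\U0001d11e";

    return memcmp(clef, expected, sizeof expected) == 0;
}
```

The resulting pointer can be handed to initWithUTF8String: in the same way as the hex-escaped literal from the question.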

Answer by dawg

Answers to your questions (same order):

  1. Why choose? Xcode uses C99 in its default setup. Refer to the C0X draft specification, section 6.4.3, on Universal Character Names. See below.

  2. More technically, the @"\U0001d11e" is the 32-bit Unicode code point for that character in the ISO 10646 character set.

  3. I would not count on this behavior working. You should absolutely, positively, without question have all the characters in your source file be 7-bit ASCII. For string literals, use an encoding or, preferably, a suitable external resource able to handle binary data.
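
The 7-bit ASCII advice in point 3 is easy to check mechanically. The following is only an illustrative sketch in plain C; `first_non_ascii` is a made-up helper, and reading the source file into a buffer is left to the caller.

```c
#include <stddef.h>

/* Return the offset of the first byte outside 7-bit ASCII,
   or -1 if every byte in the buffer is ASCII. */
long first_non_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] > 0x7F)
            return (long)i;
    }
    return -1;
}
```

A source line that spells the clef as `\U0001d11e` passes this check, while one containing the pasted character is reported at the offset of its F0 lead byte.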

Universal Character Names (from the WG14/N1256 C0X draft, which Clang follows fairly well):

Universal Character Names may be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set.

The universal character name \Unnnnnnnn designates the character whose eight-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose four-digit short identifier is nnnn (and whose eight-digit short identifier is 0000nnnn).

Therefore, you can produce your character or string in a natural, mixed way:

const char *utf8CStr =
   "May all your CLEF's \xF0\x9D\x84\x9E be left like this: \U0001d11e";
NSString *uni4=[[NSString alloc] initWithUTF8String:utf8CStr];

The \Unnnnnnnn form allows you to select any Unicode code point, and this is the same value as the "Unicode" field at the bottom left of the Character Viewer. The direct entry of \Unnnnnnnn in the C99 source file is handled appropriately by the compiler. Note that there are only two options: \unnnn, which takes exactly four hex digits and so reaches code points up to U+FFFF, or \Unnnnnnnn, which takes eight hex digits and can name any Unicode code point. You need to pad the left with 0's if you are not using all 4 or all 8 digits of \u or \U.
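
To illustrate the padding rule, here is a small plain C sketch that spells one character, U+00E9 (e with acute accent), in both forms and checks that the compiler produces identical bytes. The helper name `ucn_forms_match` is invented for this example.

```c
#include <string.h>

/* U+00E9 written with the four-digit and the eight-digit form.
   Both must be zero-padded on the left to the full width. */
static const char short_form[] = "\u00e9";
static const char long_form[]  = "\U000000e9";

/* Returns 1 if both spellings compiled to the same byte sequence. */
int ucn_forms_match(void)
{
    return sizeof short_form == sizeof long_form
        && memcmp(short_form, long_form, sizeof short_form) == 0;
}
```

The G clef itself lies above U+FFFF, so it can only be written with the eight-digit form, as `\U0001d11e`.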

The \xF0\x9D\x84\x9E form in the same string literal is more interesting. It inserts the raw UTF-8 encoding of the same character. Once passed to the initWithUTF8String method, both the universal character name and the hex-escaped bytes end up as encoded UTF-8.
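
A quick way to see that the two spellings end up as the same bytes is to compare them inside one literal. This is a sketch in plain C that assumes the compiler uses UTF-8 for narrow string literals (the usual default for Clang and GCC); `escaped_forms_match` is a hypothetical name.

```c
#include <string.h>

/* Compare the hex-escaped bytes on the left of '|' with the bytes the
   compiler generated for the universal character name on the right.
   Returns 1 if they are identical, 0 otherwise. */
int escaped_forms_match(void)
{
    static const char mixed[] = "\xF0\x9D\x84\x9E|\U0001d11e";
    const char *sep = strchr(mixed, '|');
    size_t left, right;

    if (sep == NULL)
        return 0;
    left = (size_t)(sep - mixed);
    right = strlen(sep + 1);
    return left == right && memcmp(mixed, sep + 1, left) == 0;
}
```

One thing to watch with the hex form: a `\x` escape absorbs every hex digit that follows it, so `"\x9Ebe"` would not mean the byte 9E followed by "be". Splitting the literal, as in `"\x9E" "be"`, avoids that.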

It may, arguably, be a violation of 130 of section 5.1.1.2 to use raw bytes in this way. Given that a raw UTF-8 string would be encoded similarly, I think you are OK.

Answer by Carl Norum

  1. You can write the clef character in your string literal, too:

    NSString *uni2=[[NSString alloc] initWithUTF8String:"𝄞"];
    
  2. The \U0001d11e matches the Unicode code point for the G clef character. The UTF-32 form of a character is the same as its code point, so you can think of it as UTF-32 if you want to. Here's a link to the Unicode tables for musical symbols.

  3. Your file probably is UTF-8. The G clef is a valid UTF8 character - check out the output from hexdump for your file:

    00  4e 53 53 74 72 69 6e 67  20 2a 75 6e 69 33 3d 40  |NSString *uni3=@|
    10  22 f0 9d 84 9e 22 3b 0a  20 20 4e 53 4c 6f 67 28  |"....";.  NSLog(|
    

    As you can see, the correct UTF-8 representation of that character is in the file right where you'd expect it. It's probably safer to use one of your other methods and try to keep the source file in the ASCII range.
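
To confirm that the four bytes in the hexdump really are the G clef, they can be decoded back to a code point. The decoder below is a minimal sketch in plain C written for this illustration; `utf8_decode` and `UTF8_INVALID` are assumed names, not library functions.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF8_INVALID 0xFFFFFFFFu

/* Decode the UTF-8 sequence starting at s (at most len bytes).
   On success returns the code point and stores the sequence length
   in *used; on malformed input returns UTF8_INVALID. */
uint32_t utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    /* Smallest code point that legitimately needs n bytes. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    size_t n, i;

    if (len == 0)
        return UTF8_INVALID;
    if (s[0] < 0x80)                { cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; n = 4; }
    else
        return UTF8_INVALID;
    if (len < n)
        return UTF8_INVALID;

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return UTF8_INVALID; /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min_for_len[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF8_INVALID;

    *used = n;
    return cp;
}
```

Feeding it the bytes f0 9d 84 9e from offset 0x11 of the dump gives 0x1D11E with a length of 4, which is the code point used in the `\U0001d11e` spelling.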

Answer by Almer Lucke

I created some utility classes to convert easily between Unicode code points, UTF-8 byte sequences and NSString. You can find the code on GitHub; maybe it is of some use to someone.
