Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1775859/

Date: 2020-09-03 22:31:55  Source: igfitidea

How to convert a unichar value to an NSString in Objective-C?

objective-c unicode nsstring

Asked by Terry

I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xce91), which is in UTF-8 format and translates to the Greek capital letter alpha ('Α'). I'm trying to put that character into an NSString variable, but I fail miserably.

I've tried two different ways, both of which were unsuccessful:

unichar greekAlpha = 0xce91; //could have written greekAlpha = 'Α' instead.

NSString *theString = [NSString stringWithFormat:@"Greek Alpha: %C", greekAlpha];

No good. I get some weird Chinese characters. As a side note, this works perfectly with English characters.

Then I also tried this:

NSString *byteString = [[NSString alloc] initWithBytes:&greekAlpha
                                                length:sizeof(unichar)
                                              encoding:NSUTF8StringEncoding];

But this doesn't work either. I'm obviously doing something terribly wrong, but I don't know what. Can someone help me, please? Thanks!

Accepted answer by hallski

Since 0xce91 is in the UTF-8 format and %C expects a UTF-16 code unit, a simple solution like the one above won't work. For stringWithFormat:@"%C" to work you need to pass 0x391, which is the UTF-16 code unit (and the Unicode code point, U+0391) for this character.
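
To make the relationship between 0xce91 and 0x391 concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It assumes the two UTF-8 bytes are packed into one 16-bit value with the lead byte in the high 8 bits, as in the question; the function name decode_packed_utf8_pair is hypothetical and not part of any framework.

```c
#include <stdint.h>

/* Hypothetical helper: decode a two-byte UTF-8 sequence that was packed
   into a single 16-bit value (lead byte in the high 8 bits).
   Returns the Unicode code point, or 0xFFFD (the replacement character)
   if the two bytes are not a valid two-byte UTF-8 sequence. */
uint32_t decode_packed_utf8_pair(uint16_t packed)
{
    uint8_t lead  = (uint8_t)(packed >> 8);
    uint8_t trail = (uint8_t)(packed & 0xFF);

    /* A two-byte sequence has the bit layout 110xxxxx 10xxxxxx. */
    if ((lead & 0xE0) != 0xC0 || (trail & 0xC0) != 0x80) {
        return 0xFFFD;
    }

    /* Keep the 5 payload bits of the lead byte and the 6 of the trail byte. */
    uint32_t cp = ((uint32_t)(lead & 0x1F) << 6) | (uint32_t)(trail & 0x3F);

    /* Code points below 0x80 must use the one-byte form (reject overlongs). */
    if (cp < 0x80) {
        return 0xFFFD;
    }
    return cp;
}
```

For the question's value, the lead byte 0xce contributes 0x0e << 6 = 0x380 and the trail byte 0x91 contributes 0x11, giving 0x391. That result fits in a unichar and can be passed to %C.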

In order to create a string from the UTF-8 encoded unichar, you first need to split the value into its octets and then use initWithBytes:length:encoding:.

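unichar utf8char = 0xce91; // two UTF-8 bytes packed into one 16-bit value
char chars[2];
int len = 1;

if (utf8char > 127) {
    chars[0] = (utf8char >> 8) & 0xFF; // high octet (0xce)
    chars[1] = utf8char & 0xFF;        // low octet (0x91)
    len = 2;
} else {
    chars[0] = utf8char;               // plain ASCII fits in one octet
}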

NSString *string = [[NSString alloc] initWithBytes:chars
                                            length:len 
                                          encoding:NSUTF8StringEncoding];

Answer by matt

unichar greekAlpha = 0x0391;
NSString* s = [NSString stringWithCharacters:&greekAlpha length:1];

And now you can incorporate that NSString into another in any way you like. Do note, however, that it is now legal to type a Greek alpha directly into an NSString literal.

Answer by Jon Jardine

The above answer is great but doesn't account for UTF-8 characters longer than 16 bits, e.g. the ellipsis symbol (0xE2, 0x80, 0xA6). Here's a tweak to the code:

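// Assumes utf8char is declared with a type wider than unichar (e.g. uint32_t)
// and chars as char chars[4]; a 16-bit unichar can never exceed 65535.
if (utf8char > 65535) {
    chars[0] = (utf8char >> 16) & 255;
    chars[1] = (utf8char >> 8) & 255;
    chars[2] = utf8char & 255;
    chars[3] = 0x00; // NUL terminator for initWithUTF8String:
} else if (utf8char > 127) {
    chars[0] = (utf8char >> 8) & 255;
    chars[1] = utf8char & 255;
    chars[2] = 0x00;
} else {
    chars[0] = utf8char;
    chars[1] = 0x00;
}
NSString *string = [[[NSString alloc] initWithUTF8String:chars] autorelease];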

Note the different string initialisation method which doesn't require a length parameter.

Answer by yusufag

Here is an algorithm for UTF-8 encoding on a single character:

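// Here utf8char holds a Unicode code point, not packed UTF-8 bytes. It needs a
// 32-bit type (e.g. uint32_t) for the last branch to be reachable, and chars
// must hold at least 4 bytes (5 if a terminating NUL is wanted in that branch).
// Each byte is the masked payload bits OR'ed with the UTF-8 marker bits.
if (utf8char < 0x80) {
    chars[0] = utf8char & 0x7F;                    // 0xxxxxxx
    chars[1] = 0x00;
    chars[2] = 0x00;
    chars[3] = 0x00;
}
else if (utf8char < 0x0800) {
    chars[0] = ((utf8char >> 6)  & 0x1F) | 0xC0;   // 110xxxxx
    chars[1] = ((utf8char >> 0)  & 0x3F) | 0x80;   // 10xxxxxx
    chars[2] = 0x00;
    chars[3] = 0x00;
}
else if (utf8char < 0x010000) {
    chars[0] = ((utf8char >> 12) & 0x0F) | 0xE0;   // 1110xxxx
    chars[1] = ((utf8char >> 6)  & 0x3F) | 0x80;
    chars[2] = ((utf8char >> 0)  & 0x3F) | 0x80;
    chars[3] = 0x00;
}
else if (utf8char < 0x110000) {
    chars[0] = ((utf8char >> 18) & 0x07) | 0xF0;   // 11110xxx
    chars[1] = ((utf8char >> 12) & 0x3F) | 0x80;
    chars[2] = ((utf8char >> 6)  & 0x3F) | 0x80;
    chars[3] = ((utf8char >> 0)  & 0x3F) | 0x80;
}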
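
For reference, the same branching can be wrapped in a self-contained function. This is only a sketch in plain C under a few assumptions: the input is a Unicode code point held in a uint32_t, the caller supplies a 5-byte buffer so the result is always NUL-terminated, and surrogate values are rejected. The name encode_utf8 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes plus a terminating NUL into out and returns the
   number of bytes written (excluding the NUL), or 0 if cp is not a valid
   Unicode scalar value. */
size_t encode_utf8(uint32_t cp, char out[5])
{
    size_t n;

    if (cp < 0x80) {
        out[0] = (char)cp;                           /* 0xxxxxxx */
        n = 1;
    } else if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));           /* 110xxxxx */
        out[1] = (char)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
        n = 2;
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {          /* UTF-16 surrogates */
            out[0] = '\0';
            return 0;
        }
        out[0] = (char)(0xE0 | (cp >> 12));          /* 1110xxxx */
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp < 0x110000) {
        out[0] = (char)(0xF0 | (cp >> 18));          /* 11110xxx */
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        out[0] = '\0';                               /* beyond U+10FFFF */
        return 0;
    }

    out[n] = '\0';
    return n;
}
```

Because the buffer is always NUL-terminated, it could be handed to initWithUTF8String: as in the earlier answer. Encoding 0x391 produces the bytes 0xCE 0x91 from the question, and 0x2026 produces the ellipsis bytes 0xE2 0x80 0xA6.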

Answer by tc.

The code above is the moral equivalent of unichar foo = 'abc';.

The problem is that 'Α' doesn't map to a single byte in the "execution character set" (I'm assuming UTF-8), so its value is "implementation-defined" according to C99 §6.4.4.4, paragraph 10:

The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

One way is to make 'ab' equal to 'a'<<8|'b'. Some Mac/iOS system headers rely on this for things like OSType/FourCharCode/FourCC; the only one in iOS that comes to mind is CoreVideo pixel formats. This is, however, not portable.
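
If a four-character code is needed without relying on the implementation-defined value of a multi-character constant, the same packing can be written out with explicit shifts. The following is a small sketch in plain C; make_fourcc is a hypothetical name, not a system API.

```c
#include <stdint.h>

/* Hypothetical helper: pack four characters into a 32-bit code, first
   character in the most significant byte. This mirrors the
   'a'<<8|'b' scheme described above but does not depend on how the
   compiler evaluates a multi-character constant such as 'abcd'. */
uint32_t make_fourcc(char a, char b, char c, char d)
{
    return ((uint32_t)(unsigned char)a << 24) |
           ((uint32_t)(unsigned char)b << 16) |
           ((uint32_t)(unsigned char)c << 8)  |
            (uint32_t)(unsigned char)d;
}
```

For example, make_fourcc('B', 'G', 'R', 'A') yields 0x42475241 on any platform where the characters are ASCII-encoded.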

If you really want a unichar literal, you can try L'A' (technically it's a wchar_t literal, and wchar_t is 32 bits wide on OS X and iOS, but its value is the Unicode code point, so assigning it to a unichar works for anything inside the BMP). However, it's far simpler to just use @"Α" (which works as long as you set the source character encoding correctly) or @"\u0391" (which has worked since at least the iOS 3 SDK).
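
The restriction to the BMP exists because a single 16-bit unichar can only hold code points up to U+FFFF; anything above that needs two UTF-16 code units (a surrogate pair). Here is a minimal sketch in plain C of that split, assuming the input is a Unicode code point in a uint32_t; encode_utf16 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes one or two 16-bit code units into out and returns how many were
   written, or 0 if cp is not a valid Unicode scalar value. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                   /* surrogates are not characters */
        }
        out[0] = (uint16_t)cp;          /* BMP: one code unit, unchanged */
        return 1;
    }
    if (cp > 0x10FFFF) {
        return 0;                       /* beyond the Unicode range */
    }

    cp -= 0x10000;                      /* leaves a 20-bit value */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

The resulting code units have the same width as unichar, so they could presumably be passed to stringWithCharacters:length: with the returned count as the length, in the same way as the single-character example in matt's answer.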