Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2099349/

Time: 2020-09-03 22:48:44  Source: igfitidea

Using Objective C/Cocoa to unescape Unicode characters, i.e. \u1234

objective-c cocoa unicode

Asked by corydoras

Some sites that I am fetching data from are returning UTF-8 strings with the non-ASCII characters escaped, i.e. \u5404\u500b\u90fd

Is there a built-in Cocoa function that might assist with this, or will I have to write my own decoding algorithm?

Accepted answer by kennytm

There is no built-in function to do C unescaping.

You can cheat a little with NSPropertyListSerialization, since an "old text style" plist supports C escaping via \Uxxxx:

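// Runtime text: ab"cA"BC\u2345\u0123 (backslashes are doubled in the source literal)
NSString* input = @"ab\"cA\"BC\\u2345\\u0123";

// Caveat: breaks if the text holds an escaped backslash followed by "u", e.g. abc\\uvw
NSString* esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
// Escape embedded quotes so the text can be wrapped in a quoted plist string
NSString* esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
NSString* quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""];
NSData* data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
// Note: this selector is deprecated in recent SDKs in favor of
// propertyListWithData:options:format:error:
NSString* unesc = [NSPropertyListSerialization propertyListFromData:data
                   mutabilityOption:NSPropertyListImmutable format:NULL
                   errorDescription:NULL];
assert([unesc isKindOfClass:[NSString class]]);
NSLog(@"Output = %@", unesc);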

But mind that this isn't very efficient. It's far better to write your own parser. (BTW, are you decoding JSON strings? If so, you could use an existing JSON parser.)
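
As a rough illustration of what such a hand-written parser could look like, here is a minimal sketch in plain C, which compiles unchanged inside an Objective-C file. The function names (`unescape_unicode` and its helpers) are hypothetical and not part of any Apple API. The sketch assumes the input is NUL-terminated UTF-8, decodes only `\uXXXX` escapes (joining surrogate pairs), and copies everything else verbatim.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses "\uXXXX" at s. Returns 1 and stores the 16-bit value on success. */
static int parse_escape(const char *s, uint32_t *value)
{
    if (s[0] != '\\' || s[1] != 'u') return 0;
    uint32_t v = 0;
    for (int i = 2; i < 6; i++) {
        /* A NUL terminator yields -1, so we never read past the end. */
        int h = hex_value(s[i]);
        if (h < 0) return 0;
        v = (v << 4) | (uint32_t)h;
    }
    *value = v;
    return 1;
}

/* Writes code point cp to out as UTF-8 and returns the byte count. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: returns a malloc'ed UTF-8 copy of in with every
   \uXXXX escape decoded. The caller frees the result. */
char *unescape_unicode(const char *in)
{
    /* A 6-byte escape decodes to at most 3 bytes and a 12-byte surrogate
       pair to 4 bytes, so the output never exceeds the input length. */
    char *out = malloc(strlen(in) + 1);
    if (out == NULL) return NULL;

    size_t o = 0;
    while (*in != '\0') {
        uint32_t cp, low;
        if (!parse_escape(in, &cp)) {
            out[o++] = *in++;  /* ordinary byte or malformed escape */
            continue;
        }
        in += 6;
        if (cp >= 0xD800 && cp <= 0xDBFF && parse_escape(in, &low)
            && low >= 0xDC00 && low <= 0xDFFF) {
            /* High surrogate followed by low surrogate: one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            in += 6;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;       /* unpaired surrogate: replacement character */
        }
        o += put_utf8(cp, out + o);
    }
    out[o] = '\0';
    return out;
}
```

From Objective-C, the result could be wrapped with `[NSString stringWithUTF8String:]` and then freed. Note that an escaped `\u0000` decodes to a NUL byte, which ends the C string early; a production parser would need a length-based interface to handle that case.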

Answer by Nikolai Ruhe

It's correct that Cocoa does not offer a solution, yet Core Foundation does: CFStringTransform.

CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), so it's a little-known gem. It is the front end to Apple's ICU-compatible string transformation engine. It can perform real magic like transliteration between Greek and Latin (or just about any known scripts), but it can also be used for mundane tasks like unescaping strings from a crappy server:

// The backslashes are doubled so the literal holds the escaped text \u5404\u500b\u90fd
NSString *input = @"\\u5404\\u500b\\u90fd";
NSMutableString *convertedString = [input mutableCopy];

CFStringRef transform = CFSTR("Any-Hex/Java");
CFStringTransform((__bridge CFMutableStringRef)convertedString, NULL, transform, YES);

NSLog(@"convertedString: %@", convertedString);

// prints: 各個都, tada!

As I said, CFStringTransform is really powerful. It supports a number of predefined transforms, like case mappings, normalizations or Unicode character name conversion. You can even design your own transformations.

I have no idea why Apple does not make it available from Cocoa.

Edit 2015:

OS X 10.11 and iOS 9 add the following method to Foundation:

- (nullable NSString *)stringByApplyingTransform:(NSString *)transform reverse:(BOOL)reverse;

So the example from above becomes...

NSString *input = @"\\u5404\\u500b\\u90fd";
NSString *convertedString = [input stringByApplyingTransform:@"Any-Hex/Java"
                                                     reverse:YES];

NSLog(@"convertedString: %@", convertedString);

Thanks @nschmidt for the heads up.

Answer by Christoph

Here's what I ended up writing. Hopefully this will help some people along.

+ (NSString*) unescapeUnicodeString:(NSString*)string
{
// unescape quotes and backslashes
NSString* unescapedString = [string stringByReplacingOccurrencesOfString:@"\\\"" withString:@"\""];
unescapedString = [unescapedString stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];

// tokenize based on unicode escape char
NSMutableString* tokenizedString = [NSMutableString string];
NSScanner* scanner = [NSScanner scannerWithString:unescapedString];
while ([scanner isAtEnd] == NO)
{
    // read up to the first unicode marker
    // if a string has been scanned, it's a token
    // and should be appended to the tokenized string
    NSString* token = @"";
    [scanner scanUpToString:@"\\u" intoString:&token];
    if (token != nil && token.length > 0)
    {
        [tokenizedString appendString:token];
        continue;
    }

    // check that the marker plus 4 hex digits (6 characters)
    // still fit in the string (it could be malformed);
    // if they do not, copy the remainder verbatim,
    // move the scanner to the end and skip this token
    NSUInteger location = [scanner scanLocation];
    NSUInteger remaining = scanner.string.length - location;
    if (remaining < 6)
    {
        [tokenizedString appendString:[scanner.string substringFromIndex:location]];
        [scanner setScanLocation:scanner.string.length];
        continue;
    }

    // move the location past the unicode marker
    // then read in the next 4 characters
    location += 2;
    NSRange range = {location, 4};
    token = [scanner.string substringWithRange:range];
    unichar codeValue = (unichar) strtol([token UTF8String], NULL, 16);
    [tokenizedString appendString:[NSString stringWithFormat:@"%C", codeValue]];

    // move the scanner past the 4 characters
    // then keep scanning
    location += 4;
    [scanner setScanLocation:location];
}

// done
return tokenizedString;
}

+ (NSString*) escapeUnicodeString:(NSString*)string
{
// first escape backslashes, then quotes
// note that the backslash has to be escaped before the quote,
// otherwise the backslash added for the quote would be escaped again
NSString* escapedString = [string stringByReplacingOccurrencesOfString:@"\\" withString:@"\\\\"];
escapedString = [escapedString stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];

// convert to encoded unicode
// do this by getting the data for the string
// in UTF-16 little endian, which matches the host byte order
// on current Apple hardware (the loop below reads native uint16_t values)
NSData* data = [escapedString dataUsingEncoding:NSUTF16LittleEndianStringEncoding allowLossyConversion:YES];
size_t bytesRead = 0;
const char* bytes = data.bytes;
NSMutableString* encodedString = [NSMutableString string];

// loop through the byte array
// read two bytes at a time, if the bytes
// are above a certain value they are unicode
// otherwise the bytes are ASCII characters
// the %C format will write the character value of bytes
while (bytesRead < data.length)
{
    uint16_t code = *((uint16_t*) &bytes[bytesRead]);
    if (code > 0x007E)
    {
        [encodedString appendFormat:@"\\u%04X", code];
    }
    else
    {
        [encodedString appendFormat:@"%C", code];
    }
    bytesRead += sizeof(uint16_t);
}

// done
return encodedString;
}
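
Because the loop above walks UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two `\uXXXX` escapes, one per surrogate. The following plain C sketch (usable from Objective-C as well) shows that split for a single code point. `escape_code_point` is a hypothetical name chosen for illustration, and the sketch assumes the caller passes a valid Unicode code point and a buffer of at least 13 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the \uXXXX form of code point cp into out
   and returns the number of characters written (excluding the NUL). */
int escape_code_point(uint32_t cp, char *out, size_t size)
{
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: a single escape is enough. */
        return snprintf(out, size, "\\u%04X", (unsigned)cp);
    }
    /* Supplementary planes: split into a high and a low surrogate. */
    cp -= 0x10000;
    unsigned high = 0xD800 + (unsigned)(cp >> 10);
    unsigned low = 0xDC00 + (unsigned)(cp & 0x3FF);
    return snprintf(out, size, "\\u%04X\\u%04X", high, low);
}
```

For example, U+5404 yields `\u5404`, while U+1F600 yields the pair `\uD83D\uDE00`, which is the form JSON and Java-style escapes expect.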

Answer by likid1412

Simple code:

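// unicodeStr holds the text containing \uXXXX escapes;
// NSNonLossyASCIIStringEncoding interprets those escapes when decoding
const char *cString = [unicodeStr cStringUsingEncoding:NSUTF8StringEncoding];
NSString *resultStr = [NSString stringWithCString:cString encoding:NSNonLossyASCIIStringEncoding];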

from: https://stackoverflow.com/a/7861345
