NSString tokenize in Objective-C
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/259956/
Asked by Adam Alexander
What is the best way to tokenize/split an NSString in Objective-C?
Answered by Adam Alexander
Found this at http://borkware.com/quickies/one?topic=NSString (useful link):
NSString *string = @"oop:ack:bork:greeble:ponies";
NSArray *chunks = [string componentsSeparatedByString: @":"];
Hope this helps!
Adam
Answered by Matt Gallagher
Everyone has mentioned componentsSeparatedByString: but you can also use CFStringTokenizer (remember that NSString and CFString are interchangeable through toll-free bridging), which will tokenize natural languages too (like Chinese/Japanese, which don't split words on spaces).
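As a rough illustration of the CFStringTokenizer approach, here is a minimal sketch. The helper name WordsUsingCFStringTokenizer is hypothetical (it is not part of the original answer), and the code assumes ARC (for the __bridge cast) and an Apple platform where Core Foundation is available.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): collects the word
// tokens that CFStringTokenizer finds in `text`.
static NSArray<NSString *> *WordsUsingCFStringTokenizer(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    CFStringRef cfText = (__bridge CFStringRef)text;   // toll-free bridging, ARC assumed
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault, cfText,
                                CFRangeMake(0, CFStringGetLength(cfText)),
                                kCFStringTokenizerUnitWord, locale);
    // Advance token by token until the tokenizer reports that none are left.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        NSRange nsRange = NSMakeRange((NSUInteger)range.location, (NSUInteger)range.length);
        [words addObject:[text substringWithRange:nsRange]];
    }
    // Core Foundation objects created with Create/Copy must be released by us.
    CFRelease(tokenizer);
    CFRelease(locale);
    return words;
}
```

Calling WordsUsingCFStringTokenizer(@"oop ack bork") should yield an array that includes @"oop"; the exact token boundaries for other input depend on the tokenizer's language rules.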
Answered by Chris Hanson
If you just want to split a string, use -[NSString componentsSeparatedByString:]. For more complex tokenization, use the NSScanner class.
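To give a feel for what "more complex tokenization" with NSScanner can look like, here is a small sketch that parses "key=value;key=value" text. The input format and the helper name PairsFromString are assumptions made for this example only.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): parses text such as
// "name=Adam; lang=ObjC" into a dictionary using NSScanner.
static NSDictionary<NSString *, NSString *> *PairsFromString(NSString *input) {
    NSMutableDictionary<NSString *, NSString *> *pairs = [NSMutableDictionary dictionary];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    // By default NSScanner skips whitespace and newlines before each scan.
    while (![scanner isAtEnd]) {
        NSString *key = nil;
        NSString *value = nil;
        if ([scanner scanUpToString:@"=" intoString:&key] &&
            [scanner scanString:@"=" intoString:NULL]) {
            // An empty value ("key=;") makes scanUpToString: return NO.
            if (![scanner scanUpToString:@";" intoString:&value]) value = @"";
            pairs[key] = value;
        }
        // Consume the separator; stop when there is none, so the loop always ends.
        if (![scanner scanString:@";" intoString:NULL]) break;
    }
    return pairs;
}
```

For plain single-delimiter splitting, componentsSeparatedByString: remains the simpler choice; a scanner pays off once the input has structure like the pairs above.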
Answered by Todd Ditchendorf
If your tokenization needs are more complex, check out my open source Cocoa string tokenizing/parsing toolkit, ParseKit.
For simple splitting of strings using a delimiter char (like ':'), ParseKit would definitely be overkill. But again, for complex tokenization needs, ParseKit is extremely powerful/flexible.
Also see the ParseKit Tokenization documentation.
Answered by Wienke
If you want to tokenize on multiple characters, you can use NSString's componentsSeparatedByCharactersInSet:. NSCharacterSet has some handy pre-made sets like the whitespaceCharacterSet and the illegalCharacterSet. It also has initializers for Unicode ranges.
You can also combine character sets and use them to tokenize, like this:
// Tokenize sSourceEntityName on both whitespace and punctuation.
NSMutableCharacterSet *mcharsetWhitePunc = [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
[mcharsetWhitePunc formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSArray *sarrTokenizedName = [self.sSourceEntityName componentsSeparatedByCharactersInSet:mcharsetWhitePunc];
[mcharsetWhitePunc release];
Be aware that componentsSeparatedByCharactersInSet: will produce empty strings if it encounters more than one member of the character set in a row, so you might want to filter out tokens whose length is less than 1.
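One possible way to drop those empty strings is to filter the result with a predicate. This is only a sketch; the helper name NonEmptyTokens is hypothetical and not part of the original answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): splits `text` on any
// character in `separators` and removes the empty strings that adjacent
// separators leave behind.
static NSArray<NSString *> *NonEmptyTokens(NSString *text, NSCharacterSet *separators) {
    NSArray<NSString *> *parts = [text componentsSeparatedByCharactersInSet:separators];
    // Keep only the strings whose length is greater than zero.
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    return [parts filteredArrayUsingPredicate:nonEmpty];
}
```

With a combined whitespace-and-punctuation set, the input @"a, b" splits into @"a", @"" and @"b", and the filter leaves @"a" and @"b".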
Answered by Michael Waterfall
If you're looking to tokenise a string into search terms while preserving "quoted phrases", here's an NSString category that respects various types of quote pairs: "" '' ‘’ “”
Usage:
NSArray *terms = [@"This is my \"search phrase\" I want to split" searchTerms];
// results in: ["This", "is", "my", "search phrase", "I", "want", "to", "split"]
Code:
@interface NSString (Search)
- (NSArray *)searchTerms;
@end

@implementation NSString (Search)

- (NSArray *)searchTerms {
    // Strip whitespace and setup scanner
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *searchString = [self stringByTrimmingCharactersInSet:whitespace];
    NSScanner *scanner = [NSScanner scannerWithString:searchString];
    [scanner setCharactersToBeSkipped:nil]; // we'll handle whitespace ourselves

    // A few types of quote pairs to check
    NSDictionary *quotePairs = @{@"\"": @"\"",
                                 @"'": @"'",
                                 @"\u2018": @"\u2019",
                                 @"\u201C": @"\u201D"};

    // Scan
    NSMutableArray *results = [[NSMutableArray alloc] init];
    while (scanner.scanLocation < searchString.length) {
        // Reset per iteration so a failed scan cannot re-add the previous term
        NSString *substring = nil;

        // Check for quote at the current scan location.
        // Index into searchString (not self): scanLocation refers to the trimmed string.
        unichar unicharacter = [searchString characterAtIndex:scanner.scanLocation];
        NSString *startQuote = [NSString stringWithFormat:@"%C", unicharacter];
        NSString *endQuote = [quotePairs objectForKey:startQuote];
        if (endQuote != nil) { // if it's a valid start quote we'll have an end quote
            // Scan quoted phrase into substring (skipping start & end quotes)
            [scanner scanString:startQuote intoString:nil];
            [scanner scanUpToString:endQuote intoString:&substring];
            [scanner scanString:endQuote intoString:nil];
        } else {
            // Single word that is non-quoted
            [scanner scanUpToCharactersFromSet:whitespace intoString:&substring];
        }

        // Process and add the substring to results
        if (substring) {
            substring = [substring stringByTrimmingCharactersInSet:whitespace];
            if (substring.length) [results addObject:substring];
        }

        // Skip to next word
        [scanner scanCharactersFromSet:whitespace intoString:nil];
    }

    // Return non-mutable array
    return results.copy;
}

@end
Answered by Robert
If you want to split a string by its linguistic features (words, paragraphs, characters, sentences, and lines), use string enumeration:
NSString * string = @" \n word1! word2,%$?'/word3.word4 ";
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
options:NSStringEnumerationByWords
usingBlock:
^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
NSLog(@"Substring: '%@'", substring);
}];
// Logs:
// Substring: 'word1'
// Substring: 'word2'
// Substring: 'word3'
// Substring: 'word4'
This API also works with languages where spaces are not always the delimiter (e.g. Japanese). Also, using NSStringEnumerationByComposedCharacterSequences is the proper way to enumerate over characters, since many characters (for example emoji, or letters with combining marks) span more than one UTF-16 code unit, which is what NSString's length and characterAtIndex: count.
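To illustrate the point about composed character sequences, here is a small sketch. The helper name ComposedCharacters is hypothetical, and the sample string (an "e" followed by a combining acute accent, then "a") is just an assumption chosen to make the difference visible.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): returns each
// user-perceived character of `text` as its own string.
static NSArray<NSString *> *ComposedCharacters(NSString *text) {
    NSMutableArray<NSString *> *characters = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // `substring` holds one whole composed sequence, however many
        // UTF-16 code units it takes.
        [characters addObject:substring];
    }];
    return characters;
}
```

For @"e\u0301a" the string's length is 3 code units, while the enumeration visits 2 composed characters.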
Answered by Rosario Carcò
I had a case where I had to split the console output after an LDAP query with ldapsearch. First set up and execute the NSTask (I found a good code sample here: Execute a terminal command from a Cocoa app). But then I had to split and parse the output so as to extract only the print-server names from the LDAP query output. Unfortunately, this is rather tedious string manipulation, which would be no problem at all if we were manipulating C strings/arrays with simple C array operations. So here is my code using Cocoa objects. If you have better suggestions, let me know.
//as the ldap query has to be done when the user selects one of our Active Directory Domains
//(the corresponding comboBox should be populated with print-server names we discover from AD)
//my code is placed in the onSelectDomain event code
//the following variables are declared in the interface .h file as globals
@protected NSArray* aDomains;//domain combo list array
@protected NSMutableArray* aPrinters;//printer combo list array
@protected NSMutableArray* aPrintServers;//print server combo list array
@protected NSString* sLdapQueryCommand;//for LDAP Queries
@protected NSArray* aLdapQueryArgs;
@protected NSTask* tskLdapTask;
@protected NSPipe* pipeLdapTask;
@protected NSFileHandle* fhLdapTask;
@protected NSMutableData* mdLdapTask;
IBOutlet NSComboBox* comboDomain;
IBOutlet NSComboBox* comboPrinter;
IBOutlet NSComboBox* comboPrintServer;
//end of interface globals
//after collecting the print-server names they are displayed in the corresponding drop-down comboBox
//as soon as the user selects one of the print-servers, we should start a new query to find all the
//print-queues on that server and display them in the comboPrinter drop-down list
//to find the shares/print queues of a windows print-server you need samba and the net -S command like this:
// net -S yourPrintServerName.yourBaseDomain.com -U yourLdapUser%yourLdapUserPassWord -W adm rpc share -l
//which displays a long list of the shares
- (IBAction)onSelectDomain:(id)sender
{
static int indexOfLastItem = 0; //unfortunately we need to compare this because we are called also if the selection did not change!
if ([comboDomain indexOfSelectedItem] != indexOfLastItem && ([comboDomain indexOfSelectedItem] != 0))
{
indexOfLastItem = [comboDomain indexOfSelectedItem]; //retain this index for next call
//the print-servers-list has to be loaded on a per-university or per-domain basis, either dynamically from a file or from an LDAP query
//initialize an LDAP-Query-Task or console-command like this one with console output
/*
ldapsearch -LLL -s sub -D "cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com" -h "yourLdapServer.com" -p 3268 -w "yourLdapUserPassWord" -b "dc=yourBaseDomainToSearchIn,dc=com" "(&(objectcategory=computer)(cn=ps*))" "dn"
//our print-server names start with ps* and we want the dn as result, which comes like this:
dn: CN=PSyourPrintServerName,CN=Computers,DC=yourBaseDomainToSearchIn,DC=com
*/
sLdapQueryCommand = [[NSString alloc] initWithString: @"/usr/bin/ldapsearch"];
if ([[comboDomain stringValue] compare: @"firstDomain"] == NSOrderedSame) {
aLdapQueryArgs = [NSArray arrayWithObjects: @"-LLL",@"-s", @"sub",@"-D", @"cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com",@"-h", @"yourLdapServer.com",@"-p",@"3268",@"-w",@"yourLdapUserPassWord",@"-b",@"dc=yourFirstDomainToSearchIn,dc=com",@"(&(objectcategory=computer)(cn=ps*))",@"dn",nil];
}
else {
aLdapQueryArgs = [NSArray arrayWithObjects: @"-LLL",@"-s", @"sub",@"-D", @"cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com",@"-h", @"yourLdapServer.com",@"-p",@"3268",@"-w",@"yourLdapUserPassWord",@"-b",@"dc=yourSecondDomainToSearchIn,dc=com",@"(&(objectcategory=computer)(cn=ps*))",@"dn",nil];
}
//prepare and execute ldap-query task
tskLdapTask = [[NSTask alloc] init];
pipeLdapTask = [[NSPipe alloc] init];//instead of [NSPipe pipe]
[tskLdapTask setStandardOutput: pipeLdapTask];//hope to get the tasks output in this file/pipe
//The magic line that keeps your log where it belongs, has to do with NSLog (see https://stackoverflow.com/questions/412562/execute-a-terminal-command-from-a-cocoa-app and here http://www.cocoadev.com/index.pl?NSTask )
[tskLdapTask setStandardInput:[NSPipe pipe]];
//fhLdapTask = [[NSFileHandle alloc] init];//would be redundant here, next line seems to do the trick also
fhLdapTask = [pipeLdapTask fileHandleForReading];
mdLdapTask = [NSMutableData dataWithCapacity:512];//prepare capturing the pipe buffer which is flushed on read and can overflow, start with 512 Bytes but it is mutable, so grows dynamically later
[tskLdapTask setLaunchPath: sLdapQueryCommand];
[tskLdapTask setArguments: aLdapQueryArgs];
#ifdef bDoDebug
NSLog (@"sLdapQueryCommand: %@\n", sLdapQueryCommand);
NSLog (@"aLdapQueryArgs: %@\n", aLdapQueryArgs );
NSLog (@"tskLdapTask: %@\n", [tskLdapTask arguments]);
#endif
[tskLdapTask launch];
while ([tskLdapTask isRunning]) {
[mdLdapTask appendData: [fhLdapTask readDataToEndOfFile]];
}
[tskLdapTask waitUntilExit];//might be redundant here.
[mdLdapTask appendData: [fhLdapTask readDataToEndOfFile]];//add another read for safety after process/command stops
NSString* sLdapOutput = [[NSString alloc] initWithData: mdLdapTask encoding: NSUTF8StringEncoding];//convert output to something readable, as NSData and NSMutableData are mere byte buffers
#ifdef bDoDebug
NSLog(@"LdapQueryOutput: %@\n", sLdapOutput);
#endif
//Ok now we have the printservers from Active Directory, lets parse the output and show the list to the user in its combo box
//output is formatted as this, one printserver per line
//dn: CN=PSyourPrintServer,OU=Computers,DC=yourBaseDomainToSearchIn,DC=com
//so we have to search for "dn: CN=" to retrieve each printserver's name
//unfortunately splitting this up will give us a first line containing only "" empty string, which we can replace with the word "choose"
//appearing as first entry in the comboBox
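//componentsSeparatedByString: returns an immutable NSArray, so casting it to NSMutableArray does not make it mutable; take a mutable copy instead
[aPrintServers release];//drop the copy from a previous call, if any (manual reference counting; a no-op when nil)
aPrintServers = [[sLdapOutput componentsSeparatedByString:@"dn: CN="] mutableCopy];//split output into single lines and store it in the NSMutableArray aPrintServers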
#ifdef bDoDebug
NSLog(@"aPrintServers: %@\n", aPrintServers);
#endif
if ([[aPrintServers objectAtIndex: 0 ] compare: @"" options: NSLiteralSearch] == NSOrderedSame){
[aPrintServers replaceObjectAtIndex: 0 withObject: slChoose];//replace with localized string "choose"
#ifdef bDoDebug
NSLog(@"aPrintServers: %@\n", aPrintServers);
#endif
}
//Now comes the tedious part to extract only the print-server-names from the single lines
NSRange r;
NSString* sTemp;
for (int i = 1; i < [aPrintServers count]; i++) {//skip first line with "choose". To get rid of the rest of the line, we must isolate/preserve the print server's name to the delimiting comma and remove all the remaining characters
sTemp = [aPrintServers objectAtIndex: i];
sTemp = [sTemp stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceAndNewlineCharacterSet]];//remove newlines and line feeds
#ifdef bDoDebug
NSLog(@"sTemp: %@\n", sTemp);
#endif
r = [sTemp rangeOfString: @","];//now find first comma to remove the whole rest of the line
if (r.location == NSNotFound) r.location = [sTemp length];//no comma found: nothing to cut, keep the whole trimmed string
//r.length = [sTemp lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
r.length = [sTemp length] - r.location;//calculate number of chars between first comma found and length of string
#ifdef bDoDebug
NSLog(@"range: %lu, %lu\n", (unsigned long)r.location, (unsigned long)r.length);
#endif
sTemp = [sTemp stringByReplacingCharactersInRange:r withString: @"" ];//remove rest of line
#ifdef bDoDebug
NSLog(@"sTemp after replace: %@\n", sTemp);
#endif
[aPrintServers replaceObjectAtIndex: i withObject: sTemp];//put back string into array for display in comboBox
#ifdef bDoDebug
NSLog(@"aPrintServer: %@\n", [aPrintServers objectAtIndex: i]);
#endif
}
[comboPrintServer removeAllItems];//reset combo box
[comboPrintServer addItemsWithObjectValues:aPrintServers];
[comboPrintServer setNumberOfVisibleItems:aPrintServers.count];
[comboPrintServer selectItemAtIndex:0];
#ifdef bDoDebug
NSLog(@"comboPrintServer reloaded with new values.");
#endif
//release memory we used for LdapTask: only objects we created with alloc/init are ours to release (manual reference counting)
[sLdapQueryCommand release];
[sLdapOutput release];
[pipeLdapTask release];
[tskLdapTask release];
//aLdapQueryArgs, fhLdapTask, mdLdapTask and sTemp come from convenience methods or accessors and are not owned by us,
//so releasing them explicitly would over-release them
}
}
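For comparison, the name extraction itself can be done line by line with a few NSString calls. The following is only a sketch under stated assumptions: the helper name PrintServerNamesFromLdapOutput is hypothetical, and it assumes every entry arrives on a single unwrapped line starting with "dn: CN=", as in the sample output above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): pulls the first CN
// value out of every "dn: CN=..." line of ldapsearch output.
static NSArray<NSString *> *PrintServerNamesFromLdapOutput(NSString *output) {
    NSMutableArray<NSString *> *names = [NSMutableArray array];
    NSString *prefix = @"dn: CN=";
    NSArray<NSString *> *lines =
        [output componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    for (NSString *rawLine in lines) {
        NSString *line =
            [rawLine stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
        if (![line hasPrefix:prefix]) continue;   // skip blank and unrelated lines
        NSString *rest = [line substringFromIndex:prefix.length];
        NSRange comma = [rest rangeOfString:@","];
        // Keep everything up to the first comma, or the whole rest if there is none.
        [names addObject:(comma.location == NSNotFound)
                             ? rest
                             : [rest substringToIndex:comma.location]];
    }
    return names;
}
```

Because it skips lines without the prefix, this variant produces no leading empty entry; a "choose" placeholder would have to be inserted separately if the combo box needs one.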
Answered by amar
I have come across cases where just separating a string into components was not enough, with tasks such as:
1) Categorizing tokens into types
2) Adding new tokens
3) Separating the string between custom delimiters, such as all words between "{" and "}"
For any such requirements, I found ParseKit a life saver.
I used it to parse .PGN (Portable Game Notation) files successfully; it is very fast and lightweight.

