ios 在 Swift 中使用 Unicode 代码点
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/31272561/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Working with Unicode code points in Swift
提问by Suragch
If you are not interested in the details of Mongolian but just want a quick answer about using and converting Unicode values in Swift, then skip down to the first part of the accepted answer.
如果您对蒙古语的详细信息不感兴趣,而只想快速了解如何在 Swift 中使用和转换 Unicode 值,请跳至已接受答案的第一部分。
Background
背景
I want to render Unicode text for traditional Mongolian to be used in iOS apps. The better and long term solution is to use an AAT smart fontthat would render this complex script. (Such fonts do existbut their license does not allow modification and non-personal use.) However, since I have never made a font, let alone all of the rendering logic for an AAT font, I just plan to do the rendering myself in Swift for now. Maybe at some later date I can learn to make a smart font.
我想为要在 iOS 应用程序中使用的传统蒙古语呈现 Unicode 文本。更好的长期解决方案是使用AAT 智能字体来呈现这个复杂的脚本。(这样的字体确实存在,但他们的许可证不允许修改和非个人使用。)但是,由于我从未制作过字体,更不用说AAT字体的所有渲染逻辑,我只是打算自己在斯威夫特现在。也许以后我可以学习制作智能字体。
Externally I will use Unicode text, but internally (for display in a UITextView
) I will convert the Unicode to individual glyphs that are stored in a dumb font (coded with Unicode PUAvalues). So my rendering engine needs to convert Mongolian Unicode values (range: U+1820 to U+1842) to glyph values stored in the PUA (range: U+E360 to U+E5CF). Anyway, this is my plan since it is what I did in Java in the past, but maybe I need to change my whole way of thinking.
在外部,我将使用 Unicode 文本,但在内部(用于在 中显示UITextView
),我会将 Unicode 转换为存储在哑字体(使用 Unicode PUA值编码)中的单个字形。所以我的渲染引擎需要将蒙古语 Unicode 值(范围:U+1820 到 U+1842)转换为存储在 PUA 中的字形值(范围:U+E360 到 U+E5CF)。无论如何,这是我的计划,因为这是我过去在 Java 中所做的,但也许我需要改变我的整个思维方式。
Example
例子
The following image shows suwritten twice in Mongolian using two different forms for the letter u(in red). (Mongolian is written vertically with letters being connected like cursive letters in English.)
下图显示了su用蒙古语写了两次,使用了字母u 的两种不同形式(红色)。(蒙古文是竖写的,字母像英文中的草书一样连接起来。)
In Unicode these two strings would be expressed as
在 Unicode 中,这两个字符串将表示为
var suForm1: String = "\u{1830}\u{1826}"
var suForm2: String = "\u{1830}\u{1826}\u{180B}"
The Free Variation Selector (U+180B) in suForm2
is recognized (correctly) by Swift String
to be a unit with the u(U+1826) that precedes it. It is considered by Swift to be a single character, an extended grapheme cluster. However, for the purposes of doing the rendering myself, I need to differentiate u(U+1826) and FVS1 (U+180B) as two distinct UTF-16 code points.
自由变体选择器 (U+180B) insuForm2
被 Swift(正确)识别String
为一个单位,前面有u(U+1826)。Swift 认为它是单个字符,一个扩展的字素簇。但是,为了自己进行渲染,我需要将u(U+1826) 和 FVS1 (U+180B) 区分为两个不同的 UTF-16 代码点。
For internal display purposes, I would convert the above Unicode strings to the following rendered glyph strings:
出于内部显示目的,我会将上述 Unicode 字符串转换为以下呈现的字形字符串:
suForm1 = "\u{E46F}\u{E3BA}"
suForm2 = "\u{E46F}\u{E3BB}"
Question
题
I have been playing around with Swift String
and Character
. There are a lot of convenient things about them, but since in my particular case I deal exclusively with UTF-16 code units, I wonder if I should be using the old NSString
rather than Swift's String
. I realize that I can use String.utf16
to get UTF-16 code points, but the conversion back to String
isn't very nice.
我一直在玩 SwiftString
和Character
. 它们有很多方便的地方,但由于在我的特殊情况下我专门处理 UTF-16 代码单元,我想知道我是否应该使用旧的NSString
而不是 Swift 的String
. 我意识到,我可以用String.utf16
得到UTF-16码点,但转换回String
是不是很好。
Would it be better to stick with String
and Character
or should I use NSString
and unichar
?
坚持使用String
andCharacter
还是我应该使用NSString
and会更好unichar
吗?
What I have read
我读过的
Updates to this question have been hidden in order to clean the page up. See the edit history.
为了清理页面,已隐藏对此问题的更新。查看编辑历史。
回答by Suragch
Updated for Swift 3
为 Swift 3 更新
String and Character
字符串和字符
For almost everyone in the future who visits this question, String
and Character
will be the answer for you.
对于未来访问此问题的几乎每个人,String
并将Character
为您提供答案。
Set Unicode values directly in code:
直接在代码中设置 Unicode 值:
var str: String = "I want to visit 北京, Москва, ?????, ???????, and ???. "
var character: Character = ""
Use hexadecimal to set values
使用十六进制设置值
var str: String = "\u{61}\u{5927}\u{1F34E}\u{3C0}" // a大π
var character: Character = "\u{65}\u{301}" // é = "e" + accent mark
Note that the Swift Character can be composed of multiple Unicode code points, but appears to be a single character. This is called an Extended Grapheme Cluster.
请注意,Swift 字符可以由多个 Unicode 代码点组成,但看起来是单个字符。这称为扩展字素簇。
See this questionalso.
也见这个问题。
Convert to Unicode values:
转换为 Unicode 值:
str.utf8
str.utf16
str.unicodeScalars // UTF-32
String(character).utf8
String(character).utf16
String(character).unicodeScalars
Convert from Unicode hex values:
从 Unicode 十六进制值转换:
let hexValue: UInt32 = 0x1F34E
// convert hex value to UnicodeScalar
guard let scalarValue = UnicodeScalar(hexValue) else {
// early exit if hex does not form a valid unicode value
return
}
// convert UnicodeScalar to String
let myString = String(scalarValue) //
Or alternatively:
或者:
let hexValue: UInt32 = 0x1F34E
if let scalarValue = UnicodeScalar(hexValue) {
let myString = String(scalarValue)
}
A few more examples
再举几个例子
let value0: UInt8 = 0x61
let value1: UInt16 = 0x5927
let value2: UInt32 = 0x1F34E
let string0 = String(UnicodeScalar(value0)) // a
let string1 = String(UnicodeScalar(value1)) // 大
let string2 = String(UnicodeScalar(value2)) //
// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
myString.append(UnicodeScalar(hexValue))
}
print(myString) // Cat?
Note that for UTF-8 and UTF-16 the conversion is not always this easy. (See UTF-8, UTF-16, and UTF-32questions.)
请注意,对于 UTF-8 和 UTF-16,转换并不总是那么容易。(请参阅UTF-8、UTF-16和UTF-32问题。)
NSString and unichar
NSString 和 unichar
It is also possible to work with NSString
and unichar
in Swift, but you should realize that unless you are familiar with Objective C and good at converting the syntax to Swift, it will be difficult to find good documentation.
也可以使用SwiftNSString
并unichar
在 Swift 中工作,但您应该意识到,除非您熟悉 Objective C 并且擅长将语法转换为 Swift,否则很难找到好的文档。
Also, unichar
is a UInt16
array and as mentioned above the conversion from UInt16
to Unicode scalar values is not always easy (i.e., converting surrogate pairs for things like emoji and other characters in the upper code planes).
此外,unichar
是一个UInt16
数组,如上所述,从UInt16
Unicode 标量值转换为 Unicode 标量值并不总是那么容易(即,为上层代码平面中的表情符号和其他字符等内容转换代理项对)。
Custom string structure
自定义字符串结构
For the reasons mentioned in the question, I ended up not using any of the above methods. Instead I wrote my own string structure, which was basically an array of UInt32
to hold Unicode scalar values.
由于问题中提到的原因,我最终没有使用上述任何方法。相反,我编写了自己的字符串结构,它基本上是一个UInt32
用于保存 Unicode 标量值的数组。
Again, this is not the solution for most people. First consider using extensionsif you only need to extend the functionality of String
or Character
a little.
同样,这不是大多数人的解决方案。如果您只需要扩展或一点点的功能,请首先考虑使用扩展。String
Character
But if you really need to work exclusively with Unicode scalar values, you could write a custom struct.
但是,如果您确实需要专门处理 Unicode 标量值,则可以编写自定义结构。
The advantages are:
优点是:
- Don't need to constantly switch between Types (
String
,Character
,UnicodeScalar
,UInt32
, etc.) when doing string manipulation. - After Unicode manipulation is finished, the final conversion to
String
is easy. - Easy to add more methods when they are needed
- Simplifies converting code from Java or other languages
- 进行字符串操作时
String
,不需要经常在类型(Character
、UnicodeScalar
、UInt32
、 等)之间切换。 - Unicode 操作完成后,最终转换
String
为 就很容易了。 - 需要时轻松添加更多方法
- 简化从 Java 或其他语言转换代码
Disadavantages are:
缺点是:
- makes code less portable and less readable for other Swift developers
- not as well tested and optimized as the native Swift types
- it is yet another file that has to be included in a project every time you need it
- 使其他 Swift 开发人员的代码可移植性和可读性降低
- 不如原生 Swift 类型经过良好的测试和优化
- 它是另一个文件,每次需要时都必须包含在项目中
You can make your own, but here is mine for reference. The hardest part was making it Hashable.
你可以自己做,但这里是我的供参考。最难的部分是让它成为 Hashable。
// This struct is an array of UInt32 to hold Unicode scalar values
// Version 3.4.0 (Swift 3 update)
struct ScalarString: Sequence, Hashable, CustomStringConvertible {
fileprivate var scalarArray: [UInt32] = []
init() {
// does anything need to go here?
}
init(_ character: UInt32) {
self.scalarArray.append(character)
}
init(_ charArray: [UInt32]) {
for c in charArray {
self.scalarArray.append(c)
}
}
init(_ string: String) {
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}
// Generator in order to conform to SequenceType protocol
// (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })
func makeIterator() -> AnyIterator<UInt32> {
return AnyIterator(scalarArray.makeIterator())
}
// append
mutating func append(_ scalar: UInt32) {
self.scalarArray.append(scalar)
}
mutating func append(_ scalarString: ScalarString) {
for scalar in scalarString {
self.scalarArray.append(scalar)
}
}
mutating func append(_ string: String) {
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}
// charAt
func charAt(_ index: Int) -> UInt32 {
return self.scalarArray[index]
}
// clear
mutating func clear() {
self.scalarArray.removeAll(keepingCapacity: true)
}
// contains
func contains(_ character: UInt32) -> Bool {
for scalar in self.scalarArray {
if scalar == character {
return true
}
}
return false
}
// description (to implement Printable protocol)
var description: String {
return self.toString()
}
// endsWith
func endsWith() -> UInt32? {
return self.scalarArray.last
}
// indexOf
// returns first index of scalar string match
func indexOf(_ string: ScalarString) -> Int? {
if scalarArray.count < string.length {
return nil
}
for i in 0...(scalarArray.count - string.length) {
for j in 0..<string.length {
if string.charAt(j) != scalarArray[i + j] {
break // substring mismatch
}
if j == string.length - 1 {
return i
}
}
}
return nil
}
// insert
mutating func insert(_ scalar: UInt32, atIndex index: Int) {
self.scalarArray.insert(scalar, at: index)
}
mutating func insert(_ string: ScalarString, atIndex index: Int) {
var newIndex = index
for scalar in string {
self.scalarArray.insert(scalar, at: newIndex)
newIndex += 1
}
}
mutating func insert(_ string: String, atIndex index: Int) {
var newIndex = index
for scalar in string.unicodeScalars {
self.scalarArray.insert(scalar.value, at: newIndex)
newIndex += 1
}
}
// isEmpty
var isEmpty: Bool {
return self.scalarArray.count == 0
}
// hashValue (to implement Hashable protocol)
var hashValue: Int {
// DJB Hash Function
return self.scalarArray.reduce(5381) {
(//Swift 3.0
// This struct is an array of UInt32 to hold Unicode scalar values
struct ScalarString: Sequence, Hashable, CustomStringConvertible {
private var scalarArray: [UInt32] = []
init() {
// does anything need to go here?
}
init(_ character: UInt32) {
self.scalarArray.append(character)
}
init(_ charArray: [UInt32]) {
for c in charArray {
self.scalarArray.append(c)
}
}
init(_ string: String) {
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}
// Generator in order to conform to SequenceType protocol
// (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })
//func generate() -> AnyIterator<UInt32> {
func makeIterator() -> AnyIterator<UInt32> {
let nextIndex = 0
return AnyIterator {
if (nextIndex > self.scalarArray.count-1) {
return nil
}
return self.scalarArray[nextIndex + 1]
}
}
// append
mutating func append(scalar: UInt32) {
self.scalarArray.append(scalar)
}
mutating func append(scalarString: ScalarString) {
for scalar in scalarString {
self.scalarArray.append(scalar)
}
}
mutating func append(string: String) {
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}
// charAt
func charAt(index: Int) -> UInt32 {
return self.scalarArray[index]
}
// clear
mutating func clear() {
self.scalarArray.removeAll(keepingCapacity: true)
}
// contains
func contains(character: UInt32) -> Bool {
for scalar in self.scalarArray {
if scalar == character {
return true
}
}
return false
}
// description (to implement Printable protocol)
var description: String {
var string: String = ""
for scalar in scalarArray {
string.append(String(describing: UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
}
return string
}
// endsWith
func endsWith() -> UInt32? {
return self.scalarArray.last
}
// insert
mutating func insert(scalar: UInt32, atIndex index: Int) {
self.scalarArray.insert(scalar, at: index)
}
// isEmpty
var isEmpty: Bool {
get {
return self.scalarArray.count == 0
}
}
// hashValue (to implement Hashable protocol)
var hashValue: Int {
get {
// DJB Hash Function
var hash = 5381
for i in 0 ..< scalarArray.count {
hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
}
/*
for i in 0..< self.scalarArray.count {
hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
}
*/
return hash
}
}
// length
var length: Int {
get {
return self.scalarArray.count
}
}
// remove character
mutating func removeCharAt(index: Int) {
self.scalarArray.remove(at: index)
}
func removingAllInstancesOfChar(character: UInt32) -> ScalarString {
var returnString = ScalarString()
for scalar in self.scalarArray {
if scalar != character {
returnString.append(scalar: scalar) //.append(scalar)
}
}
return returnString
}
// replace
func replace(character: UInt32, withChar replacementChar: UInt32) -> ScalarString {
var returnString = ScalarString()
for scalar in self.scalarArray {
if scalar == character {
returnString.append(scalar: replacementChar) //.append(replacementChar)
} else {
returnString.append(scalar: scalar) //.append(scalar)
}
}
return returnString
}
// func replace(character: UInt32, withString replacementString: String) -> ScalarString {
func replace(character: UInt32, withString replacementString: ScalarString) -> ScalarString {
var returnString = ScalarString()
for scalar in self.scalarArray {
if scalar == character {
returnString.append(scalarString: replacementString) //.append(replacementString)
} else {
returnString.append(scalar: scalar) //.append(scalar)
}
}
return returnString
}
// set (an alternative to myScalarString = "some string")
mutating func set(string: String) {
self.scalarArray.removeAll(keepingCapacity: false)
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}
// split
func split(atChar splitChar: UInt32) -> [ScalarString] {
var partsArray: [ScalarString] = []
var part: ScalarString = ScalarString()
for scalar in self.scalarArray {
if scalar == splitChar {
partsArray.append(part)
part = ScalarString()
} else {
part.append(scalar: scalar) //.append(scalar)
}
}
partsArray.append(part)
return partsArray
}
// startsWith
func startsWith() -> UInt32? {
return self.scalarArray.first
}
// substring
func substring(startIndex: Int) -> ScalarString {
// from startIndex to end of string
var subArray: ScalarString = ScalarString()
for i in startIndex ..< self.length {
subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
}
return subArray
}
func substring(startIndex: Int, _ endIndex: Int) -> ScalarString {
// (startIndex is inclusive, endIndex is exclusive)
var subArray: ScalarString = ScalarString()
for i in startIndex ..< endIndex {
subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
}
return subArray
}
// toString
func toString() -> String {
let string: String = ""
for scalar in self.scalarArray {
string.appending(String(describing:UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
}
return string
}
// values
func values() -> [UInt32] {
return self.scalarArray
}
}
func ==(left: ScalarString, right: ScalarString) -> Bool {
if left.length != right.length {
return false
}
for i in 0 ..< left.length {
if left.charAt(index: i) != right.charAt(index: i) {
return false
}
}
return true
}
func +(left: ScalarString, right: ScalarString) -> ScalarString {
var returnString = ScalarString()
for scalar in left.values() {
returnString.append(scalar: scalar) //.append(scalar)
}
for scalar in right.values() {
returnString.append(scalar: scalar) //.append(scalar)
}
return returnString
}
<< 5) &+ ##代码## &+ Int()
}
}
// length
var length: Int {
return self.scalarArray.count
}
// remove character
mutating func removeCharAt(_ index: Int) {
self.scalarArray.remove(at: index)
}
func removingAllInstancesOfChar(_ character: UInt32) -> ScalarString {
var returnString = ScalarString()
for scalar in self.scalarArray {
if scalar != character {
returnString.append(scalar)
}
}
return returnString
}
func removeRange(_ range: CountableRange<Int>) -> ScalarString? {
if range.lowerBound < 0 || range.upperBound > scalarArray.count {
return nil
}
var returnString = ScalarString()
for i in 0..<scalarArray.count {
if i < range.lowerBound || i >= range.upperBound {
returnString.append(scalarArray[i])
}
}
return returnString
}
// replace
func replace(_ character: UInt32, withChar replacementChar: UInt32) -> ScalarString {
var returnString = ScalarString()
for scalar in self.scalarArray {
if scalar == character {
returnString.append(replacementChar)
} else {
returnString.append(scalar)
}
}
return returnString
}
func replace(_ character: UInt32, withString replacementString: String) -> ScalarString {
var returnString = ScalarString()
for scalar in self.scalarArray {
if scalar == character {
returnString.append(replacementString)
} else {
returnString.append(scalar)
}
}
return returnString
}
func replaceRange(_ range: CountableRange<Int>, withString replacementString: ScalarString) -> ScalarString {
var returnString = ScalarString()
for i in 0..<scalarArray.count {
if i < range.lowerBound || i >= range.upperBound {
returnString.append(scalarArray[i])
} else if i == range.lowerBound {
returnString.append(replacementString)
}
}
return returnString
}
// set (an alternative to myScalarString = "some string")
mutating func set(_ string: String) {
self.scalarArray.removeAll(keepingCapacity: false)
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}
// split
func split(atChar splitChar: UInt32) -> [ScalarString] {
var partsArray: [ScalarString] = []
if self.scalarArray.count == 0 {
return partsArray
}
var part: ScalarString = ScalarString()
for scalar in self.scalarArray {
if scalar == splitChar {
partsArray.append(part)
part = ScalarString()
} else {
part.append(scalar)
}
}
partsArray.append(part)
return partsArray
}
// startsWith
func startsWith() -> UInt32? {
return self.scalarArray.first
}
// substring
func substring(_ startIndex: Int) -> ScalarString {
// from startIndex to end of string
var subArray: ScalarString = ScalarString()
for i in startIndex..<self.length {
subArray.append(self.scalarArray[i])
}
return subArray
}
func substring(_ startIndex: Int, _ endIndex: Int) -> ScalarString {
// (startIndex is inclusive, endIndex is exclusive)
var subArray: ScalarString = ScalarString()
for i in startIndex..<endIndex {
subArray.append(self.scalarArray[i])
}
return subArray
}
// toString
func toString() -> String {
var string: String = ""
for scalar in self.scalarArray {
if let validScalor = UnicodeScalar(scalar) {
string.append(Character(validScalor))
}
}
return string
}
// trim
// removes leading and trailing whitespace (space, tab, newline)
func trim() -> ScalarString {
//var returnString = ScalarString()
let space: UInt32 = 0x00000020
let tab: UInt32 = 0x00000009
let newline: UInt32 = 0x0000000A
var startIndex = self.scalarArray.count
var endIndex = 0
// leading whitespace
for i in 0..<self.scalarArray.count {
if self.scalarArray[i] != space &&
self.scalarArray[i] != tab &&
self.scalarArray[i] != newline {
startIndex = i
break
}
}
// trailing whitespace
for i in stride(from: (self.scalarArray.count - 1), through: 0, by: -1) {
if self.scalarArray[i] != space &&
self.scalarArray[i] != tab &&
self.scalarArray[i] != newline {
endIndex = i + 1
break
}
}
if endIndex <= startIndex {
return ScalarString()
}
return self.substring(startIndex, endIndex)
}
// values
func values() -> [UInt32] {
return self.scalarArray
}
}
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}
func +(left: ScalarString, right: ScalarString) -> ScalarString {
var returnString = ScalarString()
for scalar in left.values() {
returnString.append(scalar)
}
for scalar in right.values() {
returnString.append(scalar)
}
return returnString
}