ios 在 Swift 中使用 Unicode 代码点

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/31272561/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 06:46:51  来源:igfitidea点击:

Working with Unicode code points in Swift

iosswiftstringunicodemongolian-vertical-script

提问by Suragch

If you are not interested in the details of Mongolian but just want a quick answer about using and converting Unicode values in Swift, then skip down to the first part of the accepted answer.

如果您对蒙古语的详细信息不感兴趣,而只想快速了解如何在 Swift 中使用和转换 Unicode 值,请跳至已接受答案的第一部分。



Background

背景

I want to render Unicode text for traditional Mongolian to be used in iOS apps. The better and long term solution is to use an AAT smart fontthat would render this complex script. (Such fonts do existbut their license does not allow modification and non-personal use.) However, since I have never made a font, let alone all of the rendering logic for an AAT font, I just plan to do the rendering myself in Swift for now. Maybe at some later date I can learn to make a smart font.

我想为要在 iOS 应用程序中使用的传统蒙古语呈现 Unicode 文本。更好的长期解决方案是使用AAT 智能字体来呈现这个复杂的脚本。(这样的字体确实存在,但他们的许可证不允许修改和非个人使用。)但是,由于我从未制作过字体,更不用说AAT字体的所有渲染逻辑,我只是​​打算自己在斯威夫特现在。也许以后我可以学习制作智能字体。

Externally I will use Unicode text, but internally (for display in a UITextView) I will convert the Unicode to individual glyphs that are stored in a dumb font (coded with Unicode PUAvalues). So my rendering engine needs to convert Mongolian Unicode values (range: U+1820 to U+1842) to glyph values stored in the PUA (range: U+E360 to U+E5CF). Anyway, this is my plan since it is what I did in Java in the past, but maybe I need to change my whole way of thinking.

在外部,我将使用 Unicode 文本,但在内部(用于在 中显示UITextView),我会将 Unicode 转换为存储在哑字体(使用 Unicode PUA值编码)中的单个字形。所以我的渲染引擎需要将蒙古语 Unicode 值(范围:U+1820 到 U+1842)转换为存储在 PUA 中的字形值(范围:U+E360 到 U+E5CF)。无论如何,这是我的计划,因为这是我过去在 Java 中所做的,但也许我需要改变我的整个思维方式。

Example

例子

The following image shows suwritten twice in Mongolian using two different forms for the letter u(in red). (Mongolian is written vertically with letters being connected like cursive letters in English.)

下图显示了su用蒙古语写了两次,使用了字母u 的两种不同形式(红色)。(蒙古文是竖写的,字母像英文中的草书一样连接起来。)

enter image description here

在此处输入图片说明

In Unicode these two strings would be expressed as

在 Unicode 中,这两个字符串将表示为

var suForm1: String = "\u{1830}\u{1826}"
var suForm2: String = "\u{1830}\u{1826}\u{180B}"

The Free Variation Selector (U+180B) in suForm2is recognized (correctly) by Swift Stringto be a unit with the u(U+1826) that precedes it. It is considered by Swift to be a single character, an extended grapheme cluster. However, for the purposes of doing the rendering myself, I need to differentiate u(U+1826) and FVS1 (U+180B) as two distinct UTF-16 code points.

自由变体选择器 (U+180B) insuForm2被 Swift(正确)识别String为一个单位,前面有u(U+1826)。Swift 认为它是单个字符,一个扩展的字素簇。但是,为了自己进行渲染,我需要将u(U+1826) 和 FVS1 (U+180B) 区分为两个不同的 UTF-16 代码点。

For internal display purposes, I would convert the above Unicode strings to the following rendered glyph strings:

出于内部显示目的,我会将上述 Unicode 字符串转换为以下呈现的字形字符串:

suForm1 = "\u{E46F}\u{E3BA}" 
suForm2 = "\u{E46F}\u{E3BB}"

Question

I have been playing around with Swift Stringand Character. There are a lot of convenient things about them, but since in my particular case I deal exclusively with UTF-16 code units, I wonder if I should be using the old NSStringrather than Swift's String. I realize that I can use String.utf16to get UTF-16 code points, but the conversion back to Stringisn't very nice.

我一直在玩 SwiftStringCharacter. 它们有很多方便的地方,但由于在我的特殊情况下我专门处理 UTF-16 代码单元,我想知道我是否应该使用旧的NSString而不是 Swift 的String. 我意识到,我可以用String.utf16得到UTF-16码点,但转换回String是不是很好

Would it be better to stick with Stringand Characteror should I use NSStringand unichar?

坚持使用StringandCharacter还是我应该使用NSStringand会更好unichar吗?

What I have read

我读过的

Updates to this question have been hidden in order to clean the page up. See the edit history.

为了清理页面,已隐藏对此问题的更新。查看编辑历史。

回答by Suragch

Updated for Swift 3

为 Swift 3 更新

String and Character

字符串和字符

For almost everyone in the future who visits this question, Stringand Characterwill be the answer for you.

对于未来访问此问题的几乎每个人,String并将Character为您提供答案。

Set Unicode values directly in code:

直接在代码中设置 Unicode 值:

var str: String = "I want to visit 北京, Москва, ?????, ???????, and ???. "
var character: Character = ""

Use hexadecimal to set values

使用十六进制设置值

var str: String = "\u{61}\u{5927}\u{1F34E}\u{3C0}" // a大π
var character: Character = "\u{65}\u{301}" // é = "e" + accent mark

Note that the Swift Character can be composed of multiple Unicode code points, but appears to be a single character. This is called an Extended Grapheme Cluster.

请注意,Swift 字符可以由多个 Unicode 代码点组成,但看起来是单个字符。这称为扩展字素簇。

See this questionalso.

也见这个问题

Convert to Unicode values:

转换为 Unicode 值:

str.utf8
str.utf16
str.unicodeScalars // UTF-32

String(character).utf8
String(character).utf16
String(character).unicodeScalars

Convert from Unicode hex values:

从 Unicode 十六进制值转换:

let hexValue: UInt32 = 0x1F34E

// convert hex value to UnicodeScalar
guard let scalarValue = UnicodeScalar(hexValue) else {
    // early exit if hex does not form a valid unicode value
    return
}

// convert UnicodeScalar to String
let myString = String(scalarValue) // 

Or alternatively:

或者:

let hexValue: UInt32 = 0x1F34E
if let scalarValue = UnicodeScalar(hexValue) {
    let myString = String(scalarValue)
}

A few more examples

再举几个例子

let value0: UInt8 = 0x61
let value1: UInt16 = 0x5927
let value2: UInt32 = 0x1F34E

let string0 = String(UnicodeScalar(value0)) // a
let string1 = String(UnicodeScalar(value1)) // 大
let string2 = String(UnicodeScalar(value2)) // 

// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
    myString.append(UnicodeScalar(hexValue))
}
print(myString) // Cat?

Note that for UTF-8 and UTF-16 the conversion is not always this easy. (See UTF-8, UTF-16, and UTF-32questions.)

请注意,对于 UTF-8 和 UTF-16,转换并不总是那么容易。(请参阅UTF-8UTF-16UTF-32问题。)

NSString and unichar

NSString 和 unichar

It is also possible to work with NSStringand unicharin Swift, but you should realize that unless you are familiar with Objective C and good at converting the syntax to Swift, it will be difficult to find good documentation.

也可以使用SwiftNSStringunichar在 Swift 中工作,但您应该意识到,除非您熟悉 Objective C 并且擅长将语法转换为 Swift,否则很难找到好的文档。

Also, unicharis a UInt16array and as mentioned above the conversion from UInt16to Unicode scalar values is not always easy (i.e., converting surrogate pairs for things like emoji and other characters in the upper code planes).

此外,unichar是一个UInt16数组,如上所述,从UInt16Unicode 标量值转换为 Unicode 标量值并不总是那么容易(即,为上层代码平面中的表情符号和其他字符等内容转换代理项对)。

Custom string structure

自定义字符串结构

For the reasons mentioned in the question, I ended up not using any of the above methods. Instead I wrote my own string structure, which was basically an array of UInt32to hold Unicode scalar values.

由于问题中提到的原因,我最终没有使用上述任何方法。相反,我编写了自己的字符串结构,它基本上是一个UInt32用于保存 Unicode 标量值的数组。

Again, this is not the solution for most people. First consider using extensionsif you only need to extend the functionality of Stringor Charactera little.

同样,这不是大多数人的解决方案。如果您只需要扩展或一点点的功能,请首先考虑使用扩展StringCharacter

But if you really need to work exclusively with Unicode scalar values, you could write a custom struct.

但是,如果您确实需要专门处理 Unicode 标量值,则可以编写自定义结构。

The advantages are:

优点是:

  • Don't need to constantly switch between Types (String, Character, UnicodeScalar, UInt32, etc.) when doing string manipulation.
  • After Unicode manipulation is finished, the final conversion to Stringis easy.
  • Easy to add more methods when they are needed
  • Simplifies converting code from Java or other languages
  • 进行字符串操作时String,不需要经常在类型(CharacterUnicodeScalarUInt32、 等)之间切换。
  • Unicode 操作完成后,最终转换String为 就很容易了。
  • 需要时轻松添加更多方法
  • 简化从 Java 或其他语言转换代码

Disadavantages are:

缺点是:

  • makes code less portable and less readable for other Swift developers
  • not as well tested and optimized as the native Swift types
  • it is yet another file that has to be included in a project every time you need it
  • 使其他 Swift 开发人员的代码可移植性和可读性降低
  • 不如原生 Swift 类型经过良好的测试和优化
  • 它是另一个文件,每次需要时都必须包含在项目中

You can make your own, but here is mine for reference. The hardest part was making it Hashable.

你可以自己做,但这里是我的供参考。最难的部分是让它成为 Hashable

// This struct is an array of UInt32 to hold Unicode scalar values
// Version 3.4.0 (Swift 3 update)


struct ScalarString: Sequence, Hashable, CustomStringConvertible {

    fileprivate var scalarArray: [UInt32] = []


    init() {
        // does anything need to go here?
    }

    init(_ character: UInt32) {
        self.scalarArray.append(character)
    }

    init(_ charArray: [UInt32]) {
        for c in charArray {
            self.scalarArray.append(c)
        }
    }

    init(_ string: String) {

        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // Generator in order to conform to SequenceType protocol
    // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })
    func makeIterator() -> AnyIterator<UInt32> {
        return AnyIterator(scalarArray.makeIterator())
    }

    // append
    mutating func append(_ scalar: UInt32) {
        self.scalarArray.append(scalar)
    }

    mutating func append(_ scalarString: ScalarString) {
        for scalar in scalarString {
            self.scalarArray.append(scalar)
        }
    }

    mutating func append(_ string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // charAt
    func charAt(_ index: Int) -> UInt32 {
        return self.scalarArray[index]
    }

    // clear
    mutating func clear() {
        self.scalarArray.removeAll(keepingCapacity: true)
    }

    // contains
    func contains(_ character: UInt32) -> Bool {
        for scalar in self.scalarArray {
            if scalar == character {
                return true
            }
        }
        return false
    }

    // description (to implement Printable protocol)
    var description: String {
        return self.toString()
    }

    // endsWith
    func endsWith() -> UInt32? {
        return self.scalarArray.last
    }

    // indexOf
    // returns first index of scalar string match
    func indexOf(_ string: ScalarString) -> Int? {

        if scalarArray.count < string.length {
            return nil
        }

        for i in 0...(scalarArray.count - string.length) {

            for j in 0..<string.length {

                if string.charAt(j) != scalarArray[i + j] {
                    break // substring mismatch
                }
                if j == string.length - 1 {
                    return i
                }
            }
        }

        return nil
    }

    // insert
    mutating func insert(_ scalar: UInt32, atIndex index: Int) {
        self.scalarArray.insert(scalar, at: index)
    }
    mutating func insert(_ string: ScalarString, atIndex index: Int) {
        var newIndex = index
        for scalar in string {
            self.scalarArray.insert(scalar, at: newIndex)
            newIndex += 1
        }
    }
    mutating func insert(_ string: String, atIndex index: Int) {
        var newIndex = index
        for scalar in string.unicodeScalars {
            self.scalarArray.insert(scalar.value, at: newIndex)
            newIndex += 1
        }
    }

    // isEmpty
    var isEmpty: Bool {
        return self.scalarArray.count == 0
    }

    // hashValue (to implement Hashable protocol)
    var hashValue: Int {

        // DJB Hash Function
        return self.scalarArray.reduce(5381) {
            (
//Swift 3.0  
// This struct is an array of UInt32 to hold Unicode scalar values
struct ScalarString: Sequence, Hashable, CustomStringConvertible {

    private var scalarArray: [UInt32] = []

    init() {
        // does anything need to go here?
    }

    init(_ character: UInt32) {
        self.scalarArray.append(character)
    }

    init(_ charArray: [UInt32]) {
        for c in charArray {
            self.scalarArray.append(c)
        }
    }

    init(_ string: String) {

        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // Generator in order to conform to SequenceType protocol
    // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })

    //func generate() -> AnyIterator<UInt32> {
    func makeIterator() -> AnyIterator<UInt32> {

        let nextIndex = 0

        return AnyIterator {
            if (nextIndex > self.scalarArray.count-1) {
                return nil
            }
            return self.scalarArray[nextIndex + 1]
        }
    }

    // append
    mutating func append(scalar: UInt32) {
        self.scalarArray.append(scalar)
    }

    mutating func append(scalarString: ScalarString) {
        for scalar in scalarString {
            self.scalarArray.append(scalar)
        }
    }

    mutating func append(string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // charAt
    func charAt(index: Int) -> UInt32 {
        return self.scalarArray[index]
    }

    // clear
    mutating func clear() {
        self.scalarArray.removeAll(keepingCapacity: true)
    }

    // contains
    func contains(character: UInt32) -> Bool {
        for scalar in self.scalarArray {
            if scalar == character {
                return true
            }
        }
        return false
    }

    // description (to implement Printable protocol)
    var description: String {

        var string: String = ""

        for scalar in scalarArray {
            string.append(String(describing: UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
        }
        return string
    }

    // endsWith
    func endsWith() -> UInt32? {
        return self.scalarArray.last
    }

    // insert
    mutating func insert(scalar: UInt32, atIndex index: Int) {
        self.scalarArray.insert(scalar, at: index)
    }

    // isEmpty
    var isEmpty: Bool {
        get {
            return self.scalarArray.count == 0
        }
    }

    // hashValue (to implement Hashable protocol)
    var hashValue: Int {
        get {

            // DJB Hash Function
            var hash = 5381

            for i in 0 ..< scalarArray.count {
                hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
            }
            /*
             for i in 0..< self.scalarArray.count {
             hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
             }
             */
            return hash
        }
    }

    // length
    var length: Int {
        get {
            return self.scalarArray.count
        }
    }

    // remove character
    mutating func removeCharAt(index: Int) {
        self.scalarArray.remove(at: index)
    }
    func removingAllInstancesOfChar(character: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar != character {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }

        return returnString
    }

    // replace
    func replace(character: UInt32, withChar replacementChar: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(scalar: replacementChar) //.append(replacementChar)
            } else {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }
        return returnString
    }

    // func replace(character: UInt32, withString replacementString: String) -> ScalarString {
    func replace(character: UInt32, withString replacementString: ScalarString) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(scalarString: replacementString) //.append(replacementString)
            } else {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }
        return returnString
    }

    // set (an alternative to myScalarString = "some string")
    mutating func set(string: String) {
        self.scalarArray.removeAll(keepingCapacity: false)
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // split
    func split(atChar splitChar: UInt32) -> [ScalarString] {
        var partsArray: [ScalarString] = []
        var part: ScalarString = ScalarString()
        for scalar in self.scalarArray {
            if scalar == splitChar {
                partsArray.append(part)
                part = ScalarString()
            } else {
                part.append(scalar: scalar) //.append(scalar)
            }
        }
        partsArray.append(part)
        return partsArray
    }

    // startsWith
    func startsWith() -> UInt32? {
        return self.scalarArray.first
    }

    // substring
    func substring(startIndex: Int) -> ScalarString {
        // from startIndex to end of string
        var subArray: ScalarString = ScalarString()
        for i in startIndex ..< self.length {
            subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
        }
        return subArray
    }
    func substring(startIndex: Int, _ endIndex: Int) -> ScalarString {
        // (startIndex is inclusive, endIndex is exclusive)
        var subArray: ScalarString = ScalarString()
        for i in startIndex ..< endIndex {
            subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
        }
        return subArray
    }

    // toString
    func toString() -> String {
        let string: String = ""

        for scalar in self.scalarArray {
            string.appending(String(describing:UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
        }
        return string
    }

    // values
    func values() -> [UInt32] {
        return self.scalarArray
    }

}

func ==(left: ScalarString, right: ScalarString) -> Bool {

    if left.length != right.length {
        return false
    }

    for i in 0 ..< left.length {
        if left.charAt(index: i) != right.charAt(index: i) {
            return false
        }
    }

    return true
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
    var returnString = ScalarString()
    for scalar in left.values() {
        returnString.append(scalar: scalar) //.append(scalar)
    }
    for scalar in right.values() {
        returnString.append(scalar: scalar) //.append(scalar)
    }
    return returnString
}
<< 5) &+ ##代码## &+ Int() } } // length var length: Int { return self.scalarArray.count } // remove character mutating func removeCharAt(_ index: Int) { self.scalarArray.remove(at: index) } func removingAllInstancesOfChar(_ character: UInt32) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar != character { returnString.append(scalar) } } return returnString } func removeRange(_ range: CountableRange<Int>) -> ScalarString? { if range.lowerBound < 0 || range.upperBound > scalarArray.count { return nil } var returnString = ScalarString() for i in 0..<scalarArray.count { if i < range.lowerBound || i >= range.upperBound { returnString.append(scalarArray[i]) } } return returnString } // replace func replace(_ character: UInt32, withChar replacementChar: UInt32) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar == character { returnString.append(replacementChar) } else { returnString.append(scalar) } } return returnString } func replace(_ character: UInt32, withString replacementString: String) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar == character { returnString.append(replacementString) } else { returnString.append(scalar) } } return returnString } func replaceRange(_ range: CountableRange<Int>, withString replacementString: ScalarString) -> ScalarString { var returnString = ScalarString() for i in 0..<scalarArray.count { if i < range.lowerBound || i >= range.upperBound { returnString.append(scalarArray[i]) } else if i == range.lowerBound { returnString.append(replacementString) } } return returnString } // set (an alternative to myScalarString = "some string") mutating func set(_ string: String) { self.scalarArray.removeAll(keepingCapacity: false) for s in string.unicodeScalars { self.scalarArray.append(s.value) } } // split func split(atChar splitChar: UInt32) -> [ScalarString] { var partsArray: [ScalarString] = [] if self.scalarArray.count == 0 { return partsArray } var part: ScalarString = ScalarString() for scalar in self.scalarArray { if scalar == splitChar { partsArray.append(part) part = ScalarString() } else { part.append(scalar) } } partsArray.append(part) return partsArray } // startsWith func startsWith() -> UInt32? { return self.scalarArray.first } // substring func substring(_ startIndex: Int) -> ScalarString { // from startIndex to end of string var subArray: ScalarString = ScalarString() for i in startIndex..<self.length { subArray.append(self.scalarArray[i]) } return subArray } func substring(_ startIndex: Int, _ endIndex: Int) -> ScalarString { // (startIndex is inclusive, endIndex is exclusive) var subArray: ScalarString = ScalarString() for i in startIndex..<endIndex { subArray.append(self.scalarArray[i]) } return subArray } // toString func toString() -> String { var string: String = "" for scalar in self.scalarArray { if let validScalor = UnicodeScalar(scalar) { string.append(Character(validScalor)) } } return string } // trim // removes leading and trailing whitespace (space, tab, newline) func trim() -> ScalarString { //var returnString = ScalarString() let space: UInt32 = 0x00000020 let tab: UInt32 = 0x00000009 let newline: UInt32 = 0x0000000A var startIndex = self.scalarArray.count var endIndex = 0 // leading whitespace for i in 0..<self.scalarArray.count { if self.scalarArray[i] != space && self.scalarArray[i] != tab && self.scalarArray[i] != newline { startIndex = i break } } // trailing whitespace for i in stride(from: (self.scalarArray.count - 1), through: 0, by: -1) { if self.scalarArray[i] != space && self.scalarArray[i] != tab && self.scalarArray[i] != newline { endIndex = i + 1 break } } if endIndex <= startIndex { return ScalarString() } return self.substring(startIndex, endIndex) } // values func values() -> [UInt32] { return self.scalarArray } } func ==(left: ScalarString, right: ScalarString) -> Bool { return left.scalarArray == right.scalarArray } func +(left: ScalarString, right: ScalarString) -> ScalarString { var returnString = ScalarString() for scalar in left.values() { returnString.append(scalar) } for scalar in right.values() { returnString.append(scalar) } return returnString }

回答by Marijan V.

##代码##