ios 如何在 Swift 中为 Int 数组(自定义字符串结构)实现 Hashable 协议

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/31438210/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 06:52:52  来源:igfitidea点击:

How to implement the Hashable Protocol in Swift for an Int array (a custom string struct)

iosarraysstringswifthashtable

提问by Suragch

I am making a structure that acts like a String, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32. (See this questionfor more background.)

我正在制作一个类似于 a 的结构String,除了它只处理 Unicode UTF-32 标量值。因此,它是一个数组UInt32。(有关更多背景信息,请参阅此问题。)

What I want to do

我想做的事

I want to be able to use my custom ScalarStringstruct as a key in a dictionary. For example:

我希望能够将我的自定义ScalarString结构用作字典中的键。例如:

var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value

// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...

// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
    // do something with value
}

Problem

问题

In order to do that, ScalarStringneeds to implement the Hashable Protocol. I thought I would be able to do something like this:

为了做到这一点,ScalarString需要实现Hashable 协议。我以为我可以做这样的事情:

struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    var hashValue : Int {
        get {
            return self.scalarArray.hashValue // error
        }
    }
}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.hashValue == right.hashValue
}

but then I discovered that Swift arraysdon't have a hashValue.

但后来我发现Swift 数组没有hashValue.

What I read

我读到的

The article Strategies for Implementing the Hashable Protocol in Swifthad a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Specifically,

文章在斯威夫特实施哈希的协议策略有很多伟大的想法,但我没有看到任何这似乎是他们会在这种情况下很好地工作。具体来说,

  • Object property(array is does not have hashValue)
  • ID property(not sure how this could be implemented well)
  • Formula(seems like any formula for a string of 32 bit integers would be processor heavy and have lots of integer overflow)
  • ObjectIdentifier(I'm using a struct, not a class)
  • Inheriting from NSObject(I'm using a struct, not a class)
  • 对象属性(数组是没有的hashValue
  • ID 属性(不确定如何很好地实现)
  • 公式(似乎任何用于 32 位整数字符串的公式都会占用大量处理器并有大量整数溢出)
  • ObjectIdentifier(我使用的是结构,而不是类)
  • 从 NSObject 继承(我使用的是结构,而不是类)

Here are some other things I read:

以下是我阅读的其他一些内容:

Question

Swift Strings have a hashValueproperty, so I know it is possible to do.

Swift Strings 有一个hashValue属性,所以我知道这是可能的。

How would I create a hashValuefor my custom structure?

我将如何hashValue为我的自定义结构创建一个?

Updates

更新

Update 1:I would like to do something that does not involve converting to Stringand then using String's hashValue. My whole point for making my own structure was so that I could avoid doing lots of Stringconversions. Stringgets it's hashValuefrom somewhere. It seems like I could get it using the same method.

更新 1:我想做一些不涉及转换为String然后使用String's 的事情hashValue。我制作自己的结构的全部目的是为了避免进行大量String转换。从某个地方String得到它hashValue。似乎我可以使用相同的方法获得它。

Update 2:I've been looking into the implementation of string hash codes algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.

更新 2:我一直在研究其他上下文中字符串哈希码算法的实现。不过,我在知道哪个最好并在 Swift 中表达它们时遇到了一些困难。

Update 3

更新 3

I would prefer not to import any external frameworks unless that is the recommended way to go for these things.

我不想导入任何外部框架,除非这是处理这些事情的推荐方式。

I submitted a possible solution using the DJB Hash Function.

我使用 DJB 哈希函数提交了一个可能的解决方案。

回答by Suragch

Update

更新

Martin R writes:

马丁 R写道

As of Swift 4.1, the compiler can synthesize Equatableand Hashablefor types conformance automatically, if all members conform to Equatable/Hashable (SE0185). And as of Swift 4.2, a high-quality hash combiner is built-in into the Swift standard library (SE-0206).

Therefore there is no need anymore to define your own hashing function, it suffices to declare the conformance:

struct ScalarString: Hashable, ... {

    private var scalarArray: [UInt32] = []

    // ... }

Swift 4.1 开始, 如果所有成员都符合 Equatable/Hashable (SE0185) ,编译器可以自动合成EquatableHashable类型一致性。从Swift 4.2 开始,Swift 标准库 (SE-0206) 中内置了一个高质量的哈希组合器。

因此不再需要定义自己的散列函数,声明一致性就足够了:

struct ScalarString: Hashable, ... {

    private var scalarArray: [UInt32] = []

    // ... }

Thus, the answer below needs to be rewritten (yet again). Until that happens refer to Martin R's answer from the link above.

因此,下面的答案需要重写(再次)。在此之前,请参阅上面链接中 Martin R 的回答。



Old Answer:

旧答案:

This answer has been completely rewritten after submitting my original answer to code review.

在将我的原始答案提交给代码后,此答案已被完全重写。

How to implement to Hashable protocol

如何实现到Hashable协议

The Hashable protocolallows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to

哈希的协议允许您使用您的自定义类或结构作为字典键。为了实现这个协议,你需要

  1. Implement the Equatable protocol(Hashable inherits from Equatable)
  2. Return a computed hashValue
  1. 实现Equatable 协议(Hashable 继承自 Equatable)
  2. 返回一个计算 hashValue

These points follow from the axiom given in the documentation:

这些要点来自文档中给出的公理:

x == yimplies x.hashValue == y.hashValue

x == y暗示 x.hashValue == y.hashValue

where xand yare values of some Type.

其中xy是某种类型的值。

Implement the Equatable protocol

实施 Equatable 协议

In order to implement the Equatable protocol, you define how your type uses the ==(equivalence) operator. In your example, equivalence can be determined like this:

为了实现 Equatable 协议,您需要定义您的类型如何使用==(等价)运算符。在您的示例中,可以这样确定等效性:

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

The ==function is global so it goes outside of your class or struct.

==函数是全局的,因此它超出了您的类或结构。

Return a computed hashValue

返回一个计算 hashValue

Your custom class or struct must also have a computed hashValuevariable. A good hash algorithm will provide a wide range of hash values. However, it should be noted that you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. It requires some extra work when there is a collision (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the ==function does that extra work. (Update: It looks like ==may do allthe work.)

您的自定义类或结构还必须有一个计算hashValue变量。一个好的散列算法将提供范围广泛的散列值。但是需要注意的是,您不需要保证哈希值都是唯一的。当两个不同的值具有相同的哈希值时,这称为哈希冲突。当发生碰撞时,它需要一些额外的工作(这就是为什么需要良好分布的原因),但是可以预期一些碰撞。据我了解,该==函数完成了额外的工作。(更新看起来==可以完成所有工作。

There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.

有多种方法可以计算哈希值。例如,您可以执行一些简单的操作,例如返回数组中的元素数。

var hashValue: Int {
    return self.scalarArray.count
} 

This would give a hash collision every time two arrays had the same number of elements but different values. NSArrayapparently uses this approach.

每次两个数组具有相同数量的元素但不同的值时,这都会导致哈希冲突。NSArray显然使用这种方法。

DJB Hash Function

DJB 哈希函数

A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.

处理字符串的常见散列函数是 DJB 散列函数。这是我将使用的一种,但请在此处查看其他一些。

A Swift implementation provided by @MartinRfollows:

@MartinR 提供的Swift 实现如下:

var hashValue: Int {
    return self.scalarArray.reduce(5381) {
        (
var hashValue: Int {

    // DJB Hash Function
    var hash = 5381

    for(var i = 0; i < self.scalarArray.count; i++)
    {
        hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
    }

    return hash
} 
<< 5) &+
// Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    // required var for the Hashable protocol
    var hashValue: Int {
        // DJB hash function
        return self.scalarArray.reduce(5381) {
            (
"\(scalarArray)".hashValue
<< 5) &+
scalarArray.description.hashValue
&+ Int() } } } // required function for the Equatable protocol, which Hashable inheirits from func ==(left: ScalarString, right: ScalarString) -> Bool { return left.scalarArray == right.scalarArray }
&+ Int() } }

This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce. This is equivalent, I believe:

这是我原始实现的改进版本,但让我也包括旧的扩展形式,对于不熟悉reduce. 这是等效的,我相信:

#import <CommonCrypto/CommonDigest.h>

The &+operator allows Intto overflow and start over again for long strings.

&+运营商允许Int溢出和长串重新开始。

Big Picture

大图

We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarStringis the custom type from the question. This will be different for different people, of course.

我们已经看过这些片段,但现在让我展示与 Hashable 协议相关的整个示例代码。ScalarString是问题中的自定义类型。当然,这对于不同的人来说会有所不同。

extension Array : Hashable, Equatable {
    public var hashValue : Int {
        var hash = [Int](count: Int(CC_SHA256_DIGEST_LENGTH) / sizeof(Int), repeatedValue: 0)
        withUnsafeBufferPointer { ptr in
            hash.withUnsafeMutableBufferPointer { (inout hPtr: UnsafeMutableBufferPointer<Int>) -> Void in
                CC_SHA256(UnsafePointer<Void>(ptr.baseAddress), CC_LONG(count * sizeof(Element)), UnsafeMutablePointer<UInt8>(hPtr.baseAddress))
            }
        }

        return hash[0]
    }
}

Other helpful reading

其他有用的阅读

Credits

学分

A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.

非常感谢 Martin R 在 Code Review 中的表现。我的重写主要基于他的回答。如果你觉得这对你有帮助,那么请给他一个upvote。

Update

更新

Swift is open source now so it is possible to see how hashValueis implemented for Stringfrom the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.

Swift 现在是开源的,因此可以从源代码中了解hashValue其实现方式。它似乎比我在这里给出的答案更复杂,我没有花时间对其进行全面分析。随意自己做。String

回答by Kametrixom

It is not a very elegant solution but it works nicely:

这不是一个非常优雅的解决方案,但效果很好:

public func ==<T>(lhs: [T], rhs: [T]) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

or

或者

n      -> time
1000   -> 0.000037 s
10000  -> 0.000379 s
100000 -> 0.003402 s

Which just uses the textual representation as a hash source

仅使用文本表示作为哈希源

回答by Kametrixom

Edit (31 May '17): Please refer to the accepted answer. This answer is pretty much just a demonstration on how to use the CommonCryptoFramework

编辑(2017 年 5 月 31 日):请参阅已接受的答案。这个答案几乎只是关于如何使用CommonCrypto框架的演示

Okay, I got ahead and extended all arrays with the Hashableprotocol by using the SHA-256 hashing algorithm from the CommonCrypto framework. You have to put

好的,我Hashable通过使用来自 CommonCrypto 框架的 SHA-256 散列算法提前使用该协议扩展了所有数组。你必须把

n      -> time
1000   -> 0.001359 s
10000  -> 0.011036 s
100000 -> 0.122177 s

into your bridging header for this to work. It's a shame that pointers have to be used though:

进入您的桥接头以使其正常工作。遗憾的是,必须使用指针:

var hashValue : Int {
    get {
        return String(self.scalarArray.map { UnicodeScalar(##代码##) }).hashValue
    }
}

Edit (31 May '17): Don't do this, even though SHA256 has pretty much no hash collisions, it's the wrong idea to define equality by hash equality

编辑(2017 年 5 月 31 日):不要这样做,即使 SHA256 几乎没有哈希冲突,但通过哈希相等来定义相等是错误的想法

##代码##

This is as good as it gets with CommonCrypto. It's ugly, but fast and not manypretty much no hash collisions for sure

这与CommonCrypto. 这是丑陋的,但速度快,没有多少几乎没有哈希冲突肯定

Edit (15 July '15): I just made some speed tests:

编辑(15 年 7 月 15 日):我刚刚进行了一些速度测试:

Randomly filled Intarrays of size n took on average over 1000 runs

Int大小为 n 的随机填充数组平均运行 1000 多次

##代码##

Whereas with the string hashing method:

而使用字符串散列方法:

##代码##

So the SHA-256 way is about 33 times faster than the string way. I'm not saying that using a string is a very good solution, but it's the only one we can compare it to right now

所以SHA-256方式比字符串方式快约33倍。我并不是说使用字符串是一个很好的解决方案,但它是我们现在唯一可以比较的方法

回答by Aaron Rasmussen

One suggestion - since you are modeling a String, would it work to convert your [UInt32]array to a Stringand use the String's hashValue? Like this:

一个建议 - 由于您正在建模 a String,将您的[UInt32]数组转换为 aString并使用String'shashValue是否可行?像这样:

##代码##

That could conveniently allow you to compare your custom structagainst Strings as well, though whether or not that is a good idea depends on what you are trying to do...

这也可以方便地让您将您的习惯structStrings进行比较,尽管这是否是一个好主意取决于您要做什么......

Note also that, using this approach, instances of ScalarStringwould have the same hashValueif their Stringrepresentations were canonically equivalent, which may or may not be what you desire.

还要注意,使用这种方法,如果它们的表示在规范上是等效的,它们的实例ScalarString将具有相同的特性,这可能是也可能不是您想要的。hashValueString

So I suppose that if you want the hashValueto represent a unique String, my approach would be good. If you want the hashValueto represent a unique sequence of UInt32values, @Kametrixom's answer is the way to go...

所以我想如果你想hashValue代表一个独特的String,我的方法会很好。如果你想hashValue代表一个唯一的UInt32值序列,@Kametrixom 的答案就是要走的路......