ios 如何在 Swift 中为 Int 数组(自定义字符串结构)实现 Hashable 协议
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/31438210/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to implement the Hashable Protocol in Swift for an Int array (a custom string struct)
提问by Suragch
I am making a structure that acts like a String
, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32
. (See this questionfor more background.)
我正在制作一个类似于 a 的结构String
,除了它只处理 Unicode UTF-32 标量值。因此,它是一个数组UInt32
。(有关更多背景信息,请参阅此问题。)
What I want to do
我想做的事
I want to be able to use my custom ScalarString
struct as a key in a dictionary. For example:
我希望能够将我的自定义ScalarString
结构用作字典中的键。例如:
var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value
// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...
// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
// do something with value
}
Problem
问题
In order to do that, ScalarString
needs to implement the Hashable Protocol. I thought I would be able to do something like this:
为了做到这一点,ScalarString
需要实现Hashable 协议。我以为我可以做这样的事情:
struct ScalarString: Hashable {
private var scalarArray: [UInt32] = []
var hashValue : Int {
get {
return self.scalarArray.hashValue // error
}
}
}
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.hashValue == right.hashValue
}
but then I discovered that Swift arraysdon't have a hashValue
.
但后来我发现Swift 数组没有hashValue
.
What I read
我读到的
The article Strategies for Implementing the Hashable Protocol in Swifthad a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Specifically,
文章在斯威夫特实施哈希的协议策略有很多伟大的想法,但我没有看到任何这似乎是他们会在这种情况下很好地工作。具体来说,
- Object property(array is does not have
hashValue
) - ID property(not sure how this could be implemented well)
- Formula(seems like any formula for a string of 32 bit integers would be processor heavy and have lots of integer overflow)
- ObjectIdentifier(I'm using a struct, not a class)
- Inheriting from NSObject(I'm using a struct, not a class)
- 对象属性(数组是没有的
hashValue
) - ID 属性(不确定如何很好地实现)
- 公式(似乎任何用于 32 位整数字符串的公式都会占用大量处理器并有大量整数溢出)
- ObjectIdentifier(我使用的是结构,而不是类)
- 从 NSObject 继承(我使用的是结构,而不是类)
Here are some other things I read:
以下是我阅读的其他一些内容:
- Implementing Swift's Hashable Protocol
- Swift Comparison Protocols
- Perfect hash function
- Membership of custom objects in Swift Arrays and Dictionaries
- How to implement Hashable for your custom class
- Writing a good Hashable implementation in Swift
- 实现 Swift 的 Hashable 协议
- Swift 比较协议
- 完善的哈希函数
- Swift 数组和字典中自定义对象的成员资格
- 如何为您的自定义类实现 Hashable
- 用 Swift 编写一个好的 Hashable 实现
Question
题
Swift Strings have a hashValue
property, so I know it is possible to do.
Swift Strings 有一个hashValue
属性,所以我知道这是可能的。
How would I create a hashValue
for my custom structure?
我将如何hashValue
为我的自定义结构创建一个?
Updates
更新
Update 1:I would like to do something that does not involve converting to String
and then using String
's hashValue
. My whole point for making my own structure was so that I could avoid doing lots of String
conversions. String
gets it's hashValue
from somewhere. It seems like I could get it using the same method.
更新 1:我想做一些不涉及转换为String
然后使用String
's 的事情hashValue
。我制作自己的结构的全部目的是为了避免进行大量String
转换。从某个地方String
得到它hashValue
。似乎我可以使用相同的方法获得它。
Update 2:I've been looking into the implementation of string hash codes algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.
更新 2:我一直在研究其他上下文中字符串哈希码算法的实现。不过,我在知道哪个最好并在 Swift 中表达它们时遇到了一些困难。
- Java
hashCode
algorithm - C algorithms
- hash function for string(SO question and answers in C)
- Hashing tutorial(Virginia Tech Algorithm Visualization Research Group)
- General Purpose Hash Function Algorithms
- Java
hashCode
算法 - C 算法
- 字符串的哈希函数(C 中的 SO 问题和答案)
- 哈希教程(弗吉尼亚技术算法可视化研究组)
- 通用哈希函数算法
Update 3
更新 3
I would prefer not to import any external frameworks unless that is the recommended way to go for these things.
我不想导入任何外部框架,除非这是处理这些事情的推荐方式。
I submitted a possible solution using the DJB Hash Function.
我使用 DJB 哈希函数提交了一个可能的解决方案。
回答by Suragch
Update
更新
Martin R writes:
马丁 R写道:
As of Swift 4.1, the compiler can synthesize
Equatable
andHashable
for types conformance automatically, if all members conform to Equatable/Hashable (SE0185). And as of Swift 4.2, a high-quality hash combiner is built-in into the Swift standard library (SE-0206).Therefore there is no need anymore to define your own hashing function, it suffices to declare the conformance:
struct ScalarString: Hashable, ... { private var scalarArray: [UInt32] = [] // ... }
从Swift 4.1 开始, 如果所有成员都符合 Equatable/Hashable (SE0185) ,编译器可以自动合成
Equatable
和Hashable
类型一致性。从Swift 4.2 开始,Swift 标准库 (SE-0206) 中内置了一个高质量的哈希组合器。因此不再需要定义自己的散列函数,声明一致性就足够了:
struct ScalarString: Hashable, ... { private var scalarArray: [UInt32] = [] // ... }
Thus, the answer below needs to be rewritten (yet again). Until that happens refer to Martin R's answer from the link above.
因此,下面的答案需要重写(再次)。在此之前,请参阅上面链接中 Martin R 的回答。
Old Answer:
旧答案:
This answer has been completely rewritten after submitting my original answer to code review.
How to implement to Hashable protocol
如何实现到Hashable协议
The Hashable protocolallows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to
该哈希的协议允许您使用您的自定义类或结构作为字典键。为了实现这个协议,你需要
- Implement the Equatable protocol(Hashable inherits from Equatable)
- Return a computed
hashValue
- 实现Equatable 协议(Hashable 继承自 Equatable)
- 返回一个计算
hashValue
These points follow from the axiom given in the documentation:
这些要点来自文档中给出的公理:
x == y
impliesx.hashValue == y.hashValue
x == y
暗示x.hashValue == y.hashValue
where x
and y
are values of some Type.
其中x
和y
是某种类型的值。
Implement the Equatable protocol
实施 Equatable 协议
In order to implement the Equatable protocol, you define how your type uses the ==
(equivalence) operator. In your example, equivalence can be determined like this:
为了实现 Equatable 协议,您需要定义您的类型如何使用==
(等价)运算符。在您的示例中,可以这样确定等效性:
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}
The ==
function is global so it goes outside of your class or struct.
该==
函数是全局的,因此它超出了您的类或结构。
Return a computed hashValue
返回一个计算 hashValue
Your custom class or struct must also have a computed hashValue
variable. A good hash algorithm will provide a wide range of hash values. However, it should be noted that you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. It requires some extra work when there is a collision (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the ==
function does that extra work. (Update: It looks like ==
may do allthe work.)
您的自定义类或结构还必须有一个计算hashValue
变量。一个好的散列算法将提供范围广泛的散列值。但是需要注意的是,您不需要保证哈希值都是唯一的。当两个不同的值具有相同的哈希值时,这称为哈希冲突。当发生碰撞时,它需要一些额外的工作(这就是为什么需要良好分布的原因),但是可以预期一些碰撞。据我了解,该==
函数完成了额外的工作。(更新:看起来==
可以完成所有工作。)
There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.
有多种方法可以计算哈希值。例如,您可以执行一些简单的操作,例如返回数组中的元素数。
var hashValue: Int {
return self.scalarArray.count
}
This would give a hash collision every time two arrays had the same number of elements but different values. NSArray
apparently uses this approach.
每次两个数组具有相同数量的元素但不同的值时,这都会导致哈希冲突。NSArray
显然使用这种方法。
DJB Hash Function
DJB 哈希函数
A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.
处理字符串的常见散列函数是 DJB 散列函数。这是我将使用的一种,但请在此处查看其他一些。
A Swift implementation provided by @MartinRfollows:
@MartinR 提供的Swift 实现如下:
var hashValue: Int {
return self.scalarArray.reduce(5381) {
(var hashValue: Int {
// DJB Hash Function
var hash = 5381
for(var i = 0; i < self.scalarArray.count; i++)
{
hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
}
return hash
}
<< 5) &+ // Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {
private var scalarArray: [UInt32] = []
// required var for the Hashable protocol
var hashValue: Int {
// DJB hash function
return self.scalarArray.reduce(5381) {
("\(scalarArray)".hashValue
<< 5) &+ scalarArray.description.hashValue
&+ Int()
}
}
}
// required function for the Equatable protocol, which Hashable inheirits from
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}
&+ Int()
}
}
This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce
. This is equivalent, I believe:
这是我原始实现的改进版本,但让我也包括旧的扩展形式,对于不熟悉reduce
. 这是等效的,我相信:
#import <CommonCrypto/CommonDigest.h>
The &+
operator allows Int
to overflow and start over again for long strings.
该&+
运营商允许Int
溢出和长串重新开始。
Big Picture
大图
We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarString
is the custom type from the question. This will be different for different people, of course.
我们已经看过这些片段,但现在让我展示与 Hashable 协议相关的整个示例代码。ScalarString
是问题中的自定义类型。当然,这对于不同的人来说会有所不同。
extension Array : Hashable, Equatable {
public var hashValue : Int {
var hash = [Int](count: Int(CC_SHA256_DIGEST_LENGTH) / sizeof(Int), repeatedValue: 0)
withUnsafeBufferPointer { ptr in
hash.withUnsafeMutableBufferPointer { (inout hPtr: UnsafeMutableBufferPointer<Int>) -> Void in
CC_SHA256(UnsafePointer<Void>(ptr.baseAddress), CC_LONG(count * sizeof(Element)), UnsafeMutablePointer<UInt8>(hPtr.baseAddress))
}
}
return hash[0]
}
}
Other helpful reading
其他有用的阅读
- Which hashing algorithm is best for uniqueness and speed?
- Overflow Operators
- Why are 5381 and 33 so important in the djb2 algorithm?
- How are hash collisions handled?
Credits
学分
A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.
非常感谢 Martin R 在 Code Review 中的表现。我的重写主要基于他的回答。如果你觉得这对你有帮助,那么请给他一个upvote。
Update
更新
Swift is open source now so it is possible to see how hashValue
is implemented for String
from the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.
Swift 现在是开源的,因此可以从源代码中了解hashValue
其实现方式。它似乎比我在这里给出的答案更复杂,我没有花时间对其进行全面分析。随意自己做。String
回答by Kametrixom
It is not a very elegant solution but it works nicely:
这不是一个非常优雅的解决方案,但效果很好:
public func ==<T>(lhs: [T], rhs: [T]) -> Bool {
return lhs.hashValue == rhs.hashValue
}
or
或者
n -> time
1000 -> 0.000037 s
10000 -> 0.000379 s
100000 -> 0.003402 s
Which just uses the textual representation as a hash source
仅使用文本表示作为哈希源
回答by Kametrixom
Edit (31 May '17): Please refer to the accepted answer. This answer is pretty much just a demonstration on how to use the CommonCrypto
Framework
编辑(2017 年 5 月 31 日):请参阅已接受的答案。这个答案几乎只是关于如何使用CommonCrypto
框架的演示
Okay, I got ahead and extended all arrays with the Hashable
protocol by using the SHA-256 hashing algorithm from the CommonCrypto framework. You have to put
好的,我Hashable
通过使用来自 CommonCrypto 框架的 SHA-256 散列算法提前使用该协议扩展了所有数组。你必须把
n -> time
1000 -> 0.001359 s
10000 -> 0.011036 s
100000 -> 0.122177 s
into your bridging header for this to work. It's a shame that pointers have to be used though:
进入您的桥接头以使其正常工作。遗憾的是,必须使用指针:
var hashValue : Int {
get {
return String(self.scalarArray.map { UnicodeScalar(##代码##) }).hashValue
}
}
Edit (31 May '17): Don't do this, even though SHA256 has pretty much no hash collisions, it's the wrong idea to define equality by hash equality
编辑(2017 年 5 月 31 日):不要这样做,即使 SHA256 几乎没有哈希冲突,但通过哈希相等来定义相等是错误的想法
##代码##This is as good as it gets with CommonCrypto
. It's ugly, but fast and not manypretty much no hash collisions for sure
这与CommonCrypto
. 这是丑陋的,但速度快,没有多少几乎没有哈希冲突肯定
Edit (15 July '15): I just made some speed tests:
编辑(15 年 7 月 15 日):我刚刚进行了一些速度测试:
Randomly filled Int
arrays of size n took on average over 1000 runs
Int
大小为 n 的随机填充数组平均运行 1000 多次
Whereas with the string hashing method:
而使用字符串散列方法:
##代码##So the SHA-256 way is about 33 times faster than the string way. I'm not saying that using a string is a very good solution, but it's the only one we can compare it to right now
所以SHA-256方式比字符串方式快约33倍。我并不是说使用字符串是一个很好的解决方案,但它是我们现在唯一可以比较的方法
回答by Aaron Rasmussen
One suggestion - since you are modeling a String
, would it work to convert your [UInt32]
array to a String
and use the String
's hashValue
? Like this:
一个建议 - 由于您正在建模 a String
,将您的[UInt32]
数组转换为 aString
并使用String
'shashValue
是否可行?像这样:
That could conveniently allow you to compare your custom struct
against String
s as well, though whether or not that is a good idea depends on what you are trying to do...
这也可以方便地让您将您的习惯struct
与String
s进行比较,尽管这是否是一个好主意取决于您要做什么......
Note also that, using this approach, instances of ScalarString
would have the same hashValue
if their String
representations were canonically equivalent, which may or may not be what you desire.
还要注意,使用这种方法,如果它们的表示在规范上是等效的,它们的实例ScalarString
将具有相同的特性,这可能是也可能不是您想要的。hashValue
String
So I suppose that if you want the hashValue
to represent a unique String
, my approach would be good. If you want the hashValue
to represent a unique sequence of UInt32
values, @Kametrixom's answer is the way to go...
所以我想如果你想hashValue
代表一个独特的String
,我的方法会很好。如果你想hashValue
代表一个唯一的UInt32
值序列,@Kametrixom 的答案就是要走的路......