哈希在python中有什么作用?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/17585730/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What does hash do in python?
提问by Roman
I saw an example of code that where hash
function is applied to tuple. As a result it returns a negative integer. I wonder what does this function does. Google does not help. I found a page that explains how hash is calculated but it does not explain why we need this function.
我看到了一个将hash
函数应用于元组的代码示例。结果它返回一个负整数。我想知道这个函数有什么作用。谷歌没有帮助。我找到了一个页面,解释了哈希是如何计算的,但没有解释为什么我们需要这个函数。
采纳答案by Lennart Regebro
A hash is an fixed sized integer that identifies a particular value. Each value needs to have its own hash, so for the same value you will get the same hash even if it's not the same object.
散列是一个固定大小的整数,用于标识特定值。每个值都需要有自己的哈希值,因此对于相同的值,即使它不是同一个对象,您也会获得相同的哈希值。
>>> hash("Look at me!")
4343814758193556824
>>> f = "Look at me!"
>>> hash(f)
4343814758193556824
Hash values need to be created in such a way that the resulting values are evenly distributed to reduce the number of hash collisions you get. Hash collisions are when two different values have the same hash. Therefore, relatively small changes often result in very different hashes.
散列值需要以这样一种方式创建,即结果值均匀分布,以减少您获得的散列冲突的数量。哈希冲突是指两个不同的值具有相同的哈希值。因此,相对较小的变化通常会导致非常不同的哈希值。
>>> hash("Look at me!!")
6941904779894686356
These numbers are very useful, as they enable quick look-up of values in a large collection of values. Two examples of their use are Python's set
and dict
. In a list
, if you want to check if a value is in the list, with if x in values:
, Python needs to go through the whole list and compare x
with each value in the list values
. This can take a long time for a long list
. In a set
, Python keeps track of each hash, and when you type if x in values:
, Python will get the hash-value for x
, look that up in an internal structure and then only compare x
with the values that have the same hash as x
.
这些数字非常有用,因为它们可以在大量值中快速查找值。它们的两个使用示例是 Pythonset
和dict
. 在 a 中list
,如果要检查某个值是否在列表中, with if x in values:
,Python 需要遍历整个列表并x
与列表中的每个值进行比较values
。这可能需要很长时间list
。在 a 中set
,Python 会跟踪每个散列,当您键入 时if x in values:
,Python 将获取 的散列值x
,在内部结构中查找它,然后仅与x
与 具有相同散列的值进行比较x
。
The same methodology is used for dictionary lookup. This makes lookup in set
and dict
very fast, while lookup in list
is slow. It also means you can have non-hashable objects in a list
, but not in a set
or as keys in a dict
. The typical example of non-hashable objects is any object that is mutable, meaning that you can change its value. If you have a mutable object it should not be hashable, as its hash then will change over its life-time, which would cause a lot of confusion, as an object could end up under the wrong hash value in a dictionary.
相同的方法用于字典查找。这使得查找中set
和dict
速度非常快,而在查找list
缓慢。这也意味着您可以在 a 中拥有不可散列的对象list
,但不能在 a 中set
或作为a中的键dict
。不可散列对象的典型示例是任何可变对象,这意味着您可以更改其值。如果您有一个可变对象,它不应该是可散列的,因为它的散列会在其生命周期内发生变化,这会引起很多混乱,因为对象可能会在字典中使用错误的散列值。
Note that the hash of a value only needs to be the same for one run of Python. In Python 3.3 they will in fact change for every new run of Python:
请注意,对于一次 Python 运行,值的哈希值只需相同。在 Python 3.3 中,它们实际上会随着 Python 的每次新运行而改变:
$ /opt/python33/bin/python3
Python 3.3.2 (default, Jun 17 2013, 17:49:21)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash("foo")
1849024199686380661
>>>
$ /opt/python33/bin/python3
Python 3.3.2 (default, Jun 17 2013, 17:49:21)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash("foo")
-7416743951976404299
This is to make is harder to guess what hash value a certain string will have, which is an important security feature for web applications etc.
这是为了更难猜测某个字符串将具有什么哈希值,这是 Web 应用程序等的重要安全功能。
Hash values should therefore not be stored permanently. If you need to use hash values in a permanent way you can take a look at the more "serious" types of hashes, cryptographic hash functions, that can be used for making verifiable checksums of files etc.
因此,不应永久存储散列值。如果您需要永久使用散列值,您可以查看更“严重”的散列类型、加密散列函数,它们可用于制作可验证的文件校验和等。
回答by Jonathon Reinhart
The Python docs for hash()
state:
状态的 Python文档hash()
:
Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup.
哈希值是整数。它们用于在字典查找期间快速比较字典键。
Python dictionaries are implemented as hash tables. So any time you use a dictionary, hash()
is called on the keys that you pass in for assignment, or look-up.
Python 字典被实现为哈希表。因此,每当您使用字典时,hash()
都会在您传入的键上调用以进行分配或查找。
Additionally, the docs for the dict
typestate:
Values that are not hashable, that is, values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys.
不可散列的值,即包含列表、字典或其他可变类型(按值而不是按对象标识进行比较)的值不能用作键。
回答by NPE
The hash is used by dictionaries and sets to quickly look up the object. A good starting point is Wikipedia's article on hash tables.
字典使用哈希值并设置为快速查找对象。一个很好的起点是维基百科关于哈希表的文章。
回答by dnozay
TL;DR:
特尔;博士:
Please refer to the glossary: hash()
is used as a shortcut to comparing objects, an object is deemed hashable if it can be compared to other objects. that is why we use hash()
. It's also used to access dict
and set
elements which are implemented as resizable hash tables in CPython.
请参阅词汇表:hash()
用作比较对象的快捷方式,如果一个对象可以与其他对象进行比较,则该对象被认为是可散列的。这就是为什么我们使用hash()
. 它还用于访问dict
和set
元素,这些元素在 CPython中实现为可调整大小的哈希表。
Technical considerations
技术考虑
- usually comparing objects (which may involve several levels of recursion) is expensive.
- preferably, the
hash()
function is an order of magnitude (or several) less expensive. - comparing two hashes is easier than comparing two objects, this is where the shortcut is.
- 通常比较对象(可能涉及多个级别的递归)是昂贵的。
- 优选地,该
hash()
函数的成本低一个数量级(或几个)。 - 比较两个散列比比较两个对象更容易,这就是捷径所在。
If you read about how dictionaries are implemented, they use hash tables, which means deriving a key from an object is a corner stone for retrieving objects in dictionaries in O(1)
. That's however very dependent on your hash function to be collision-resistant. The worst case for getting an itemin a dictionary is actually O(n)
.
如果你读过字典是如何实现的,它们使用哈希表,这意味着从对象派生一个键是检索字典中对象的基石O(1)
。然而,这非常依赖于您的哈希函数是否具有抗碰撞性。在字典中获取项目的最坏情况实际上是O(n)
.
On that note, mutable objects are usually not hashable. The hashable property means you can use an object as a key. If the hash value is used as a key and the contents of that same object change, then what should the hash function return? Is it the same key or a different one? It dependson how you define your hash function.
在这一点上,可变对象通常是不可散列的。hashable 属性意味着您可以使用对象作为键。如果将哈希值用作键并且同一对象的内容发生变化,那么哈希函数应该返回什么?它是相同的钥匙还是不同的钥匙?这取决于您如何定义哈希函数。
Learning by example:
实例学习:
Imagine we have this class:
想象一下我们有这个类:
>>> class Person(object):
... def __init__(self, name, ssn, address):
... self.name = name
... self.ssn = ssn
... self.address = address
... def __hash__(self):
... return hash(self.ssn)
... def __eq__(self, other):
... return self.ssn == other.ssn
...
Please note: this is all based on the assumption that the SSN never changes for an individual (don't even know where to actually verify that fact from authoritative source).
请注意:这一切都基于这样的假设,即个人的 SSN 永远不会改变(甚至不知道在哪里从权威来源实际验证该事实)。
And we have Bob:
我们有鲍勃:
>>> bob = Person('bob', '1111-222-333', None)
Bob goes to see a judge to change his name:
鲍勃去找法官改名:
>>> jim = Person('jim bo', '1111-222-333', 'sf bay area')
This is what we know:
这是我们所知道的:
>>> bob == jim
True
But these are two different objects with different memory allocated, just like two different records of the same person:
但这是两个不同的对象,分配了不同的内存,就像同一个人的两个不同记录:
>>> bob is jim
False
Now comes the part where hash() is handy:
现在是 hash() 方便的部分:
>>> dmv_appointments = {}
>>> dmv_appointments[bob] = 'tomorrow'
Guess what:
你猜怎么着:
>>> dmv_appointments[jim] #?
'tomorrow'
From two different records you are able to access the same information. Now try this:
您可以从两个不同的记录访问相同的信息。现在试试这个:
>>> dmv_appointments[hash(jim)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in __eq__
AttributeError: 'int' object has no attribute 'ssn'
>>> hash(jim) == hash(hash(jim))
True
What just happened? That's a collision. Because hash(jim) == hash(hash(jim))
which are both integers btw, we need to compare the input of __getitem__
with all items that collide. The builtin int
does not have an ssn
attribute so it trips.
刚刚发生了什么?那是碰撞。因为hash(jim) == hash(hash(jim))
这两个都是整数顺便说一句,我们需要将 的输入__getitem__
与所有碰撞的项目进行比较。内置int
函数没有ssn
属性,因此会跳闸。
>>> del Person.__eq__
>>> dmv_appointments[bob]
'tomorrow'
>>> dmv_appointments[jim]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: <__main__.Person object at 0x7f611bd37110>
In this last example, I show that even with a collision, the comparison is performed, the objects are no longer equal, which means it successfully raises a KeyError
.
在最后一个示例中,我表明即使发生碰撞,也会执行比较,对象不再相等,这意味着它成功引发了KeyError
.
回答by HateStackOverFlow
I also looking for it from a very long time, Now I got my answer so shearing with you all...
我也在很长一段时间内寻找它,现在我得到了我的答案,所以和你们所有人一起剪......
Please use the Dictionarydata typein python, its very very simileto the hash... Its also supported nested, simile to the to nested hash.
请在python中使用Dictionary数据类型,它与散列非常相似......它也支持嵌套,类似于嵌套散列。
Example:-
例子:-
dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
dict['Age'] = 8; # update existing entry
dict['School'] = "DPS School" # Add new entry
print ("dict['Age']: ", dict['Age'])
print ("dict['School']: ", dict['School'])
Dictionary data type:- https://www.tutorialspoint.com/python3/python_dictionary.htm
字典数据类型:- https://www.tutorialspoint.com/python3/python_dictionary.htm
Hopes its solves the problem..
希望它能解决问题..