Original URL: http://stackoverflow.com/questions/17192418/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Why doesn't Python's hash function give the same values when run on an Android implementation?
Asked by bkmagnetron
I believed that the hash() function works the same in all Python interpreters, but it differs when I run it on my mobile using Python for Android. I get the same hash values for strings and numbers, but when I hash built-in data types, the hash value differs.
PC Python Interpreter (Python 2.7.3)
>>> hash(int)
31585118
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101
Mobile Python Interpreter (Python 2.6.2)
>>> hash(int)
-2146549248
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101
Can anyone tell me whether this is a bug or whether I have misunderstood something?
Accepted answer by andrew cooke
for old python (at least, my Python 2.7), it seems that
hash(<some type>) = id(<type>) / 16
and for CPython, id() is the address in memory - http://docs.python.org/2/library/functions.html#id
>>> id(int) / hash(int)
16
>>> id(int) % hash(int)
0
so my guess is that the Android port has some strange convention for memory addresses?
anyway, given the above, hashes for types (and other built-ins i guess) will differ across installs because those objects live at different addresses.
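The numbers in the question can be reproduced with plain integer arithmetic. This is a sketch under assumptions that are not in the answer: that CPython 2.7 hashes such objects by rotating the address right by 4 bits (for a 16-byte-aligned address this is exactly the division by 16 seen above), that CPython 2.6 simply reinterprets the address as a signed C long, and that the phone runs a 32-bit build. The address 0x800E4200 below is not measured; it is just -2146549248 read back as an unsigned 32-bit number. The function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a signed C integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def hash_pointer_py26(address: int, bits: int) -> int:
    """Assumed CPython 2.6 rule: the address cast to a signed long."""
    return to_signed(address, bits)


def hash_pointer_py27(address: int, bits: int) -> int:
    """Assumed CPython 2.7 rule: rotate the address right by 4 bits."""
    mask = (1 << bits) - 1
    rotated = ((address >> 4) | (address << (bits - 4))) & mask
    result = to_signed(rotated, bits)
    # -1 signals an error in the C API, so it is replaced by -2.
    return -2 if result == -1 else result


# PC (Python 2.7.3): hash(int) == 31585118 and id(int) == 16 * hash(int).
print(hash_pointer_py27(16 * 31585118, 64))  # 31585118

# Phone (Python 2.6.2, assumed 32-bit), using the hypothetical address.
phone_address = 0x800E4200
print(hash_pointer_py26(phone_address, 32))  # -2146549248
print(hash_pointer_py27(phone_address, 32))  # 134276128
```

If these assumptions hold, the negative value on the phone would just be a high address interpreted as a signed number, but without access to that build this remains a guess.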
in contrast, hashes for values (what i think you mean by "non-internal objects") are calculated from the values themselves (at least before the random stuff was added), so they are likely repeatable.
PS but there's at least one more CPython wrinkle:
>>> for i in range(-1000,1000):
...     if hash(i) != i: print(i)
...
-1
there's an answer here somewhere explaining that one...
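The usual explanation, given here as background rather than taken from the answer, is that CPython's C-level hash functions use -1 as an error indicator, so a hash that would come out as -1 is replaced by -2. A quick check, assuming CPython 3 (other implementations need not behave this way):

```python
# Integers in the same range whose hash differs from their own value.
odd_ones = [i for i in range(-1000, 1000) if hash(i) != i]
print(odd_ones)            # [-1]

# As a consequence, -1 and -2 share the same hash value.
print(hash(-1), hash(-2))  # -2 -2
```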
Answer by Sneftel
Hashing objects such as the type int relies on id(), which is not guaranteed to be constant between runs or between interpreters. That is, hash(int) will always produce the same result during a program's run, but the results might not compare equal between runs, either on the same platform or on different platforms.
BTW, while hash randomization is available in Python 2.7 (since 2.7.3, via the -R flag), it's disabled by default there. Since your strings and numbers hash equally, it's clearly not the issue here.
Answer by John La Rooy
In recent versions (Python 3.3+), hash() of str and bytes objects is randomised by default each time you start a new interpreter, to prevent denial-of-service attacks that flood dictionaries with colliding keys.
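A minimal way to observe this, assuming Python 3.3+, is to start fresh interpreters with subprocess and control the salt through the PYTHONHASHSEED environment variable: leaving it unset gives each process its own random salt, a fixed integer makes str hashes repeatable, and 0 disables randomization. The helper hash_in_new_process is hypothetical.

```python
import os
import subprocess
import sys
from typing import Optional


def hash_in_new_process(expr: str, seed: Optional[str]) -> int:
    """Evaluate hash(<expr>) in a brand-new interpreter and return it.

    seed=None leaves PYTHONHASHSEED unset, so the new process picks a random
    salt; otherwise the value is passed through ("0" disables the salt).
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(completed.stdout)


text = repr("hello sl4a")
# With a random salt per process, these two values usually differ.
print(hash_in_new_process(text, None), hash_in_new_process(text, None))
# Same fixed seed, same value.
print(hash_in_new_process(text, "42") == hash_in_new_process(text, "42"))  # True
# Integers are not salted at all.
print(hash_in_new_process("101", None))  # 101
```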
Before Python 3.3, hash() values already differed between 32-bit and 64-bit builds anyway.
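The width dependence is still visible in Python 3 for numbers. As background (not from the answer): Python 3 documents that integer hashes are reduced modulo a prime P, exposed as sys.hash_info.modulus, which is 2**61 - 1 on 64-bit builds and 2**31 - 1 on 32-bit ones. The sketch below, with a hypothetical helper int_hash, follows that documented rule, so the same large integer hashes differently on the two kinds of build.

```python
import sys


def int_hash(n: int, modulus: int) -> int:
    """Documented Python 3 hash of an int, for a given modulus P."""
    h = abs(n) % modulus
    if n < 0:
        h = -h
    # -1 is reserved, so it is mapped to -2.
    return -2 if h == -1 else h


print(int_hash(2**61, 2**61 - 1))  # 1 (as on a 64-bit build)
print(int_hash(2**61, 2**31 - 1))  # 1073741824 (as on a 32-bit build)
print(int_hash(2**61, sys.hash_info.modulus) == hash(2**61))  # True
```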
If you want something that does hash to the same value every time, use one of the hashes in hashlib:
>>> import hashlib
>>> hashlib.algorithms
('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')
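A minimal sketch of that advice, assuming Python 3, where the 2.7-era attribute hashlib.algorithms is no longer available and hashlib.algorithms_guaranteed lists the always-present algorithms instead. The helper name stable_hash and the choice of truncating SHA-256 to 64 bits are illustrative, not part of the answer.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Return a 64-bit integer derived from SHA-256 of the UTF-8 bytes of text.

    Unlike the built-in hash(), this depends only on the input bytes, so it
    does not change between processes, machines or 32/64-bit builds.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes so the result stays in the unsigned 64-bit range.
    return int.from_bytes(digest[:8], "big")


print("sha256" in hashlib.algorithms_guaranteed)  # True
print(stable_hash("hello sl4a"))
```

The trade-off is speed: a cryptographic digest is slower than the built-in hash(), so this suits persisted keys or cache names rather than in-memory dictionary lookups.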
Answer by Sylvain Leroux
With CPython, for efficiency reasons, hash() on internal objects returns a value derived from id(), which in turn returns the memory location ("address") of the object.
The memory location of such an object is subject to change from one CPython-based interpreter to another. Depending on your OS, it could even change from one run to another.
Answer by nabiltos
Since Python 3.3, the default hash algorithm salts hash values with a random value, which differs even between different Python processes on the same machine.
Hash randomization is currently implemented only for strings (str and bytes), since they were considered the data type most likely to be received from outside and used in an attack.
So the same frozenset of non-string values (for example, integers) consistently produces the same hash value across different processes, and across different machines with the same word size.
Source: https://www.quora.com/Do-two-computers-produce-the-same-hash-for-identical-objects-in-Python
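This can be checked with fresh interpreters, assuming Python 3.3+: a frozenset of integers hashes identically under different PYTHONHASHSEED values because integer hashes are not salted, whereas a frozenset of strings is built from salted element hashes and so can change between processes. The helper hash_with_seed is hypothetical.

```python
import os
import subprocess
import sys


def hash_with_seed(expr: str, seed: int) -> int:
    """Return hash(<expr>) computed by a new interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(completed.stdout)


ints = "frozenset({1, 2, 3})"
strs = "frozenset({'a', 'b'})"
# Integer hashes are not salted, so the frozenset hash ignores the seed.
print(hash_with_seed(ints, 1) == hash_with_seed(ints, 2))  # True
# String hashes are salted, so this frozenset's hash depends on the seed.
print(hash_with_seed(strs, 1), hash_with_seed(strs, 2))
```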

