hash function in Python 3.3 returns different results between sessions

Question

Asked by redlus

I've implemented a BloomFilter in Python 3.3 and got different results every session. Drilling down into this weird behavior led me to the built-in hash() function: it returns different hash values for the same string every session.

Example:

>>> hash("235")
-310569535015251310

----- opening a new python console -----

>>> hash("235")
-1900164331622581997

Why is this happening? Why is this useful?

Answer 1

Accepted answer by Martijn Pieters

Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure. By offsetting the hash with a random seed (set once at startup) attackers can no longer predict what keys will collide.

You can set a fixed seed or disable the feature by setting the PYTHONHASHSEED environment variable; the default is random, but you can set it to a fixed positive integer value, with 0 disabling the feature altogether.
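
To reproduce this from a single script, here is a minimal sketch that starts fresh interpreters with and without a fixed PYTHONHASHSEED and returns what each one computes for hash("235"). The helper name hash_in_fresh_process is hypothetical, and subprocess.run(..., capture_output=True) assumes Python 3.7 or later.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text, seed=None):
    """Return hash(text) as computed by a brand-new interpreter process.

    seed=None leaves PYTHONHASHSEED unset (random salt per process);
    otherwise the given value is passed through the environment.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean setting
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Same fixed seed: both fresh interpreters agree.
    print(hash_in_fresh_process("235", seed=42), hash_in_fresh_process("235", seed=42))
    # Seed unset: each interpreter salts str hashes with its own random value.
    print(hash_in_fresh_process("235"), hash_in_fresh_process("235"))
```

With seed=42 both calls should print the same integer; with the seed unset the two values will almost certainly differ, which is exactly what the question observed.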

Python versions 2.7 and 3.2 have the feature disabled by default (use the -R switch or set PYTHONHASHSEED=random to enable it); it is enabled by default in Python 3.3 and up.

If you were relying on the order of keys in a Python set, then don't. Python uses a hash table to implement these types, and their order depends on the insertion and deletion history as well as the random hash seed. Note that in Python 3.5 and older, this applies to dictionaries, too.
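
If output must look the same on every run, one option (a suggestion, not part of the answer above) is to impose an explicit order instead of relying on hash-table order. A minimal sketch; stable_listing is a hypothetical helper name:

```python
def stable_listing(items):
    """Return the distinct items in an order that does not depend on the hash seed."""
    # Iterating a set of str follows the salted hash values, so it can change
    # between processes; sorting removes that dependence.
    return sorted(set(items))


tags = {"235", "abc", "bloom", "filter"}
print(stable_listing(tags))  # ['235', 'abc', 'bloom', 'filter'] in every session
```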

Also see the object.__hash__() special method documentation:

Note: By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).
See also PYTHONHASHSEED.
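
To see why predictable collisions hurt, here is a small illustration, assuming CPython's dict. The class CollidingKey and the helper count_eq_calls are hypothetical: when every key shares one hash value, each new insertion has to compare against the keys already stored, so the total number of __eq__ calls grows roughly like n²/2 instead of staying near zero.

```python
class CollidingKey:
    """A key with a fixed hash, standing in for attacker-chosen colliding inputs."""

    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, value, colliding=True):
        self.value = value
        self.colliding = colliding

    def __hash__(self):
        # Colliding keys all share hash 1; the others hash by their value.
        return 1 if self.colliding else hash(self.value)

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value


def count_eq_calls(n, colliding=True):
    """Insert n distinct keys into a fresh dict and return the __eq__ call count."""
    CollidingKey.eq_calls = 0
    d = {}
    for i in range(n):
        d[CollidingKey(i, colliding)] = i
    return CollidingKey.eq_calls


if __name__ == "__main__":
    for n in (100, 200, 400):
        # Doubling n roughly quadruples the comparisons in the colliding case.
        print(n, count_eq_calls(n), count_eq_calls(n, colliding=False))
```

A random seed does not change this worst case; it only stops an attacker from knowing in advance which strings will collide.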

If you need a stable hash implementation, you probably want to look at the hashlib module; this implements cryptographic hash functions. The pybloom project uses this approach.
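
A minimal Bloom filter sketch in that spirit, deriving bit positions from hashlib.sha256 instead of hash() so that membership answers are identical in every session. This is not pybloom's code; the class name StableBloomFilter, the default sizes, and the double-hashing scheme (positions h1 + i*h2 taken from one digest) are illustrative assumptions.

```python
import hashlib


class StableBloomFilter:
    """Bloom filter whose bit positions do not depend on PYTHONHASHSEED."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # SHA-256 gives the same digest in every process, unlike the salted hash().
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force an odd step so that, for a power-of-two num_bits, the
        # num_hashes positions are all distinct.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = StableBloomFilter()
bf.add("235")
print("235" in bf)  # True
```

Because every position comes from the digest, the bit array produced for the same inputs is byte-for-byte the same on every run, so it can also be saved and reloaded across sessions.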

Since the offset consists of a prefix and a suffix (a start value and a final XORed value, respectively), you unfortunately cannot just store the offset and reuse it. On the plus side, this does mean that attackers cannot easily determine the offset with timing attacks either.

Answer 2

Answer by Peter Wood

Hash randomisation is turned on by default in Python 3.3 and later. This is a security feature:

Hash randomization is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict construction.

In earlier versions, starting with 2.6.8, you could switch it on at the command line with -R or with the PYTHONHASHSEED environment variable.

You can switch it off by setting PYTHONHASHSEED to zero.
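
One way to confirm the setting took effect is sys.flags.hash_randomization, which reports whether the running interpreter salts hashes. A minimal sketch that checks it in fresh interpreters; the helper name hash_randomization_flag is hypothetical, and capture_output assumes Python 3.7 or later.

```python
import os
import subprocess
import sys


def hash_randomization_flag(seed=None):
    """Return sys.flags.hash_randomization as seen by a fresh interpreter.

    seed=None runs with PYTHONHASHSEED unset; otherwise it is set to seed.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.hash_randomization)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


print("default:", hash_randomization_flag())            # 1 on Python 3.3+
print("PYTHONHASHSEED=0:", hash_randomization_flag(0))  # 0: randomization off
```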

Answer 3

Answer by Adam Wen

hash() is a Python built-in function; it computes an object's hash value for use in dictionaries and sets, not a stable hash of a string or number.

You can see the detail in this page: https://docs.python.org/3.3/library/functions.html#hash.

The hash() value comes from the object's __hash__() method. The documentation says the following:

By default, the hash() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

That's why you get different hash values for the same string in different consoles.
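
A small illustration of both points, using a hypothetical Point class: hash() delegates to the object's __hash__(), and a str hash, although salted, stays the same for the whole lifetime of one process.

```python
class Point:
    """Hypothetical value object that is hashed by its coordinates."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to tuple hashing so that equal points get equal hashes.
        return hash((self.x, self.y))


p = Point(2, 3)
print(hash(p) == hash((2, 3)))     # True: hash() called Point.__hash__
print(hash("235") == hash("235"))  # True within this process, salted or not
```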

Your implementation is not a good approach.

When you want to calculate a hash value for a string, just use hashlib.

hash() is meant to get an object's hash value, not a stable hash of a string.
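
For the use case in the question, a stable string hash could look like the following sketch. The function name stable_hash and the choice of SHA-256 truncated to 64 bits are assumptions; any hashlib algorithm would give run-independent results.

```python
import hashlib


def stable_hash(text, bits=64):
    """Hash a string to an int that is identical in every Python session."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the leading `bits` bits of the 256-bit digest as an unsigned integer.
    return int.from_bytes(digest, "big") >> (256 - bits)


print(stable_hash("235"))  # same value in every session and on every machine
```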
