Why does Python's hash of infinity have the digits of π?
Original question: http://stackoverflow.com/questions/56227419/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by wim
Accepted answer by Patrick Haugh
_PyHASH_INF is defined as a constant equal to 314159.
I can't find any discussion about this, or comments giving a reason. I think it was chosen more or less arbitrarily. I imagine that as long as they don't use the same meaningful value for other hashes, it shouldn't matter.
Answer by ShreevatsaR
Summary: It's not a coincidence; _PyHASH_INF is hardcoded as 314159 in the default CPython implementation of Python, and was picked as an arbitrary value (obviously from the digits of π) by Tim Peters in 2000.
The value of hash(float('inf')) is one of the system-dependent parameters of the built-in hash function for numeric types, and is also available as sys.hash_info.inf in Python 3:
>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>> sys.hash_info.inf
314159
(Same results with PyPy too.)
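As a quick check of the relationship described above, the following minimal sketch compares the hash of the float infinities against sys.hash_info.inf. The comparison with Decimal is an addition not made in the original answer; it relies on Python's documented rule that numerically equal values hash equally across numeric types.

```python
import math
import sys
from decimal import Decimal

pos_inf = float("inf")
neg_inf = float("-inf")

# The hash reserved for positive infinity on this interpreter.
inf_hash = sys.hash_info.inf
print("sys.hash_info.inf =", inf_hash)

# Positive infinity hashes to that constant, negative infinity to its negation.
print(hash(pos_inf) == inf_hash)
print(hash(neg_inf) == -inf_hash)

# Infinities of other numeric types are expected to share the same hash.
print(hash(math.inf) == hash(Decimal("Infinity")))
```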
In terms of code, hash is a built-in function. Calling it on a Python float object invokes the function whose pointer is given by the tp_hash attribute of the built-in float type (PyTypeObject PyFloat_Type), which is the float_hash function, defined as return _Py_HashDouble(v->ob_fval), which in turn has
if (Py_IS_INFINITY(v))
return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
where _PyHASH_INF is defined as 314159:
#define _PyHASH_INF 314159
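To see where the infinity special case sits relative to ordinary floats, here is a rough pure-Python sketch of the float hashing rule, modeled on the algorithm the Python documentation describes for numeric types. The name hash_float_sketch is hypothetical, NaN handling is simply deferred to the built-in because it varies between versions, and this is not the C code of _Py_HashDouble itself.

```python
import math
import sys


def hash_float_sketch(x: float) -> int:
    """Approximate CPython's float hash in pure Python (illustrative only)."""
    P = sys.hash_info.modulus  # a Mersenne prime, e.g. 2**61 - 1 on 64-bit builds

    if math.isnan(x):
        # NaN hashing differs across Python versions; defer to the built-in.
        return hash(x)
    if math.isinf(x):
        # The special case discussed above: +/- the reserved constant.
        return sys.hash_info.inf if x > 0 else -sys.hash_info.inf

    # A finite float is exactly m / n, with n a power of two (never divisible by P).
    m, n = x.as_integer_ratio()
    # Reduce m / n modulo P; pow(n, P - 2, P) is the modular inverse of n.
    value = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        value = -value
    # -1 is reserved as an error code at the C level, so it is remapped to -2.
    return -2 if value == -1 else value


for sample in (0.5, -1.0, float("inf"), float("-inf")):
    print(sample, hash_float_sketch(sample), hash(sample))
```

The remapping of -1 to -2 is the same concern the commit message quoted further down refers to when it mentions float hashes returning -1 without an error.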
In terms of history, the first mention of 314159 in this context in the Python code (you can find this with git bisect or git log -S 314159 -p) was added by Tim Peters in August 2000, in what is now commit 39dce293 in the cpython git repository.
The commit message says:
Fix for http://sourceforge.net/bugs/?func=detailbug&bug_id=111866&group_id=5470. This was a misleading bug -- the true "bug" was that hash(x) gave an error return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to pyport.h. Rearranged code to reduce growing duplication in hashing of float and complex numbers, pushing Trent's earlier stab at that to a logical conclusion. Fixed exceedingly rare bug where hashing of floats could return -1 even if there wasn't an error (didn't waste time trying to construct a test case, it was simply obvious from the code that it could happen). Improved complex hash so that hash(complex(x, y)) doesn't systematically equal hash(complex(y, x)) anymore.
In particular, in this commit he ripped out the code of static long float_hash(PyFloatObject *v) in Objects/floatobject.c and made it just return _Py_HashDouble(v->ob_fval);, and in the definition of long _Py_HashDouble(double v) in Objects/object.c he added the lines:
if (Py_IS_INFINITY(intpart))
/* can't convert to long int -- arbitrary */
v = v < 0 ? -271828.0 : 314159.0;
So as mentioned, it was an arbitrary choice. Note that 271828 is formed from the first few decimal digits of e.
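The connection between the two constants and the digits of π and e can be checked with a couple of lines. This is only an illustrative sketch: truncating the scaled values to six digits is an assumption about how the constants were chosen, not something the commit states.

```python
import math
import sys

# Keep the first six significant digits of pi and e as integers.
pi_digits = int(math.pi * 10**5)
e_digits = int(math.e * 10**5)

print(pi_digits, e_digits)

# Compare against the constant the running interpreter actually uses.
print(pi_digits == sys.hash_info.inf)
```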
Related later commits:
By Mark Dickinson in Apr 2010, making the Decimal type behave similarly.
By Mark Dickinson in Apr 2010, moving this check to the top and adding test cases.
By Mark Dickinson in May 2010 as issue 8188, completely rewriting the hash function to its current implementation, but retaining this special case, giving the constant a name _PyHASH_INF (also removing the 271828, which is why in Python 3 hash(float('-inf')) returns -314159 rather than -271828 as it does in Python 2).
By Raymond Hettinger in Jan 2011, adding an explicit example in the "What's new" for Python 3.2 of sys.hash_info showing the above value.
By Stefan Krah in Mar 2012, modifying the Decimal module but keeping this hash.
By Christian Heimes in Nov 2013, moving the definition of _PyHASH_INF from Include/pyport.h to Include/pyhash.h, where it now lives.
Answer by Alec Alameddine
Indeed, sys.hash_info.inf returns 314159. The value is not generated; it's built into the source code.
In fact, hash(float('-inf')) returns -271828, or roughly -e × 10^5, in Python 2 (it's -314159 now).
The fact that the two most famous irrational numbers of all time are used as the hash values makes it very unlikely to be a coincidence.