Original question: http://stackoverflow.com/questions/2511058/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on Stack Overflow.
Persistent Hashing of Strings in Python
Asked by Cerin
How would you convert an arbitrary string into a unique integer that is the same across Python sessions and platforms? For example, hash('my string') wouldn't work, because it returns a different value for each Python session and platform.
Accepted answer by Jason Sundram
If a hash function really won't work for you, you can turn the string into a number.
my_string = 'my string'

def string_to_int(s):
    # Zero-pad each character's code point to three digits, then concatenate.
    ord3 = lambda x: '%.3d' % ord(x)
    return int(''.join(map(ord3, s)))
In[10]: string_to_int(my_string)
Out[11]: 109121032115116114105110103L
This is invertible, by mapping each triplet through chr.
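def int_to_string(n):
    s = str(n)
    # int() drops leading zeros, so pad back to a multiple of three digits.
    s = s.zfill(len(s) + (-len(s)) % 3)
    return ''.join([chr(int(s[i:i+3])) for i in range(0, len(s), 3)])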
In[12]: int_to_string(109121032115116114105110103L)
Out[13]: 'my string'
Answer by Ignacio Vazquez-Abrams
Use a hash algorithm such as MD5 or SHA1, then convert the hexdigest via int():
>>> import hashlib
>>> int(hashlib.md5('Hello, world!').hexdigest(), 16)
144653930895353261282233826065192032313L
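On Python 3, the hashlib constructors accept bytes rather than str, so the string has to be encoded first. Here is a minimal sketch of that, assuming UTF-8 as the encoding. The helper name stable_int and its bits parameter are illustrative and not part of the answer:

```python
import hashlib

def stable_int(text, bits=128):
    """Map text to an integer that is the same across sessions and platforms.

    Hypothetical helper: hashes the UTF-8 bytes with MD5 and keeps the
    top `bits` bits of the 128-bit digest.
    """
    if not 0 < bits <= 128:
        raise ValueError("bits must be between 1 and 128")
    digest = hashlib.md5(text.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")  # same number as int(hexdigest, 16)
    return value >> (128 - bits)

print(stable_int("Hello, world!"))           # full 128-bit value
print(stable_int("Hello, world!", bits=64))  # bounded to 64 bits
```

Keeping fewer bits makes the result fit in a fixed-width column or field, at the cost of a higher collision probability.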
Answer by jichi
Here is my Python 2.7 implementation of the algorithms listed at http://www.cse.yorku.ca/~oz/hash.html. I have no idea whether they are efficient.
from ctypes import c_ulong
from functools import reduce  # a builtin on Python 2; must be imported on Python 3

def ulong(i): return c_ulong(i).value  # numpy would be better if available

def djb2(L):
    """
    h = 5381
    for c in L:
        h = ((h << 5) + h) + ord(c)  # h * 33 + c
    return h
    """
    return reduce(lambda h,c: ord(c) + ((h << 5) + h), L, 5381)

def djb2_l(L):
    return reduce(lambda h,c: ulong(ord(c) + ((h << 5) + h)), L, 5381)

def sdbm(L):
    """
    h = 0
    for c in L:
        h = ord(c) + (h << 6) + (h << 16) - h
    return h
    """
    return reduce(lambda h,c: ord(c) + (h << 6) + (h << 16) - h, L, 0)

def sdbm_l(L):
    return reduce(lambda h,c: ulong(ord(c) + (h << 6) + (h << 16) - h), L, 0)

def loselose(L):
    """
    h = 0
    for c in L:
        h += ord(c)
    return h
    """
    return sum(ord(c) for c in L)

def loselose_l(L):
    return reduce(lambda h,c: ulong(ord(c) + h), L, 0)
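One caveat with the ctypes approach: c_ulong is 32 bits wide on Windows and 64 bits wide on most 64-bit Unix systems, so the _l variants can return different values on different platforms. The following Python 3 sketch pins the width with an explicit mask instead. The 32-bit width, the UTF-8 encoding, and the function names are assumptions made for illustration:

```python
MASK32 = 0xFFFFFFFF  # assumed width: wrap every step to 32 bits

def djb2_32(text):
    """djb2 (h * 33 + c) over the UTF-8 bytes of text, wrapped to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = ((h << 5) + h + byte) & MASK32
    return h

def sdbm_32(text):
    """sdbm (c + h * 65599) over the UTF-8 bytes of text, wrapped to 32 bits."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (byte + (h << 6) + (h << 16) - h) & MASK32
    return h

print(djb2_32("my string"), sdbm_32("my string"))
```

Because these hash bytes rather than code points, they match the functions above only for ASCII input.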
Answer by redtuna
First off, you probably don't really want the integers to be actually unique. If you do, your numbers might be unbounded in size. If that really is what you want, you could use a bignum library and interpret the bits of the string as the representation of a (potentially very large) integer. If your strings can include the \0 character, you should prepend a 1 so that you can distinguish, e.g., "\0\0" from "\0".
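In Python 3 the "interpret the bits as an integer" idea needs no external bignum library, because int is arbitrary-precision. The sketch below assumes UTF-8 encoding and uses a leading 0x01 byte as the prepended 1. The function names are illustrative:

```python
def text_to_int(text):
    """Encode text as one unbounded integer; distinct strings give distinct integers."""
    # The leading 0x01 byte keeps leading NUL bytes from vanishing.
    return int.from_bytes(b"\x01" + text.encode("utf-8"), "big")

def int_to_text(n):
    """Invert text_to_int."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the 0x01 marker

assert int_to_text(text_to_int("my string")) == "my string"
assert text_to_int("\0") != text_to_int("\0\0")
```

The integer grows by eight bits per encoded byte, which is the unbounded size the paragraph above warns about.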
Now, if you prefer bounded-size numbers, you'll be using some form of hashing. MD5 will work, but it's overkill for the stated purpose. I recommend using sdbm instead; it works very well. In C it looks like this:
static unsigned long sdbm(unsigned char *str)
{
    unsigned long hash = 0;
    int c;

    while (c = *str++)
        hash = c + (hash << 6) + (hash << 16) - hash;

    return hash;
}
The source, http://www.cse.yorku.ca/~oz/hash.html, also presents a few other hash functions.
Answer by Dan Wills
Here's another option, quite crude (probably has many collisions) and not very legible.
It worked for the purpose of generating an int (and later on, a random color) for different strings:
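from functools import reduce  # a builtin on Python 2; must be imported on Python 3
aString = "don't panic"
# zip() stops at the shorter input, so range(1, len(aString)) leaves the last character out
reduce(lambda x, y: x + y, map(lambda x: ord(x[0]) * x[1], zip(aString, range(1, len(aString)))))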
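For the color use case, here is a sketch of how such an integer could be turned into an RGB triple. The answer does not show this step, so everything here is an assumption: the function names, the use of enumerate so that every character (including the last) is weighted, and the choice to read the low 24 bits as red, green, and blue:

```python
def weighted_sum(text):
    """Sum of each character's code point times its 1-based position."""
    return sum(ord(ch) * pos for pos, ch in enumerate(text, start=1))

def string_to_rgb(text):
    """Hypothetical mapping from a string to an (r, g, b) tuple of 0..255 values."""
    n = weighted_sum(text) & 0xFFFFFF  # keep the low 24 bits
    return (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF

print(string_to_rgb("don't panic"))
```

Short strings produce small sums, so the red channel stays at zero for them. A stronger hash would spread the colors more evenly.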