Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/28674157/
Convert unique numbers to md5 hash using pandas
Asked by Dave
Good morning, All.
I want to convert my social security numbers to MD5 hashes in hexadecimal form. The outcome should be a unique MD5 hex digest for each social security number.
My data format is as follows:
ob = onboard[['regions','lname','ssno']][:10]
ob
                regions     lname        ssno
0  Northern Region (R1)  Banderas   123456789
1  Northern Region (R1)  Garfield   234567891
2  Northern Region (R1)    Pacino   345678912
3  Northern Region (R1)   Baldwin   456789123
4  Northern Region (R1)     Brody   567891234
5  Northern Region (R1)   Johnson  6789123456
6  Northern Region (R1)  Guinness  7890123456
7  Northern Region (R1)   Hopkins   891234567
8  Northern Region (R1)      Paul   891234567
9  Northern Region (R1)     Arkin   987654321
I've tried the following code using hashlib:
import hashlib
ob['md5'] = hashlib.md5(['ssno'])
This gave me the error that it had to be a string not a list. So I tried the following:
ob['md5'] = hashlib.md5('ssno').hexdigest()
                regions     lname       ssno                               md5
0  Northern Region (R1)  Banderas  123456789  a1b3ec3d8a026d392ad551701ad7881e
1  Northern Region (R1)  Garfield  234567891  a1b3ec3d8a026d392ad551701ad7881e
2  Northern Region (R1)    Pacino  345678912  a1b3ec3d8a026d392ad551701ad7881e
3  Northern Region (R1)   Baldwin  456789123  a1b3ec3d8a026d392ad551701ad7881e
4  Northern Region (R1)     Brody  567891234  a1b3ec3d8a026d392ad551701ad7881e
5  Northern Region (R1)   Johnson  678912345  a1b3ec3d8a026d392ad551701ad7881e
6  Northern Region (R1)   Johnson  789123456  a1b3ec3d8a026d392ad551701ad7881e
7  Northern Region (R1)   Guiness  891234567  a1b3ec3d8a026d392ad551701ad7881e
8  Northern Region (R1)   Hopkins  912345678  a1b3ec3d8a026d392ad551701ad7881e
9  Northern Region (R1)      Paul  159753456  a1b3ec3d8a026d392ad551701ad7881e
This was very close to what I need, but all the hex digests came out the same regardless of whether the social security numbers differed. I am trying to get a unique hex digest for each social security number.
Any suggestions?
Answered by unutbu
hashlib.md5 takes a single string as input; you can't pass it an array of values as you can with some NumPy/Pandas functions. Instead, you could use a list comprehension to build a list of md5sums:
ob['md5'] = [hashlib.md5(val).hexdigest() for val in ob['ssno']]
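A note for readers on Python 3, offered as a minimal sketch rather than as part of the original answer: there hashlib.md5 expects bytes, so each value likely needs to be converted with str() and encoded first (UTF-8 is an assumption here). The helper name md5_hex and the sample values are hypothetical, and a plain list stands in for the ssno column so the snippet runs without pandas.

```python
import hashlib

# Hypothetical sample values standing in for the 'ssno' column.
ssnos = [123456789, 234567891, 345678912]


def md5_hex(value):
    """Return the MD5 hex digest of a value's string form, UTF-8 encoded."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


# Hash each value itself, one digest per row.
digests = [md5_hex(val) for val in ssnos]

# Hashing the literal string 'ssno' gives one constant digest, which is
# why the second attempt in the question filled every row with the same value.
constant = md5_hex("ssno")

# With a DataFrame, the same comprehension would be assigned to a column:
# ob['md5'] = [md5_hex(val) for val in ob['ssno']]
```

Because the list comprehension yields one digest per input value, distinct numbers get distinct digests and repeated numbers get the same digest.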

