Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2434931/

Date: 2020-11-04 00:38:30  Source: igfitidea

Python - Why use anything other than uuid4() for unique strings?

Tags: python, unique, uuid

Asked by orokusaki

I see quite a few implementations of unique string generation for things like uploaded image names, session IDs, etc., and many of them use hashes like SHA1 or others.

I'm not questioning the legitimacy of using custom methods like this, but rather just the reason. If I want a unique string, I just say this:

>>> import uuid
>>> uuid.uuid4()
UUID('07033084-5cfd-4812-90a4-e4d24ffb6e3d')

And I'm done with it. I wasn't very trusting before I read up on uuid, so I did this:

>>> import uuid
>>> s = set()
>>> for i in range(5000000):  # That's 5 million!
...     s.add(str(uuid.uuid4()))
...
>>> len(s)
5000000

Not one repeat (I wouldn't expect one now, considering the odds are something like 1 in 1.108e+50, but it's comforting to see it in action). You could even reduce the odds much further by building your string from two uuid4()s.
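
As a rough cross-check of why no repeat showed up, here is a minimal sketch of the birthday-bound estimate. It assumes a version 4 UUID carries 122 random bits (the remaining 6 of the 128 bits are fixed version and variant bits); the helper name `collision_probability` is made up for illustration.

```python
import math

# Assumption: a version 4 UUID has 122 random bits (the other 6 of the 128
# are fixed version/variant bits).
RANDOM_BITS = 122


def collision_probability(n, bits=RANDOM_BITS):
    """Birthday-bound approximation: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits))."""
    space = 2 ** bits
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))


p = collision_probability(5_000_000)
print(f"chance of any repeat among 5 million uuid4 values: about {p:.3g}")
```

Under that assumption, doubling the number of random bits (for example by concatenating two uuid4 values) squares the size of the space rather than merely halving the odds.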

So, with that said, why do people spend time on random() and other stuff for unique strings? Is there an important security issue, or some other problem, with uuid?

Accepted answer by Arion

Using a hash to uniquely identify a resource allows you to generate a 'unique' reference from the object itself. For instance, Git uses SHA hashing to make a unique hash that represents the exact changeset of a single commit. Since hashing is deterministic, you'll get the same hash for the same file every time.

Two people across the world could make the same change to the same repo independently, and Git would know they made the same change. UUID v1, v2, and v4 can't support that since they have no relation to the file or the file's contents.
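
A minimal sketch of the difference, using only the standard library. It hashes raw bytes directly, which is not Git's actual object format, and the URL used as a name is a made-up example. It also shows that the name-based UUID versions (v3 and v5, the ones not listed above) are deterministic.

```python
import hashlib
import uuid

content = b"the same bytes, wherever they are hashed"

# A content hash depends only on the bytes, so it is reproducible anywhere.
digest_a = hashlib.sha1(content).hexdigest()
digest_b = hashlib.sha1(content).hexdigest()
print(digest_a == digest_b)  # True

# uuid4() is random, so its value has nothing to do with the content.
random_a = uuid.uuid4()
random_b = uuid.uuid4()
print(random_a, random_b)

# Name-based UUIDs are deterministic: same namespace + name, same UUID.
# The URL below is a hypothetical example name.
name = "https://example.com/images/cat.png"
named_a = uuid.uuid5(uuid.NAMESPACE_URL, name)
named_b = uuid.uuid5(uuid.NAMESPACE_URL, name)
print(named_a == named_b)  # True
```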

Answer by Ben Voigt

Well, sometimes you want collisions. If someone uploads the exact same image twice, maybe you'd prefer to tell them it's a duplicate rather than just make another copy with a new name.
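
One way this could look, as a sketch only: name each upload after a digest of its bytes, so an identical upload maps to an existing name. The in-memory `uploads` dict and the `store_upload` helper are hypothetical stand-ins for real storage, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib

# Hypothetical in-memory store: content digest -> stored bytes.
uploads = {}


def store_upload(data):
    """Return (name, is_duplicate), naming the upload after its SHA-256 digest."""
    name = hashlib.sha256(data).hexdigest()
    is_duplicate = name in uploads
    if not is_duplicate:
        uploads[name] = data
    return name, is_duplicate


first = store_upload(b"fake image bytes")
second = store_upload(b"fake image bytes")
print(first[1], second[1])  # False True
```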

Answer by Jason Baker

One possible reason is that you want the unique string to be human-readable. UUIDs just aren't easy to read.

Answer by hasen

UUIDs are long and meaningless (for instance, if you order by uuid, you get a meaningless result).

And, because it's too long, I wouldn't want to put it in a URL or expose it to the user in any shape or form.
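
To put numbers on "long", here is a small sketch comparing the common text forms of one UUID. The base64 variant is not something the answer proposes; it is only shown as an assumption about how one might shorten the same 16 bytes for a URL.

```python
import base64
import uuid

u = uuid.uuid4()

canonical = str(u)  # 36 characters, with hyphens
compact = u.hex     # 32 hexadecimal characters
# URL-safe base64 of the 16 raw bytes, with the "==" padding stripped.
short = base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

print(len(canonical), len(compact), len(short))  # 36 32 22

# Restore the padding to decode back to the same UUID.
restored = uuid.UUID(bytes=base64.urlsafe_b64decode(short + "=="))
print(restored == u)  # True
```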

Answer by David K. Hess

In addition to the other answers, hashes are really good for things that should be immutable. The name is unique and can be used to check the integrity of whatever it is attached to at any time.
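
A minimal sketch of that idea, assuming the name is simply the SHA-256 digest of the content; `content_name` and `is_intact` are hypothetical helper names.

```python
import hashlib


def content_name(data):
    """Name a blob after the SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


def is_intact(name, data):
    """True if the data still hashes to the name it was stored under."""
    return content_name(data) == name


original = b"immutable payload"
stored_name = content_name(original)
print(is_intact(stored_name, original))              # True
print(is_intact(stored_name, b"immutable payl0ad"))  # False
```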

Answer by jsh

Also note other kinds of UUID could even be appropriate. For example, if you want your identifier to be orderable, UUID1 is based in part on a timestamp. It's all really about your application requirements...
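
A short sketch of ordering version 1 UUIDs by their embedded timestamp. One caveat worth stating as an assumption to verify for your own use: the canonical string begins with the low bits of the timestamp, so sorting the strings directly does not follow creation time; the sketch sorts on the `time` attribute instead.

```python
import uuid

ids = [uuid.uuid1() for _ in range(5)]

# UUID.time is the 60-bit timestamp embedded in a version 1 UUID.
# Sort on it explicitly rather than on the string form.
by_time = sorted(ids, key=lambda u: u.time)

for u in by_time:
    print(u.time, u)
```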
