Original source: http://stackoverflow.com/questions/14983336/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Guid.NewGuid() VS a random string generator from Random.Next()
Asked by George Powell
My colleague and I are debating which of these methods to use for auto-generating user IDs and post IDs for identification in the database:
One option uses a single instance of Random and takes some useful parameters, so it can be reused for all sorts of string-generation cases (e.g. from 4-digit numeric PINs to 20-digit alphanumeric IDs). Here's the code:
    // This is created once for the lifetime of the server instance
    class RandomStringGenerator
    {
        public const string ALPHANUMERIC_CAPS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
        public const string ALPHA_CAPS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        public const string NUMERIC = "1234567890";
        Random rand = new Random();
        public string GetRandomString(int length, params char[] chars)
        {
            string s = "";
            for (int i = 0; i < length; i++)
                s += chars[rand.Next() % chars.Length];
            return s;
        }
    }
The other option is simply to use:
    Guid.NewGuid();
We're both aware that Guid.NewGuid() would work for our needs, but I would rather use the custom method. It does the same thing but with more control.
My colleague thinks that because the custom method has been cooked up ourselves, it's more likely to generate collisions. I'll admit I'm not fully aware of the implementation of Random, but I presume it is just as random as Guid.NewGuid(). A typical usage of the custom method might be:
    RandomStringGenerator stringGen = new RandomStringGenerator();
    string id = stringGen.GetRandomString(20, RandomStringGenerator.ALPHANUMERIC_CAPS.ToCharArray());
Edit 1:
- We are using Azure Tables which doesn't have an auto increment (or similar) feature for generating keys.
- Some answers here just tell me to use NewGuid() "because that's what it's made for". I'm looking for a more in depth reason as to why the cooked up method may be more likely to generate collisions given the same degrees of freedom as a Guid.
Edit 2:
We were also using the cooked-up method to generate post IDs which, unlike session tokens, need to look pretty for display in the URL of our website (like http://mywebsite.com/14983336). GUIDs are therefore not an option here, but collisions are still to be avoided.
Accepted answer by Eric Lippert
I am looking for a more in depth reason as to why the cooked up method may be more likely to generate collisions given the same degrees of freedom as a Guid.
First, as others have noted, Random is not thread-safe; using it from multiple threads can cause it to corrupt its internal data structures so that it always produces the same sequence.
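As a rough illustration of the thread-safety point, here is a minimal sketch of one way to share a single Random safely: put a lock around every call. The class name `LockedRandom` is made up for this example, and the sketch addresses only the concurrency problem, not the seeding or predictability problems.

```csharp
using System;

// Hypothetical wrapper (not from the original post): serializes all access
// to one shared Random so concurrent callers cannot corrupt its state.
public sealed class LockedRandom
{
    private readonly Random _rand = new Random();
    private readonly object _gate = new object();

    // Returns a non-negative integer that is less than maxExclusive.
    public int Next(int maxExclusive)
    {
        lock (_gate)
        {
            return _rand.Next(maxExclusive);
        }
    }
}
```

A server would create one `LockedRandom` and hand it to every thread that needs random indexes, for example `locked.Next(chars.Length)`.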
Second, Random is seeded based on the current time. Two instances of Random created within the same millisecond (recall that a millisecond is several million processor cycles on modern hardware) will have the same seed, and therefore will produce the same sequence.
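The consequence of a shared seed is easy to check directly. The sketch below passes an explicit seed to both instances instead of relying on timing, so the result does not depend on how a given runtime picks its default seed. The names `SeedDemo` and `SameSeedSameSequence` are invented for illustration.

```csharp
using System;

public static class SeedDemo
{
    // Returns true when two Random instances built from the same seed
    // produce identical values for the first `count` calls to Next().
    public static bool SameSeedSameSequence(int seed, int count)
    {
        var a = new Random(seed);
        var b = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            if (a.Next() != b.Next())
                return false;
        }
        return true;
    }
}
```

Any ID built purely from such a sequence repeats exactly whenever the seed repeats, no matter how many characters the ID has.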
Third, I lied. Random is not seeded based on the current time; it is seeded based on the amount of time the machine has been active. The seed is a 32 bit number, and since the granularity is in milliseconds, that's only a few weeks until it wraps around. But that's not the problem; the problem is: the time period in which you create that instance of Random is highly likely to be within a few minutes of the machine booting up. Every time you power-cycle a machine, or bring a new machine online in a cluster, there is a small window in which instances of Random are created, and the more that happens, the greater the odds are that you'll get a seed that you had before.
(UPDATE: Newer versions of the .NET framework have mitigated some of these problems; in those versions, Random instances created within the same millisecond no longer all share the same seed. However there are still many problems with Random; always remember that it is only pseudo-random, not crypto-strength random. Random is actually very predictable, so if you are relying on unpredictability, it is not suitable.)
As others have said: if you want a primary key for your database then have the database generate you a primary key; let the database do its job. If you want a globally unique identifier then use a guid; that's what they're for.
And finally, if you are interested in learning more about the uses and abuses of guids then you might want to read my "guid guide" series; part one is here:
http://blogs.msdn.com/b/ericlippert/archive/2012/04/24/guid-guide-part-one.aspx
Answer by Daniel A.A. Pelsmaeker
Use System.Guid, as it:
...can be used across all computers and networks wherever a unique identifier is required.
Note that Random is a pseudo-random number generator. It is not truly random, nor unique. It has only 32 bits of value to work with, compared to the 128-bit GUID.
However, even GUIDs can have collisions (although the chances are really slim), so you should use the database's own features to give you a unique identifier (e.g. the autoincrement ID column). Also, you cannot easily turn a GUID into a 4 or 20 (alpha)numeric number.
Answer by Jordan Parmer
"Auto generating user ids and post ids for identification in the database"...why not use a database sequence or identity to generate keys?
To me your question is really, "What is the best way to generate a primary key in my database?" If that is the case, you should use the conventional tool of the database which will either be a sequence or identity. These have benefits over generated strings.
- Sequences/identity index better. There are numerous articles and blog posts that explain why GUIDs and so forth make poor indexes.
- They are guaranteed to be unique within the table
- They can be safely generated by concurrent inserts without collision
- They are simple to implement
I guess my next question is, for what reasons are you considering GUIDs or generated strings? Will you be integrating across distributed databases? If not, you should ask yourself if you are solving a problem that doesn't exist.
Answer by erikkallen
Contrary to what some people have said in the comments, a GUID generated by Guid.NewGuid() is NOT dependent on any machine-specific identifier. Only type 1 GUIDs are; Guid.NewGuid() returns a type 4 GUID, which is mostly random.
As long as you don't need cryptographic security, the Random class should be good enough, but if you want to be extra safe, use System.Security.Cryptography.RandomNumberGenerator. For the Guid approach, note that not all digits in a GUID are random. Quote from Wikipedia:
In the canonical representation, xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, the most significant bits of N indicates the variant (depending on the variant; one, two or three bits are used). The variant covered by the UUID specification is indicated by the two most significant bits of N being 1 0 (i.e. the hexadecimal N will always be 8, 9, A, or B). In the variant covered by the UUID specification, there are five versions. For this variant, the four bits of M indicates the UUID version (i.e. the hexadecimal M will either be 1, 2, 3, 4, or 5).
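To see the M and N digits for yourself, something like the following sketch can pull them out of the canonical string form. `GuidLayout` and its method names are invented for this example; the only framework call used is Guid.ToString("D"), which produces the hyphenated lowercase layout quoted above.

```csharp
using System;

public static class GuidLayout
{
    // Canonical "D" format: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx (lowercase hex).
    // M sits at index 14 and N at index 19 of that string.
    public static char VersionDigit(Guid g)
    {
        return g.ToString("D")[14];
    }

    public static char VariantDigit(Guid g)
    {
        return g.ToString("D")[19];
    }
}
```

Calling these on the result of Guid.NewGuid() should show a version digit of 4 and a variant digit of 8, 9, a or b on the runtimes discussed here, though that is an observation about current behavior rather than something to build logic on.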
Answer by CodesInChaos
Your custom method has two problems:
- It uses a global instance of Random, but doesn't use locking. => Multi-threaded access can corrupt its state, after which the output will suck even more than it already does.
- It uses a predictable 31-bit seed. This has two consequences:
  - You can't use it for anything security related where unguessability is important
  - The small seed (31 bits) can reduce the quality of your numbers. For example if you create multiple instances of Random at the same time (since system startup) they'll probably create the same sequence of random numbers.
This means you cannot rely on the output of Random being unique, no matter how long it is.
I recommend using a CSPRNG (RNGCryptoServiceProvider) even if you don't need security. Its performance is still acceptable for most uses, and I'd trust the quality of its random numbers over Random. If you want uniqueness, I recommend getting numbers with around 128 bits.
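A minimal sketch of the "around 128 bits from a CSPRNG" suggestion might look like this. It is only an illustration: the `RandomId` class and the hex encoding are choices made for the example, and it uses the RandomNumberGenerator.Create() factory rather than RNGCryptoServiceProvider directly, on the assumption that you may be targeting a newer .NET version where RNGCryptoServiceProvider is marked obsolete.

```csharp
using System;
using System.Security.Cryptography;

public static class RandomId
{
    // Returns 128 bits from the system CSPRNG as 32 uppercase hex characters.
    public static string NewHexId()
    {
        byte[] bytes = new byte[16]; // 16 bytes = 128 bits
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        // BitConverter.ToString yields "AB-CD-..."; strip the separators.
        return BitConverter.ToString(bytes).Replace("-", "");
    }
}
```

Hex is used here only because it is the simplest lossless encoding; a denser alphabet would give shorter strings for the same number of bits.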
To generate random strings using RNGCryptoServiceProvider you can take a look at my answer to "How can I generate random 8 character, alphanumeric strings in C#?".
Nowadays GUIDs returned by Guid.NewGuid() are version 4 GUIDs. They are generated from a PRNG, so they have pretty similar properties to generating a random 122 bit number (the remaining 6 bits are fixed). Its entropy source has much higher quality than what Random uses, but it's not guaranteed to be cryptographically secure.
But the generation algorithm can change at any time, so you can't rely on that. For example in the past the Windows GUID generation algorithm changed from v1 (based on MAC + timestamp) to v4 (random).
Answer by GalacticCowboy
Regarding your edit, here is one reason to prefer a GUID over a generated string:
The native storage for a GUID (uniqueidentifier) in SQL Server is 16 bytes. To store an equivalent-length varchar (string), where each "digit" in the id is stored as a character, would require somewhere between 32 and 38 bytes, depending on formatting.
Because of its compact storage, SQL Server is also able to index a uniqueidentifier column more efficiently than a varchar column.
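The 16 versus 32-to-38 figures can be reproduced on the .NET side with a small sketch. `GuidSizes` is a made-up helper; the format specifiers "N", "D" and "B" are the standard Guid.ToString formats (no hyphens, hyphens, hyphens plus braces), and the character counts equal byte counts only under the assumption of a single-byte encoding such as varchar uses for these characters.

```csharp
using System;

public static class GuidSizes
{
    // Size of the raw binary form of a GUID, in bytes.
    public static int BinaryLength(Guid g)
    {
        return g.ToByteArray().Length;
    }

    // Number of characters in a given text format ("N", "D", "B", "P").
    public static int TextLength(Guid g, string format)
    {
        return g.ToString(format).Length;
    }
}
```

For any GUID this gives 16 for the binary form, 32 for "N", 36 for "D" and 38 for "B".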
Answer by George Powell
As written in other answers, my implementation had a few severe problems:
- Thread safety: Random is not thread safe.
- Predictability: the method couldn't be used for security-critical identifiers like session tokens due to the nature of the Random class.
- Collisions: Even though the method created 20 'random' numbers, the probability of a collision is not 1 in (number of possible chars)^20, because the seed value is only 31 bits and comes from a bad source. Given the same seed, a sequence of any length will be the same.
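To put rough numbers on the collision point, here is a sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for n values drawn from N equally likely possibilities. It assumes seeds are uniform and independent, which is optimistic given how Random was seeded; `Birthday` and `CollisionProbability` are names invented for the example.

```csharp
using System;

public static class Birthday
{
    // Approximate probability that at least two of n values collide when each
    // is drawn uniformly and independently from `space` possible values.
    public static double CollisionProbability(double n, double space)
    {
        return 1.0 - Math.Exp(-n * (n - 1.0) / (2.0 * space));
    }
}
```

With N = 2^31 possible seeds and n = 65,536 freshly seeded generators, the approximation gives about 0.63, whereas with N = 36^20 (20 characters from a 36-character alphabet) the same n gives a value indistinguishable from zero in double precision. The length of the string only helps if the underlying randomness is that large too.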
Guid.NewGuid() would be fine, except we don't want to use ugly GUIDs in URLs, and .NET's NewGuid() algorithm is not known to be cryptographically secure for use in session tokens; it might give predictable results if a little information is known.
Here is the code we're using now. It is secure and flexible, and as far as I know it's very unlikely to create collisions if given enough length and character choice:
    using System;
    using System.Security.Cryptography;
    class RandomStringGenerator
    {
        RNGCryptoServiceProvider rand = new RNGCryptoServiceProvider();
        public string GetRandomString(int length, params char[] chars)
        {
            string s = "";
            for (int i = 0; i < length; i++)
            {
                // Draw 4 bytes from the CSPRNG and map them onto the allowed characters.
                byte[] intBytes = new byte[4];
                rand.GetBytes(intBytes);
                uint randomInt = BitConverter.ToUInt32(intBytes, 0);
                s += chars[randomInt % chars.Length];
            }
            return s;
        }
    }
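One small caveat with the modulo step: 2^32 is not a multiple of 36, so `%` makes the first four characters of a 36-character alphabet very slightly more likely than the rest. The difference is tiny, but if it matters, a variant along these lines removes it by rejection sampling. This is a sketch under my own assumptions, not the code the answer describes: `UnbiasedRandomString` is a hypothetical name, and it uses RandomNumberGenerator.Create() and a StringBuilder instead of the provider class and string concatenation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class UnbiasedRandomString
{
    public static string Generate(int length, string alphabet)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (string.IsNullOrEmpty(alphabet)) throw new ArgumentException("alphabet must not be empty", "alphabet");

        // Largest multiple of alphabet.Length that fits in the 2^32 value range;
        // draws at or above it are discarded so every character is equally likely.
        ulong range = (ulong)uint.MaxValue + 1;
        ulong limit = range - (range % (ulong)alphabet.Length);

        var sb = new StringBuilder(length);
        byte[] buffer = new byte[4];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            while (sb.Length < length)
            {
                rng.GetBytes(buffer);
                uint value = BitConverter.ToUInt32(buffer, 0);
                if (value >= limit) continue; // reject to avoid modulo bias
                sb.Append(alphabet[(int)(value % (uint)alphabet.Length)]);
            }
        }
        return sb.ToString();
    }
}
```

Usage would be along the lines of `UnbiasedRandomString.Generate(20, "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")`. Rejections are rare for small alphabets, so the loop almost always finishes in `length` iterations.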