
Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/170346/

Time: 2020-09-08 06:59:36  Source: igfitidea

What are the performance improvements of Sequential Guid over standard Guid?

Tags: database, primary-key, guid

Asked by massimogentilini

Has anyone ever measured the performance of Sequential Guid vs. standard Guid when used as primary keys inside a database?




I do not see why it matters whether unique keys are guessable or not; passing them from a web UI or some other part of the system seems like bad practice in itself, and if you have security concerns I do not see how using a Guid improves things (if that is the issue, use a real random number generator built on the framework's proper crypto functions).
The other items are covered by my approach: a sequential Guid can be generated from code without any DB access (although only on Windows), and it is unique in time and space.
And yes, the question was posed with the intent of answering it, to give people who have chosen Guids for their PKs a way to improve database usage (in my case it allowed customers to sustain a much higher workload without having to change servers).

It seems there are a lot of security concerns. In that case do not use Sequential Guid or, better still, use standard Guid for PKs that are passed back and forth from your UI and Sequential Guid for everything else. As always there is no absolute truth; I have also edited the main answer to reflect this.


Answered by massimogentilini

GUID vs. Sequential GUID




A typical pattern is to use Guid as the PK for tables but, as noted in other discussions (see Advantages and disadvantages of GUID / UUID database keys), there are some performance issues.

This is a typical Guid sequence:



f3818d69-2552-40b7-a403-01a6db4552f7
7ce31615-fafb-42c4-b317-40d21a6a3c60
94732fc7-768e-4cf2-9107-f0953f6795a5


Problems with this kind of data are:


  • Wide distribution of values
  • Values are almost random
  • Index usage is very, very, very bad
  • A lot of leaf movement
  • Almost every PK needs to be at least on a non-clustered index
  • The problem happens on both Oracle and SQL Server
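The effect of random versus increasing keys on an ordered index can be illustrated with a small simulation. This is only a sketch: a sorted in-memory list stands in for the index, and the class and method names (`InsertPositionDemo`, `countAppends`, `randomKeys`, `sequentialKeys`) are made up for illustration. It counts how many inserts land at the very end of the structure, which is the cheap case for an index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.UUID;

class InsertPositionDemo {

    // Inserts keys one by one into a sorted list (a stand-in for an ordered index)
    // and counts how many of them land at the very end.
    static <T extends Comparable<T>> int countAppends(List<T> keys) {
        List<T> sorted = new ArrayList<>();
        int appends = 0;
        for (T key : keys) {
            int pos = Collections.binarySearch(sorted, key);
            if (pos < 0) {
                pos = -pos - 1; // binarySearch encodes the insertion point as -(point) - 1
            }
            if (pos == sorted.size()) {
                appends++;
            }
            sorted.add(pos, key);
        }
        return appends;
    }

    // Random 128-bit keys from a seeded generator, so the run is repeatable.
    static List<UUID> randomKeys(int n, long seed) {
        Random rnd = new Random(seed);
        List<UUID> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(new UUID(rnd.nextLong(), rnd.nextLong()));
        }
        return keys;
    }

    // Monotonically increasing keys, standing in for sequential identifiers.
    static List<Long> sequentialKeys(int n) {
        List<Long> keys = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            keys.add(i);
        }
        return keys;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("random keys appended:     " + countAppends(randomKeys(n, 42L)) + " of " + n);
        System.out.println("sequential keys appended: " + countAppends(sequentialKeys(n)) + " of " + n);
    }
}
```

With increasing keys every insert is an append; with random keys almost every insert goes somewhere in the middle, which is the pattern behind the scattered leaf activity listed above.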



A possible solution is to use Sequential Guids, which are generated as follows:

cc6466f7-1066-11dd-acb6-005056c00008
cc6466f8-1066-11dd-acb6-005056c00008
cc6466f9-1066-11dd-acb6-005056c00008






How to generate them from C# code:


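using System;
using System.Runtime.InteropServices;

// Windows only: UuidCreateSequential is exported by rpcrt4.dll
[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);

public static Guid SequentialGuid()
{
    const int RPC_S_OK = 0;
    Guid g;
    // Fall back to a standard (random) Guid if the call does not report success
    if (UuidCreateSequential(out g) != RPC_S_OK)
        return Guid.NewGuid();
    else
        return g;
}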


Benefits



  • Better index usage
  • Allows the use of clustered keys (to be verified in NLB scenarios)
  • Less disk usage
  • 20-25% performance increase at minimal cost



Real-life measurement

Scenario:




  • Guid stored as UniqueIdentifier type on SQL Server
  • Guid stored as CHAR(36) on Oracle
  • Many insert operations, batched together in a single transaction
  • From 1 to 100s of inserts depending on the table
  • Some tables > 10 million rows



Laboratory Test – SQL Server

VS2008 test, 10 concurrent users, no think time, benchmark process with 600 inserts in a batch for the leaf table

Standard Guid
Avg. process duration: 10.5 sec
Avg. requests per second: 54.6
Avg. response time: 0.26

Sequential Guid
Avg. process duration: 4.6 sec
Avg. requests per second: 87.1
Avg. response time: 0.12

Results on Oracle (sorry, a different tool was used for the test): 1,327,613 inserts on a table with a Guid PK

Standard Guid: 0.02 sec elapsed time for each insert, 2,861 sec of CPU time, total of 31,049 sec elapsed

Sequential Guid: 0.00 sec elapsed time for each insert, 1,142 sec of CPU time, total of 3,667 sec elapsed

The "db file sequential read" wait time went from 6.4 million wait events totaling 62,415 seconds to 1.2 million wait events totaling 11,063 seconds.
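To put the figures above side by side, the following sketch computes the ratios between the standard and sequential runs. It assumes that the dots in the Oracle figures are thousands separators (so the totals are 31,049 and 3,667 seconds), which is consistent with the reported per-insert times; the class and method names are hypothetical.

```java
class MeasurementRatios {

    // How many times larger the first figure is than the second.
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    // Average elapsed seconds per insert, given a total and an insert count.
    static double perInsert(double totalSeconds, long inserts) {
        return totalSeconds / inserts;
    }

    public static void main(String[] args) {
        // SQL Server lab test
        System.out.println("process duration ratio:    " + ratio(10.5, 4.6));
        System.out.println("requests per second ratio: " + ratio(87.1, 54.6));

        // Oracle test, assuming dots are thousands separators
        long inserts = 1_327_613L;
        System.out.println("total elapsed ratio:       " + ratio(31_049, 3_667));
        System.out.println("standard, sec per insert:  " + perInsert(31_049, inserts));
        System.out.println("sequential, sec per insert:" + perInsert(3_667, inserts));
    }
}
```

The per-insert values come out at roughly 0.023 and 0.003 seconds, which round to the 0.02 and 0.00 reported above.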

It's important to note that all sequential Guids can be guessed, so it's not a good idea to use them if security is a concern; in that case keep using standard Guids.
To make it short... if you use Guid as PK, use sequential Guids whenever they are not passed back and forth from a UI: they will speed up operations and cost nothing to implement.
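Why sequential Guids can be guessed is easy to see from the sample values shown earlier: they are version 1 identifiers whose first field simply counts up. The sketch below, with a hypothetical class name `SequentialGuidGuess`, takes one of the sample values and derives its neighbour by adding one to that first field.

```java
import java.util.UUID;

class SequentialGuidGuess {

    // Returns the identifier whose first field (the top 32 bits) is one higher.
    static UUID next(UUID current) {
        long msb = current.getMostSignificantBits() + (1L << 32);
        return new UUID(msb, current.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID seen = UUID.fromString("cc6466f7-1066-11dd-acb6-005056c00008");

        // Version 1 identifiers expose their generating node in the last 48 bits.
        System.out.println("version: " + seen.version());
        System.out.println("node:    " + Long.toHexString(seen.node()));

        // Anyone who has seen one value can compute a likely neighbour.
        System.out.println("next:    " + next(seen));
    }
}
```

Applied to the first sample value, `next` yields the second one in the list, which is exactly the property that makes these keys unsuitable wherever guessing matters.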




Answered by Dan

I may be missing something here (feel free to correct me if I am), but I can see very little benefit in using sequential GUID/UUIDs for primary keys.


The point of using GUIDs or UUIDs over autoincrementing integers is:


  • They can be created anywhere without contacting the database
  • They are identifiers that are entirely unique within your application (and in the case of UUIDs, universally unique)
  • Given one identifier, there is no way to guess the next or previous (or even any other valid identifiers) outside of brute-forcing a huge keyspace.

Unfortunately, using your suggestion, you lose all those things.


So, yes. You've made GUIDs better. But in the process, you've thrown away almost all of the reasons to use them in the first place.


If you really want to improve performance, use a standard autoincrementing integer primary key. That provides all the benefits you described (and more) while being better than a 'sequential guid' in almost every way.


This will most likely get downmodded into oblivion as it doesn't specifically answer your question (which is apparently carefully-crafted so you could answer it yourself immediately), but I feel it's a far more important point to raise.


Answered by Bernhard Kircher

As massimogentilini already said, performance can be improved by using UuidCreateSequential (when generating the Guids in code). But one fact seems to be missing: SQL Server (at least Microsoft SQL Server 2005 / 2008) uses the same functionality, BUT the comparison/ordering of Guids differs between .NET and SQL Server, which would still cause more I/O because the Guids will not be ordered correctly. To generate Guids that are ordered correctly for SQL Server, you have to do the following (see comparison details):


[System.Runtime.InteropServices.DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(byte[] buffer);

static Guid NewSequentialGuid() {

    byte[] raw = new byte[16];
    if (UuidCreateSequential(raw) != 0)
        throw new System.ComponentModel.Win32Exception(System.Runtime.InteropServices.Marshal.GetLastWin32Error());

    byte[] fix = new byte[16];

    // reverse 0..3
    fix[0x0] = raw[0x3];
    fix[0x1] = raw[0x2];
    fix[0x2] = raw[0x1];
    fix[0x3] = raw[0x0];

    // reverse 4 & 5
    fix[0x4] = raw[0x5];
    fix[0x5] = raw[0x4];

    // reverse 6 & 7
    fix[0x6] = raw[0x7];
    fix[0x7] = raw[0x6];

    // all other are unchanged
    fix[0x8] = raw[0x8];
    fix[0x9] = raw[0x9];
    fix[0xA] = raw[0xA];
    fix[0xB] = raw[0xB];
    fix[0xC] = raw[0xC];
    fix[0xD] = raw[0xD];
    fix[0xE] = raw[0xE];
    fix[0xF] = raw[0xF];

    return new Guid(fix);
}

or this link or this link.
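The byte reordering performed by the C# code in this answer can be seen more easily on a plain array. The following Java sketch applies the same three reversals (bytes 0..3, 4..5 and 6..7) and leaves the remaining eight bytes untouched; `GuidByteShuffle` and its methods are names invented for this illustration, and the sketch does not call any Windows API.

```java
class GuidByteShuffle {

    // Returns a copy of the 16 input bytes with the first three fields byte-reversed.
    static byte[] shuffle(byte[] raw) {
        if (raw.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes");
        }
        byte[] fix = raw.clone();
        reverse(fix, 0, 4); // reverse 0..3
        reverse(fix, 4, 2); // reverse 4 & 5
        reverse(fix, 6, 2); // reverse 6 & 7
        return fix;         // bytes 8..15 are unchanged
    }

    // Reverses 'length' bytes of 'a' in place, starting at 'offset'.
    static void reverse(byte[] a, int offset, int length) {
        for (int i = 0, j = length - 1; i < j; i++, j--) {
            byte tmp = a[offset + i];
            a[offset + i] = a[offset + j];
            a[offset + j] = tmp;
        }
    }

    public static void main(String[] args) {
        byte[] raw = new byte[16];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        StringBuilder out = new StringBuilder();
        for (byte b : shuffle(raw)) {
            out.append(String.format("%02x ", b));
        }
        System.out.println(out.toString().trim());
    }
}
```

Because each step is a plain reversal, applying `shuffle` twice returns the original bytes.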


Answered by Bryon

See this article: http://www.shirmanov.com/2010/05/generating-newsequentialid-compatible.html


Even though MSSQL uses this same function (UuidCreateSequential(out Guid guid)) to generate NEWSEQUENTIALID values, MSSQL reverses the 3rd and 4th byte patterns, so you do not get the same result as you would when calling this function in your own code. Shirmanov shows how to get exactly the same results that MSSQL would create.


Answered by Sklivvz

If you need to use sequential GUIDs, SQL Server 2005 can generate them for you with the NEWSEQUENTIALID() function.


However, since the basic usage of GUIDs is to generate keys (or alternate keys) that cannot be guessed (for example, to avoid people passing guessed keys on GETs), I don't see how applicable they are, because they are so easily guessed.


From MSDN:


Important:
If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.


Answered by Mitch Wheat

Check out COMBs by Jimmy Nilsson: a type of GUID where a number of bits have been replaced with a timestamp-like value. This means that COMBs can be ordered and, when used as a primary key, result in fewer index page splits when inserting new values.
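As a rough illustration of the idea, the sketch below starts from a random UUID and overwrites part of it with a millisecond timestamp. Placing the timestamp in the last six bytes is an assumption made here for illustration, not a statement about Nilsson's exact layout, and `CombSketch` and `comb` are hypothetical names.

```java
import java.util.UUID;

class CombSketch {

    private static final long LOW_48_BITS = 0x0000FFFFFFFFFFFFL;

    // Keeps the first 10 bytes of the random UUID and replaces the last 6 bytes
    // with the low 48 bits of the given millisecond timestamp.
    static UUID comb(UUID random, long epochMillis) {
        long lsb = (random.getLeastSignificantBits() & ~LOW_48_BITS)
                 | (epochMillis & LOW_48_BITS);
        return new UUID(random.getMostSignificantBits(), lsb);
    }

    public static void main(String[] args) {
        UUID first = comb(UUID.randomUUID(), System.currentTimeMillis());
        UUID second = comb(UUID.randomUUID(), System.currentTimeMillis() + 1);
        // The trailing 12 hex digits grow with time; the rest stays random.
        System.out.println(first);
        System.out.println(second);
    }
}
```

Whether such values actually sort by time depends on how the target database compares its GUID type, so the position of the timestamp would need to be matched to that comparison order.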


Is it OK to use a uniqueidentifier (GUID) as a Primary Key?


Answered by Alex Siepman

I measured the difference between Guid (clustered and non-clustered), Sequential Guid and int (identity/autoincrement) using Entity Framework. The Sequential Guid was surprisingly fast compared to the int with identity. Results and code of the Sequential Guid here.


Answered by Dennis

OK, I finally got to this point in design and production myself.


I generate a COMB_GUID where the upper 32 bits are based on bits 33 through 1 of Unix time in milliseconds. So there are 92 bits of randomness every 2 milliseconds (the 4 version bits of the random UUID are fixed), and the upper bits roll over every 2^33 milliseconds, which is roughly every 99 days. The actual physical representation of the COMB_GUID (or type 4 UUID) is a base64 encoded version of the 128 bits, which is a 22 char string.
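The arithmetic behind this layout can be checked with a few lines of Java. This sketch, with hypothetical names `CombRollover`, `rolloverDays` and `encodedLength`, computes how long a 32-bit counter of 2 ms ticks lasts before wrapping and how long 16 bytes become once Base64 encoded, with and without padding.

```java
import java.util.Base64;

class CombRollover {

    // Days until a counter with the given number of bits wraps,
    // when it advances once every tickMillis milliseconds.
    static double rolloverDays(int counterBits, long tickMillis) {
        double periodMillis = Math.pow(2, counterBits) * tickMillis;
        return periodMillis / 86_400_000.0; // milliseconds per day
    }

    // Length of the URL-safe Base64 encoding of 16 bytes.
    static int encodedLength(boolean padded) {
        Base64.Encoder encoder = Base64.getUrlEncoder();
        if (!padded) {
            encoder = encoder.withoutPadding();
        }
        return encoder.encodeToString(new byte[16]).length();
    }

    public static void main(String[] args) {
        System.out.println("rollover in days:        " + rolloverDays(32, 2));
        System.out.println("encoded length, padded:  " + encodedLength(true));
        System.out.println("encoded length, no pad:  " + encodedLength(false));
    }
}
```

A 22 character result requires the encoder to be used without padding; with padding the same 16 bytes encode to 24 characters.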


When inserting into Postgres, the speed ratio between a fully random UUID and a COMB_GUID favors the COMB_GUID. The COMB_GUID is 2X faster on my hardware over multiple tests, for a one million record test. The records contain the id (22 chars), a string field (110 chars), a double precision, and an INT.


In ElasticSearch, there is NO discernible difference between the two for indexing. I'm still going to use COMB_GUIDs in case content goes to BTREE indexes anywhere in the chain: since the content is fed in time-related order, or can be presorted on the id field so that it IS time related and partially sequential, it will speed things up.


Pretty interesting. The Java code to make a COMB_GUID is below.


import java.nio.ByteBuffer;
import java.util.Base64; // only available in Java 8+
import java.util.UUID;

public class CombGuid {

    // URL-safe Base64 without padding: 16 bytes encode to a 22-character string
    private final Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

    public String createId() {
        return uuid2base64(UUID.randomUUID());
    }

    public String uuid2base64(UUID uuid) {
        // A fresh buffer per call avoids sharing mutable state between threads
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * 2);
        buffer.putLong(0, uuid.getLeastSignificantBits());
        buffer.putLong(8, uuid.getMostSignificantBits());

        long time = System.currentTimeMillis();
        time = time >> 1; // one tick every 2 milliseconds
        int intFor32bits = (int) time; // keeps the low 32 bits; wraps every 2^33 ms (about 99 days)
        buffer.putInt(0, intFor32bits); // overwrite the first 4 bytes with the time prefix

        return encoder.encodeToString(buffer.array());
    }
}