database 什么是 Hi/Lo 算法?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/282099/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 07:04:13  来源:igfitidea点击:

What's the Hi/Lo algorithm?

databasealgorithmnhibernatehibernatehilo

提问by DiegoCofre

What's the Hi/Lo algorithm?

什么是 Hi/Lo 算法?

I've found this in the NHibernatedocumentation (it's one method to generate unique keys, section 5.1.4.2), but I haven't found a good explanation of how it works.

我在NHibernate文档中找到了这个(它是一种生成唯一键的方法,第 5.1.4.2 节),但我还没有找到关于它如何工作的很好的解释。

I know that Nhibernate handles it, and I don't need to know the inside, but I'm just curious.

我知道 Nhibernate 处理它,我不需要知道内部,但我只是好奇。

回答by Jon Skeet

The basic idea is that you have two numbers to make up a primary key- a "high" number and a "low" number. A client can basically increment the "high" sequence, knowing that it can then safely generate keys from the entire range of the previous "high" value with the variety of "low" values.

基本思想是你有两个数字来组成一个主键——一个“高”数字和一个“低”数字。客户端基本上可以增加“高”序列,知道然后它可以安全地从先前“高”值的整个范围和各种“低”值生成密钥。

For instance, supposing you have a "high" sequence with a current value of 35, and the "low" number is in the range 0-1023. Then the client can increment the sequence to 36 (for other clients to be able to generate keys while it's using 35) and know that keys 35/0, 35/1, 35/2, 35/3... 35/1023 are all available.

例如,假设您有一个当前值为 35 的“高”序列,而“低”数在 0-1023 的范围内。然后客户端可以将序列增加到 36(其他客户端在使用 35 时能够生成密钥)并且知道密钥 35/0、35/1、35/2、35/3...35/1023 是所有可用。

It can be very useful (particularly with ORMs) to be able to set the primary keys on the client side, instead of inserting values without primary keys and then fetching them back onto the client. Aside from anything else, it means you can easily make parent/child relationships and have the keys all in place before you do anyinserts, which makes batching them simpler.

能够在客户端设置主键而不是插入没有主键的值然后将它们取回客户端是非常有用的(尤其是对于 ORM)。除此之外,这意味着您可以轻松地建立父/子关系并在执行任何插入之前将所有键都准备好,这使得批处理更简单。

回答by Stephan Eggermont

In addition to Jon's answer:

除了乔恩的回答:

It is used to be able to work disconnected. A client can then ask the server for a hi number and create objects increasing the lo number itself. It does not need to contact the server until the lo range is used up.

它用于能够在断开连接的情况下工作。然后,客户端可以向服务器询问 hi 编号并创建增加 lo 编号本身的对象。在 lo 范围用完之前,它不需要联系服务器。

回答by Vlad Mihalcea

Since this is a very common question, I wrote this article, on which this answer is based on.

由于这是一个非常常见的问题,我写了这篇文章,这个答案是基于它。

The hi/lo algorithms splits the sequences domain into “hi” groups. A “hi” value is assigned synchronously. Every “hi” group is given a maximum number of “lo” entries, that can by assigned off-line without worrying about concurrent duplicate entries.

hi/lo 算法将序列域分成“hi”组。“hi”值是同步分配的。每个“hi”组都有最大数量的“lo”条目,可以离线分配,而不必担心并发重复条目。

  1. The “hi” token is assigned by the database, and two concurrent calls are guaranteed to see unique consecutive values
  2. Once a “hi” token is retrieved we only need the “incrementSize” (the number of “lo” entries)
  3. The identifiers range is given by the following formula:

    [(hi -1) * incrementSize) + 1, (hi * incrementSize) + 1)
    

    and the “lo” value will be in the range:

    [0, incrementSize)
    

    being applied from the start value of:

    [(hi -1) * incrementSize) + 1)
    
  4. When all “lo” values are used, a new “hi” value is fetched and the cycle continues

  1. “hi”令牌由数据库分配,保证两个并发调用看到唯一的连续值
  2. 一旦检索到“hi”标记,我们只需要“incrementSize”(“lo”条目的数量)
  3. 标识符范围由以下公式给出:

    [(hi -1) * incrementSize) + 1, (hi * incrementSize) + 1)
    

    并且“lo”值将在以下范围内:

    [0, incrementSize)
    

    从以下的起始值开始应用:

    [(hi -1) * incrementSize) + 1)
    
  4. 当使用所有“lo”值时,获取新的“hi”值并继续循环

You can find a more detailed explanation in this article:

你可以在这篇文章中找到更详细的解释:

And this visual presentation is easy to follow as well:

这种视觉呈现也很容易理解:

enter image description here

在此处输入图片说明

While hi/lo optimizer is fine for optimizing identifier generation, it doesn't play well with other systems inserting rows into our database, without knowing anything about our identifier strategy.

虽然 hi/lo 优化器可以很好地优化标识符生成,但它不适用于将行插入我们数据库的其他系统,而无需了解我们的标识符策略。

Hibernate offers the pooled-looptimizer, which offers the advantages of the hi/lo generator strategy while also providing interoperability with other 3rd-party clients that are not aware of this sequence allocation strategy.

Hibernate 提供了pooled-lo优化器,它提供了 hi/lo 生成器策略的优点,同时还提供了与其他不知道这种序列分配策略的 3rd-party 客户端的互操作性。

Being both efficient and interoperable with other systems, the pooled-lo optimizer is a much better candidate than the legacy hi/lo identifier strategy.

由于既高效又可与其他系统互操作,pooled-lo 优化器是比传统的 hi/lo 标识符策略更好的候选者。

回答by Thomas W

Lo is a cached allocator that splits the keyspace into large chunks, typically based on some machine word size, rather than the meaningfully-sized ranges (eg obtaining 200 keys at a time) which a human might sensibly choose.

Lo 是一个缓存分配器,它将键空间分成大块,通常基于一些机器字的大小,而不是人类可能明智地选择的有意义的大小范围(例如,一次获得 200 个键)。

Hi-Lo usage tends to waste large numbers of keys on server restart, and generate large human-unfriendly key values.

Hi-Lo 的使用往往会在服务器重启时浪费大量密钥,并生成大量对人类不友好的密钥值。

Better than the Hi-Lo allocator, is the "Linear Chunk" allocator. This uses a similar table-based principle but allocates small, conveniently-sized chunks & generates nice human-friendly values.

比 Hi-Lo 分配器更好的是“线性块”分配器。这使用了类似的基于表的原则,但分配了小而方便的块并生成了很好的人性化值。

create table KEY_ALLOC (
    SEQ varchar(32) not null,
    NEXT bigint not null,
    primary key (SEQ)
);

To allocate the next, say, 200 keys (which are then held as a range in the server & used as needed):

要分配下一个,比如说,200 个键(然后在服务器中作为一个范围保存并根据需要使用):

select NEXT from KEY_ALLOC where SEQ=?;
update KEY_ALLOC set NEXT=(old value+200) where SEQ=? and NEXT=(old value);

Providing you can commit this transaction (use retries to handle contention), you have allocated 200 keys & can dispense them as needed.

如果您可以提交此事务(使用重试来处理争用),您就分配了 200 个密钥并可以根据需要分配它们。

With a chunk-size of just 20, this scheme is 10x faster than allocating from an Oracle sequence, and is 100% portable amongst all databases. Allocation performance is equivalent to hi-lo.

该方案的块大小仅为 20,比从 Oracle 序列分配快 10 倍,并且在所有数据库之间具有 100% 的可移植性。分配性能相当于hi-lo。

Unlike Ambler's idea, it treats the keyspace as a contiguous linear numberline.

与 Ambler 的想法不同,它将键空间视为连续的线性数轴。

This avoids the impetus for composite keys (which were never really a good idea) and avoids wasting entire lo-words when the server restarts. It generates "friendly", human-scale key values.

这避免了对复合键的推动(这从来都不是一个好主意)并避免在服务器重新启动时浪费整个 lo-words。它生成“友好”的、人性化的关键值。

Mr Ambler's idea, by comparison, allocates the high 16- or 32-bits, and generates large human-unfriendly key values as the hi-words increment.

相比之下,Ambler 先生的想法是分配高 16 位或 32 位,并随着高字的增加生成对人类不友好的大键值。

Comparison of allocated keys:

分配键的比较:

Linear_Chunk       Hi_Lo
100                65536
101                65537
102                65538
.. server restart
120                131072
121                131073
122                131073
.. server restart
140                196608

Design-wise, his solution is fundamentally more complex on the number-line (composite keys, large hi_word products) than Linear_Chunk while achieving no comparative benefit.

在设计方面,他的解决方案在数轴(复合键、大型 hi_word 产品)上比 Linear_Chunk 更复杂,同时没有实现比较优势。

The Hi-Lo design arose early in OO mapping and persistence. These days persistence frameworks such as Hibernate offer simpler and better allocators as their default.

Hi-Lo 设计出现在 OO 映射和持久性的早期。如今,诸如 Hibernate 之类的持久性框架提供了更简单、更好的分配器作为其默认设置。

回答by Theo

I found the Hi/Lo algorithm is perfect for multiple databases with replication scenarios based in my experience. Imagine this. you have a server in New York (alias 01) and another server in Los Angeles (alias 02) then you have a PERSON table... so in New York when a person is create... you always use 01 as the HI value and the LO value is the next secuential. por example.

根据我的经验,我发现 Hi/Lo 算法非常适合具有复制场景的多个数据库。想象一下。你在纽约有一台服务器(别名 01),另一台服务器在洛杉矶(别名 02),那么你有一个 PERSON 表......所以在纽约创建一个人时......你总是使用 01 作为 HI 值LO 值是下一个顺序。例子。

  • 010000010 Jason
  • 010000011 David
  • 010000012 Theo
  • 010000010 杰森
  • 010000011 大卫
  • 010000012 西奥

in Los Angeles you always use the HI 02. for example:

在洛杉矶,您总是使用 HI 02。例如:

  • 020000045 Rupert
  • 020000046 Oswald
  • 020000047 Mario
  • 020000045 鲁伯特
  • 020000046 奥斯瓦尔德
  • 020000047 马里奥

So, when you use the database replication (no matter what brand) all the primary keys and data combine easily and naturally without to worry about duplicate primary keys, collissions, etc.

因此,当您使用数据库复制(无论什么品牌)时,所有主键和数据都可以轻松自然地组合在一起,而无需担心重复的主键、冲突等。

This is the best way to go in this scenario.

这是在这种情况下最好的方法。