Java distributed sequence number generation?
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, include the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2671858/
Distributed sequence number generation?
Asked by Jon
I've generally implemented sequence number generation using database sequences in the past.
e.g. using the Postgres SERIAL type: http://www.neilconway.org/docs/sequences/
I'm curious, though, how to generate sequence numbers for large distributed systems where there is no database. Does anybody have any experience with, or suggestions for, a best practice for achieving sequence number generation in a thread-safe manner for multiple clients?
Answered by Steven Schlansker
You could have each node have a unique ID (which you may have anyway) and then prepend that to the sequence number.
For example, node 1 generates the sequence 001-00001, 001-00002, 001-00003, etc., and node 5 generates 005-00001, 005-00002, and so on.
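A minimal sketch of this idea in Java, assuming each node is configured with its own numeric ID (nodeId below is just a placeholder for however you assign it):

import java.util.concurrent.atomic.AtomicLong;

public class PrefixedSequence {
    private final int nodeId;                              // unique per node, assigned by configuration
    private final AtomicLong counter = new AtomicLong();   // local, thread-safe counter

    public PrefixedSequence(int nodeId) {
        this.nodeId = nodeId;
    }

    // AtomicLong handles concurrent increments within this node;
    // the node prefix keeps different nodes from colliding with each other.
    public String next() {
        return String.format("%03d-%05d", nodeId, counter.incrementAndGet());
    }
}

With this, new PrefixedSequence(1).next() yields "001-00001", "001-00002", and so on, while node 5 produces the 005-prefixed series. (Note the counter here is in-memory only; persist it if the sequence must survive restarts.)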
Unique :-)
Alternately, if you want some sort of centralized system, you could consider having your sequence server give out numbers in blocks. This reduces the overhead significantly. For example, instead of requesting a new ID from the central server for each ID that must be assigned, you request IDs in blocks of 10,000 from the central server and then only have to do another network request when you run out.
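A rough sketch of the client side of such a block-allocation scheme; fetchNextBlockStart() is a stand-in for whatever call your central sequence server actually exposes, not a real API:

public class BlockAllocatingSequence {
    private static final long BLOCK_SIZE = 10_000;

    private long next;      // next ID to hand out locally
    private long blockEnd;  // first ID not covered by the current block

    // Hands out IDs locally and only contacts the central server
    // once every BLOCK_SIZE IDs, when the current block is exhausted.
    public synchronized long nextId() {
        if (next >= blockEnd) {
            long blockStart = fetchNextBlockStart(BLOCK_SIZE);
            next = blockStart;
            blockEnd = blockStart + BLOCK_SIZE;
        }
        return next++;
    }

    // Placeholder: one network round-trip to the central sequence server,
    // which atomically reserves and returns the start of the next block.
    private long fetchNextBlockStart(long blockSize) {
        throw new UnsupportedOperationException("call your sequence service here");
    }
}

The trade-off is that IDs reserved but unused when a node stops are simply lost, so the result is unique and increasing per node, but not gap-free.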
Answered by Phil
Why not use a (thread safe) UUID generator?
I should probably expand on this.
UUIDs are guaranteed to be globally unique (if you avoid the ones based on random numbers, where the uniqueness is just highly probable).
Your "distributed" requirement is met, regardless of how many UUID generators you use, by the global uniqueness of each UUID.
Your "thread safe" requirement can be met by choosing "thread safe" UUID generators.
Your "sequence number" requirement is assumed to be met by the guaranteed global uniqueness of each UUID.
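As a concrete example, the JDK's built-in generator is thread-safe and needs no coordination between nodes; note that it produces the random (version 4) variant that the caveat above applies to, so uniqueness is extremely probable rather than strictly guaranteed. Time-based (version 1) UUIDs typically require a third-party library:

import java.util.UUID;

public class UuidIdGenerator {
    // java.util.UUID.randomUUID() may be called concurrently from any number
    // of threads and any number of nodes without coordination.
    public static String nextId() {
        return UUID.randomUUID().toString();
    }
}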
Note that many database sequence number implementations (e.g. Oracle) do not guarantee either monotonically increasing, or (even) increasing, sequence numbers (on a per "connection" basis). This is because a consecutive batch of sequence numbers gets allocated in "cached" blocks on a per-connection basis. This guarantees global uniqueness and maintains adequate speed. But the sequence numbers actually allocated (over time) can be jumbled when they are being allocated by multiple connections!
Answered by wsorenson
If it really has to be globally sequential, and not simply unique, then I would consider creating a single, simple service for dispensing these numbers.
Distributed systems rely on lots of little services interacting, and for this simple kind of task, do you really need or would you really benefit from some other complex, distributed solution?
Answered by Javier
There are a few strategies, but none that I know of can be really distributed and give a real sequence:

- Have a central number generator. It doesn't have to be a big database: memcached has a fast atomic counter, and in the vast majority of cases it's fast enough for your entire cluster (see the sketch after this list).
- Separate out an integer range for each node (like Steven Schlansker's answer).
- Use random numbers or UUIDs.
- Use some piece of data, together with the node's ID, and hash it all (or HMAC it).

Personally, I'd lean to UUIDs, or memcached if I want to have a mostly-contiguous space.
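A minimal sketch of the memcached counter option mentioned above, assuming the spymemcached client library (the client class, the incr signature used here, and the "sequence" key name are assumptions; any client exposing memcached's atomic incr command works the same way):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedSequence {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        // incr is atomic on the memcached server, so every node in the cluster
        // can call this and receive a distinct, mostly-contiguous number.
        long id = client.incr("sequence", 1, 1);   // start at 1 if the key does not exist yet
        System.out.println("next id = " + id);
        client.shutdown();
    }
}

Keep in mind that memcached is a cache: if the server restarts or evicts the key, the counter resets, so this suits "mostly-contiguous" rather than durable sequences.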
Answered by Paolo
Now there are more options.
Though this question is "old", I got here, so I think it might be useful to leave the options I know of (so far):
- You could try Hazelcast. Its 1.9 release includes a distributed implementation of java.util.concurrent.AtomicLong.
- You can also use ZooKeeper. It provides methods for creating sequence nodes (the sequence is appended to znode names, though I prefer using the version numbers of the nodes). Be careful with this one, though: if you don't want missed numbers in your sequence, it may not be what you want.
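A minimal sketch of the sequential-znode approach with the plain ZooKeeper client (the connection string and the /seq parent path are assumptions, and /seq must already exist):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkSequence {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
        // PERSISTENT_SEQUENTIAL makes ZooKeeper append a monotonically increasing
        // 10-digit counter to the node name, e.g. /seq/id-0000000042.
        String created = zk.create("/seq/id-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        long sequence = Long.parseLong(created.substring(created.lastIndexOf('-') + 1));
        System.out.println("assigned sequence: " + sequence);
        zk.close();
    }
}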
Cheers
Answered by Jesper M
OK, this is a very old question, which I'm first seeing now.
You'll need to differentiate between sequence numbers and unique IDs that are (optionally) loosely sortable by a specific criterion (typically generation time). True sequence numbers imply knowledge of what all other workers have done, and as such require shared state. There is no easy way of doing this in a distributed, high-scale manner. You could look into things like network broadcasts, windowed ranges for each worker, and distributed hash tables for unique worker IDs, but it's a lot of work.
Unique IDs are another matter; there are several good ways of generating unique IDs in a decentralized manner:
a) You could use Twitter's Snowflake ID network service. Snowflake is a:
- Networked service, i.e. you make a network call to get a unique ID;
- which produces 64 bit unique IDs that are ordered by generation time;
- and the service is highly scalable and (potentially) highly available; each instance can generate many thousand IDs per second, and you can run multiple instances on your LAN/WAN;
- written in Scala, runs on the JVM.
b) You could generate the unique IDs on the clients themselves, using an approach derived from how UUIDs and Snowflake's IDs are made. There are multiple options, but something along the lines of:
- The most significant 40 or so bits: a timestamp, the generation time of the ID. (We're using the most significant bits for the timestamp to make IDs sortable by generation time.)
- The next 14 or so bits: a per-generator counter, which each generator increments by one for each new ID generated. This ensures that IDs generated at the same moment (same timestamp) do not overlap.
- The last 10 or so bits: a unique value for each generator. Using this, we don't need to do any synchronization between generators (which is extremely hard), as all generators produce non-overlapping IDs because of this value.
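A sketch of one such layout (the epoch constant, the exact bit widths, and how each generator learns its nodeId are assumptions; clock rollback is not handled here):

public class SnowflakeStyleIdGenerator {
    private static final long EPOCH = 1288834974657L;   // arbitrary custom epoch, in milliseconds

    private final long nodeId;          // 10 bits: unique per generator, assigned out of band
    private long lastTimestamp = -1L;
    private long counter = 0L;          // 14 bits: resets every millisecond

    public SnowflakeStyleIdGenerator(long nodeId) {
        this.nodeId = nodeId & 0x3FF;    // keep only the low 10 bits
    }

    // Layout: | 40 bits timestamp | 14 bits counter | 10 bits node id |
    public synchronized long nextId() {
        long now = System.currentTimeMillis() - EPOCH;
        if (now == lastTimestamp) {
            counter = (counter + 1) & 0x3FFF;           // wrap within 14 bits
            if (counter == 0) {                         // 16,384 IDs in this millisecond already
                while (now <= lastTimestamp) {          // spin until the next millisecond
                    now = System.currentTimeMillis() - EPOCH;
                }
            }
        } else {
            counter = 0;
        }
        lastTimestamp = now;
        return (now << 24) | (counter << 10) | nodeId;
    }
}

Because the timestamp occupies the most significant bits, comparing the resulting IDs (as unsigned 64-bit values) orders them by generation time, as described above.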
c) You could generate the IDs on the clients, using just a timestamp and a random value. This avoids the need to know all generators and assign each generator a unique value. On the flip side, such IDs are not guaranteed to be globally unique, they're only very highly likely to be unique. (To collide, one or more generators would have to create the same random value at the exact same time.) Something along the lines of:
- The most significant 32 bits: a timestamp, the generation time of the ID.
- The least significant 32 bits: 32 bits of randomness, generated anew for each ID.
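A short sketch of this variant; it packs the current time in seconds into the upper 32 bits so the whole ID fits in a long (the choice of seconds rather than milliseconds is an assumption):

import java.util.concurrent.ThreadLocalRandom;

public class TimestampRandomId {
    // Upper 32 bits: seconds since the Unix epoch; lower 32 bits: fresh randomness.
    // A collision requires two generators to draw the same random value
    // within the same second.
    public static long nextId() {
        long seconds = System.currentTimeMillis() / 1000L;
        long random = ThreadLocalRandom.current().nextInt() & 0xFFFFFFFFL;
        return (seconds << 32) | random;
    }
}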
d) The easy way out, use UUIDs / GUIDs.
Answered by Nikita Koksharov
It can be done with Redisson. It implements a distributed and scalable version of AtomicLong. Here is an example:
Config config = new Config();
config.addAddress("some.server.com:8291");               // address of the backing Redis server
Redisson redisson = Redisson.create(config);              // connect to Redis through Redisson
RAtomicLong atomicLong = redisson.getAtomicLong("anyAtomicLong");
atomicLong.incrementAndGet();                             // atomic across all nodes sharing this Redis instance
Answered by Majid Azimi
I have written a simple service which can generate semi-unique, non-sequential 64-bit long numbers. It can be deployed on multiple machines for redundancy and scalability. It uses ZeroMQ for messaging. For more information on how it works, look at the GitHub page: zUID
Answered by user2108278
Using a database you can reach 1,000+ increments per second with a single core. It is pretty easy. You can use its own database as a backend to generate that number (as it should be its own aggregate, in DDD terms).
I had what seems like a similar problem. I had several partitions and I wanted to get an offset counter for each one. I implemented something like this:
CREATE DATABASE example;
USE example;
CREATE TABLE offsets (partition INTEGER, offset BIGINT, PRIMARY KEY (partition));
INSERT INTO offsets VALUES (1, 0);
Then I executed the following statements:
SELECT @offset := offset from offsets WHERE partition=1 FOR UPDATE;
UPDATE offsets set offset=@offset+1 WHERE partition=1;
If your application allows it, you can allocate a block at once (that was my case).
SELECT @offset := offset from offsets WHERE partition=1 FOR UPDATE;
UPDATE offsets set offset=@offset+100 WHERE partition=1;
If you need further throughput and cannot allocate offsets in advance, you can implement your own service using Flink for real-time processing. I was able to get around 100K increments per partition.
Hope it helps!