Original question: http://stackoverflow.com/questions/21536572/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Generate unique ID in Java, to label groups of related entries in a log
Asked by souser
There are several posts on SO on this topic. Each of them discusses one specific approach, so I wanted to get a comparison in a single question.
Using new Date() as unique identifier
Generating a globally unique identifier in Java
I am trying to implement a feature that lets us identify certain events in the log file. These events need to be associated with a unique ID, and I am trying to come up with a strategy for generating it. The ID has to have two parts: some static information + some dynamic information. The logs can then be searched for the pattern when events need debugging. I have three candidate approaches:
- static info + Joda date-time ("abc" + 2014-01-30T12:36:12.703)
- static info + Atomic Integer
- static info + UUID
For the scope of this question, multiple JVMs are not a consideration. I need to generate unique IDs efficiently on one JVM. Also, I cannot use a database-dependent solution.
Which of the three strategies above works best?
- If none of the above, is there another strategy?
- Is the Joda-Time-based strategy robust? There is a single JVM, but there will be concurrent users, so events can occur concurrently.
- In conjunction with one of the above or other strategies, do I need to make my method thread-safe / synchronized?
Accepted answer by Basil Bourque
I have had the same need as you: distinguishing a thread of related entries interleaved with other, unrelated entries in a log. I have tried all three of your suggested approaches. My experience was in 4D, not Java, but the situation is similar.
Date-Time
In my case, I was using a date-time value resolved to whole seconds. That is simply too large a granularity. I easily had collisions where multiple events started within the same second. Damn those speedy computers!
In your case, both the bundled java.util.Date and Joda-Time (highly recommended for other purposes) resolve to milliseconds. A millisecond is a long time for a modern computer, so I don't recommend this.
In Java 8, the new java.time.* package (inspired by Joda-Time, defined by JSR 310) resolves to nanoseconds. This might seem to be a better identifier, but no. For one thing, your computer's physical time-keeping clock may not support such a fine resolution. For another, computers keep getting faster. Lastly, a computer's clock can be reset; indeed, it is reset often, as computer clocks drift quite a bit. Modern OSes reset their clocks by frequently checking with a time server, either locally or over the Internet.
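To make the collision risk concrete, here is a minimal sketch that builds IDs from a static prefix plus the current millisecond in a tight loop and counts how many come out distinct. The class name, method name, and the "abc" prefix are made up for illustration; the exact count will vary by machine.

```java
import java.util.HashSet;
import java.util.Set;

public class TimestampCollisionDemo {

    // Builds `count` IDs of the form prefix + current millisecond and
    // returns how many of them turned out to be distinct.
    public static int distinctTimestampIds(String prefix, int count) {
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < count; i++) {
            ids.add(prefix + System.currentTimeMillis());
        }
        return ids.size();
    }

    public static void main(String[] args) {
        int total = 100_000;
        int distinct = distinctTimestampIds("abc", total);
        // Many iterations share the same millisecond, so distinct < total.
        System.out.println(distinct + " distinct IDs out of " + total);
    }
}
```

Every ID that is not distinct is a collision: two unrelated events would carry the same label in the log.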
Also, logs already have a timestamp, so we are not getting any extra benefit by using a date-time as our identifier. Indeed, having a second date-time in the log entry may actually cause confusion.
Serial Number
By "Atomic Integer", I assume you mean a serial number that is incremented for each new ID.
This seems overkill for your purpose.
- You don't care about the sequence; it has no meaning for the purpose of grouping log entries. You don't really care whether one group came some number of positions before or after another group.
- Maintaining a sequence is a pain and a potential point of failure. I have always eventually run into administrative problems with maintaining a sequence.
So this approach adds risk without adding any special benefit.
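For comparison, this is roughly what the serial-number approach looks like on a single JVM. It is only a sketch: the class name and the "abc" prefix are hypothetical, and java.util.concurrent.atomic.AtomicLong is used here in place of the AtomicInteger the question mentions.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SerialIdGenerator {

    private final String prefix;                         // static part, e.g. "abc"
    private final AtomicLong counter = new AtomicLong(); // dynamic part, starts at 0 on every JVM start

    public SerialIdGenerator(String prefix) {
        this.prefix = prefix;
    }

    // incrementAndGet() is atomic, so no synchronized block is needed.
    public String next() {
        return prefix + "-" + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        SerialIdGenerator generator = new SerialIdGenerator("abc");
        System.out.println(generator.next()); // abc-1
        System.out.println(generator.next()); // abc-2
    }
}
```

The counter is thread-safe, but it restarts from zero whenever the JVM restarts, so IDs repeat across runs unless the last value is persisted somewhere. That persistence is one form of the maintenance burden described above.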
UUID
Bingo! Just what you need.
A UUID is easily generated, using the bundled java.util.UUID class (which can generate Version 3 or Version 4 UUIDs), a third-party library, or the command-line uuidgen tool.
For a very high volume, a Version 1 UUID (MAC + date-time + random number) would be best. For logging, a Version 4 UUID (entirely random) is absolutely acceptable.
Having a collision is not a realistic concern. Especially for the limited number of values you would be generating for logs. I'm amazed by people who, failing to comprehend the numbers, say they would never replace a sequence with a UUID. Yet when pressed, every single programmer and sysadmin I know has experienced failures with at least one sequence.
No concerns about thread-safety. No concerns about contention (see my test results on another answer of mine).
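A minimal sketch of the "static info + UUID" strategy follows, using java.util.UUID.randomUUID() (a Version 4 UUID) and calling it from several threads with no synchronization. The class name, the "abc" prefix, and the thread counts are illustrative assumptions, not part of the original answer.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class UuidIdDemo {

    // Static part + random (Version 4) UUID.
    public static String newId(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    // Generates IDs from several threads at once and returns the distinct count.
    public static int distinctIdsAcrossThreads(int threads, int perThread) {
        Set<String> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    ids.add(newId("abc"));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println(newId("abc"));
        System.out.println(distinctIdsAcrossThreads(8, 10_000) + " distinct IDs out of 80000");
    }
}
```

The ID-generating method itself needs no locking; the concurrent set is only there so the demo can count results safely.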
Another benefit of a UUID is that its usual hexadecimal representation, such as:
6536ca53-bcad-4552-977f-16945fee13e2
…is easily recognizable. When recognized, the reader immediately knows that the string is meant to be a unique identifier. So its presence in your log is self-documenting.
I've found UUIDs to be the duct tape of computing. I keep finding new uses for them.
So, at the start of the code in question, generate a UUID and then embed that into every one of the related log entries.
While the hex string representation of a UUID is hard to read and write, in practice you need only scan a few of the digits at the beginning or end. Or use copy-paste with search and filter features in our modern console tools.
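As a rough illustration of embedding one UUID in every related entry, the sketch below uses the bundled java.util.logging package. The OrderProcessor class, the tag helper, and the log messages are all hypothetical; any logging framework would work the same way.

```java
import java.util.UUID;
import java.util.logging.Logger;

public class OrderProcessor {

    private static final Logger LOG = Logger.getLogger(OrderProcessor.class.getName());

    // Prepends the group ID so all related entries can be found with one search.
    public static String tag(UUID groupId, String message) {
        return "[" + groupId + "] " + message;
    }

    public static void process(String orderName) {
        UUID groupId = UUID.randomUUID(); // one ID per group of related entries
        LOG.info(tag(groupId, "started " + orderName));
        LOG.info(tag(groupId, "validated " + orderName));
        LOG.info(tag(groupId, "finished " + orderName));
    }

    public static void main(String[] args) {
        process("order A");
        process("order B");
    }
}
```

Searching the log for the first few hex digits of one group's ID then pulls out just that group's entries, even when entries from concurrent work are interleaved.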
A few factoids
- A UUID is known in the Microsoft world as a GUID.
- A UUID is not a string, but a 128-bit value. Bits, just bits in memory, "on"/"off" values. Some databases, such as Postgres, know how to handle and store a UUID as such a 128-bit value. If we wish to show those bits to humans, we could use a series of 128 digits of "1" and "0". But humans do not do well trying to read or write 128 digits of ones and zeros. So we use the hexadecimal representation. But even 32 hex digits is too much for humans, so we break the string into groups separated with hyphens as shown above, for a total of 36 characters.
- The spec for a UUID is quite clear that a hexadecimal representation should be lowercase. The spec says that when creating a UUID from a string input, uppercase should be tolerated. But when generating a hex string, it should be lowercase. Many implementations of UUIDs ignore this requirement. I suggest sticking to the spec and converting your UUID hex strings to lowercase.
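These points can be checked against java.util.UUID directly. The sketch below parses the example value in uppercase, reads back its two 64-bit halves, and prints the canonical text form; the class and method names are made up for illustration.

```java
import java.util.UUID;

public class UuidFactsDemo {

    // Parses a UUID string (either case) and renders it back as canonical text.
    public static String canonical(String text) {
        return UUID.fromString(text).toString();
    }

    public static void main(String[] args) {
        // Uppercase input is accepted when parsing.
        UUID id = UUID.fromString("6536CA53-BCAD-4552-977F-16945FEE13E2");

        // The value itself is two 64-bit halves, 128 bits in total.
        long high = id.getMostSignificantBits();
        long low = id.getLeastSignificantBits();
        System.out.println(Long.toHexString(high) + " " + Long.toHexString(low));

        String text = id.toString();
        System.out.println(text);          // 6536ca53-bcad-4552-977f-16945fee13e2
        System.out.println(text.length()); // 36
        System.out.println(id.version());  // 4
    }
}
```

Java's own toString() already emits lowercase, so no extra conversion is needed when the string comes from java.util.UUID.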
MDC – Mapped Diagnostic Context
I have not yet used MDC, but want to point it out…
Some logging frameworks are adding support for this idea of tagging related log entries. Such support is called Mapped Diagnostic Context (MDC). The MDC manages contextual information on a per-thread basis.
A quick introductory article is Log4j MDC (Mapped Diagnostic Context): What and Why.
The best logging façade, SLF4J, offers such an MDC feature. The best implementation of that façade, Logback, has a chapter documenting its MDC feature.
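To show the per-thread idea without pulling in a logging library, here is a stand-in built on a plain ThreadLocal. This is not the SLF4J or Logback MDC API, only a sketch of the same concept; the GroupContext class and its methods are hypothetical. With SLF4J itself, the equivalent calls would be along the lines of MDC.put and MDC.remove, with the value referenced from the log pattern.

```java
import java.util.UUID;

public class GroupContext {

    // One value per thread, which is the same per-thread idea an MDC is built on.
    private static final ThreadLocal<String> GROUP_ID = new ThreadLocal<>();

    // Call at the start of the work whose log entries belong together.
    public static void begin() {
        GROUP_ID.set(UUID.randomUUID().toString());
    }

    // Call when the work is done, so a pooled thread does not leak the ID.
    public static void end() {
        GROUP_ID.remove();
    }

    // Prefixes the message with the current thread's group ID, if any.
    public static String format(String message) {
        String id = GROUP_ID.get();
        return (id == null) ? message : "[" + id + "] " + message;
    }

    public static void main(String[] args) {
        begin();
        try {
            System.out.println(format("started"));
            System.out.println(format("finished"));
        } finally {
            end();
        }
        System.out.println(format("outside any group"));
    }
}
```

The advantage over passing the ID around by hand is that code deep in the call stack can tag its entries without receiving the ID as a parameter.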
Answer by DwB
Computers are fast; using time to attempt to create a unique value is going to fail.
Instead use a UUID. From the JSE 6.0 UUID API page: "[UUID is] A class that represents an immutable universally unique identifier (UUID)."
Here is some code:
import java.util.UUID;

public class Event {
    // Random (Version 4) UUID rendered as a 36-character hex string.
    private final String id = UUID.randomUUID().toString();
}
Answer by Majid Azimi
I have written a simple service that generates semi-unique, non-sequential 64-bit long numbers. It can be deployed on multiple machines for redundancy and scalability. It uses ZeroMQ for messaging. For more information on how it works, see the GitHub page: zUID