Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/37615731/

java regex for UUID

Tags: java, regex

Asked by Aqura

I want to parse a String that contains a UUID in the following format:

"&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;"

I have tried parsing it the following way, which works, but I think it would be slow:

// Lazily skips whatever precedes the UUID.
private static final String re1 = ".*?";
// Five hyphen-separated groups of 8-4-4-4-12 characters.
private static final String re2 = "([A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12})";
private static final Pattern splitter = Pattern.compile(re1 + re2, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

I am looking for a faster way and tried the following, but it fails to match:

private static final Pattern URN_UUID_PATTERN = Pattern.compile("^< urn:uuid:([^&])+&gt");

I am new to regex. Any help is appreciated.

Aqura

Answered by dlamblin

Your example of a faster regex uses a literal < where the input has &lt;, so that's confusing.

Regarding speed: first, your UUID is hexadecimal, so don't match with A-Z but rather a-f. Second, you give no indication that the case is mixed, so don't use case-insensitive matching; write the correct case in the range instead.

You don't explain whether you need the part preceding the UUID. If not, don't include .*?, and you may as well write the literals for re1 and re2 together in your final Pattern. There's no indication you need DOTALL either.

private static final Pattern splitter =
  Pattern.compile("([a-f0-9]{8}(-[a-f0-9]{4}){4}[a-f0-9]{8})");
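
As a minimal sketch of how that pattern could be used, the class below searches the sample input for the first UUID-shaped substring. The class name UuidFinder and the method findUuid are hypothetical names chosen for illustration, and the outer capture group is dropped because the whole match is what we want here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class UuidFinder {
    // Lower-case hex only: 8 digits, four "-xxxx" groups, then 8 more digits.
    // The fourth "-xxxx" group plus the trailing 8 digits form the 12-digit last field.
    private static final Pattern UUID_PATTERN =
        Pattern.compile("[a-f0-9]{8}(-[a-f0-9]{4}){4}[a-f0-9]{8}");

    /** Returns the first UUID-shaped substring, or empty if none is found. */
    static Optional<String> findUuid(String input) {
        Matcher m = UUID_PATTERN.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;";
        System.out.println(findUuid(input).orElse("no match"));
    }
}
```

Note that this pattern would not match upper-case UUIDs; that is the trade-off of dropping CASE_INSENSITIVE.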

Alternatively, if you have measured your regular expression and found it too slow, you might try another approach. For example, is each UUID preceded by "uuid:" as in your example? If so, you can:

  1. find the first index of "uuid:" as i, then
  2. substring 0 to i+5 [assuming you needed it at all], and
  3. substring i+5 to i+41, if I counted that right (36 characters in length).
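
The three steps above can be sketched roughly as follows. UuidSlicer, prefixOf, and sliceUuid are hypothetical names, and returning null for missing or truncated input is an assumption of this sketch rather than something the answer specifies.

```java
// Hypothetical helper illustrating the indexOf/substring steps.
final class UuidSlicer {
    private static final String MARKER = "uuid:";   // 5 characters
    private static final int UUID_LENGTH = 36;

    /** Step 2: everything up to and including "uuid:", or null if the marker is absent. */
    static String prefixOf(String input) {
        int i = input.indexOf(MARKER);               // step 1
        return i < 0 ? null : input.substring(0, i + MARKER.length());
    }

    /** Step 3: the 36 characters after "uuid:", or null if absent or too short. */
    static String sliceUuid(String input) {
        int i = input.indexOf(MARKER);               // step 1
        if (i < 0) {
            return null;
        }
        int start = i + MARKER.length();             // i + 5
        int end = start + UUID_LENGTH;               // i + 41
        return end > input.length() ? null : input.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;";
        System.out.println(prefixOf(input));
        System.out.println(sliceUuid(input));
    }
}
```

This version does not check that the 36 characters are actually a UUID; it only slices by position.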

Along similar lines, your faster regex could be:

private static final Pattern URN_UUID_PATTERN =
    Pattern.compile("^&lt;urn:uuid:(.{36})&gt;");
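
A short sketch of using this anchored pattern, assuming the input literally contains the escaped &lt; and &gt; sequences. UrnUuidParser and parse are hypothetical names, and returning null on a failed match is a choice made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class UrnUuidParser {
    // Anchored at the start; group 1 captures exactly 36 characters of any kind.
    private static final Pattern URN_UUID_PATTERN =
        Pattern.compile("^&lt;urn:uuid:(.{36})&gt;");

    /** Returns the captured 36-character field, or null if the input does not match. */
    static String parse(String input) {
        Matcher m = URN_UUID_PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;"));
    }
}
```

Because (.{36}) accepts any characters, this pattern checks the position and length of the field but not that it is valid hexadecimal.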

On the other hand, if all your input strings start with those exact characters, there is no need for step 1 of the previous suggestion; just use input.substring(13, 49);

Answered by Alexander du Sautoy

If this format will not change, I think a faster way is to use the String.substring() method. Example:

String val = "&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;";
String sUuid = val.substring(13, 49);
UUID uuid =  UUID.fromString(sUuid);
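
If the input might occasionally deviate from the fixed format, one possible variation is to guard the fixed offsets before slicing. The sketch below assumes the prefix is always the 13-character string "&lt;urn:uuid:"; FixedOffsetUuid and parse are hypothetical names, and the extra checks add a small cost over the bare substring call.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper, not part of the original answer.
final class FixedOffsetUuid {
    private static final String PREFIX = "&lt;urn:uuid:"; // 13 characters
    private static final int UUID_LENGTH = 36;

    /** Parses the UUID at the fixed offset, or returns empty if the input is malformed. */
    static Optional<UUID> parse(String val) {
        int end = PREFIX.length() + UUID_LENGTH;           // 49
        if (!val.startsWith(PREFIX) || val.length() < end) {
            return Optional.empty();
        }
        try {
            return Optional.of(UUID.fromString(val.substring(PREFIX.length(), end)));
        } catch (IllegalArgumentException e) {
            // UUID.fromString rejects strings that are not valid UUIDs.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String val = "&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;";
        System.out.println(parse(val).map(UUID::toString).orElse("invalid"));
    }
}
```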

Internally, the String class stores its data in a char array. From java.lang.String (JDK 8 and earlier; newer JDKs use a byte array instead):

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
...
    /** The value is used for character storage. */
    private final char value[];
...
}

The method 'String substring(int beginIndex, int endIndex)' copies the array elements from beginIndex (inclusive) to endIndex (exclusive) and creates a new String from the new array. Copying an array is a very fast operation.