Raw Strings in Java - for regex in particular
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/1256667/
Question by PlagueHammer
Is there any way to use raw strings in Java (without escape sequences)?
(I'm writing a fair amount of regex code and raw strings would make my code immensely more readable)
I understand that the language does not provide this directly, but is there any way to "simulate" them in any way whatsoever?
Accepted answer by stevedbrown
No, there isn't.
Generally, you would put raw strings and regexes in a properties file, but those have some escape sequence requirements too.
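To illustrate those escape requirements, the sketch below feeds properties text to java.util.Properties. According to the Properties.load documentation, a backslash in front of a character that is not a valid escape is silently dropped, so a regex written as \d+ in the file comes back as d+, and the file has to say \\d+ instead. The class name, the method name and the key "digits" are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesEscaping {
    // Loads properties text (the contents of a .properties file) and
    // returns the value stored under the hypothetical key "digits".
    static String loadDigits(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return props.getProperty("digits");
    }

    public static void main(String[] args) {
        // File line  digits=\d+   : the lone backslash is dropped
        System.out.println(loadDigits("digits=\\d+"));   // prints d+
        // File line  digits=\\d+  : the doubled backslash becomes one
        System.out.println(loadDigits("digits=\\\\d+")); // prints \d+
    }
}
```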
Answer by jsight
No (quite sadly).
Answer by Esko
String#getBytes() returns a copy of a String's contents as a byte array. A String holds its characters as 16-bit UTF-16 code units, and getBytes() encodes them into the platform's default charset. What I'm saying is that I think this is as close to a "raw" string as you can ever get in Java.
Answer by ShabbyDoo
You could write your own, non-escaped property reader and put your strings in a resource file.
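A non-escaping reader of that kind can be quite small. This sketch assumes a hypothetical file format of one name=value entry per line, with blank lines and lines starting with # ignored. Everything after the first = is kept exactly as written, so regexes need no extra backslashes. RawPropertyReader and read are illustrative names, and String.isBlank requires Java 11 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

class RawPropertyReader {
    // Reads "name=value" lines without interpreting any escape sequences.
    // Names are trimmed; values are kept verbatim (nothing is trimmed or unescaped).
    static Map<String, String> read(Reader source) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isBlank() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' in line: " + line);
            }
            entries.put(line.substring(0, eq).trim(), line.substring(eq + 1));
        }
        return entries;
    }
}
```

For a resource file, the stream from getResourceAsStream can be wrapped in an InputStreamReader with an explicit charset and passed to read. Because values are not trimmed, a space after the = becomes part of the regex.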
Answer by Bill K
I personally consider regex strings to be data, not code, so I don't like having them in my code, but I realize that's impractical and unpopular (yes, I realize it, you don't have to yell at me).
Given that there is no native way to do this, I can come up with two possibilities (well, three but the third is, umm, unnatural).
So my personal preference would be to just parse a file into strings. You could name each entry in the file and load them all into a hash table for easy access from your code.
Second choice: create a file that will be pre-processed into a Java interface; the pre-processor could escape the regexes as it does so. Personally I hate code generation, but if the Java file is 100% never human-edited, it's not too bad (the real evil is generated files that you are expected to edit!)
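The escaping step of such a pre-processor could look like the following sketch. It turns each raw regex into a Java string literal (doubling backslashes, escaping quotes and control characters) and emits an interface of String constants. The class name, the map of constant names to regexes and the generated header comment are assumptions, and the keys are assumed to already be valid Java identifiers. Control characters other than \n, \r and \t are written as octal escapes rather than \uXXXX, because the compiler translates Unicode escapes before it parses string literals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RegexInterfaceGenerator {
    // Converts raw text into a Java string literal, surrounding quotes included.
    static String toJavaLiteral(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Octal escape such as \033; safe inside a literal.
                        sb.append(String.format("\\%03o", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Generates the source of an interface holding one constant per regex.
    static String generate(String interfaceName, Map<String, String> regexes) {
        StringBuilder src = new StringBuilder();
        src.append("// GENERATED FILE - DO NOT EDIT BY HAND\n");
        src.append("public interface ").append(interfaceName).append(" {\n");
        for (Map.Entry<String, String> e : regexes.entrySet()) {
            src.append("    String ").append(e.getKey())
               .append(" = ").append(toJavaLiteral(e.getValue())).append(";\n");
        }
        return src.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> regexes = new LinkedHashMap<>();
        regexes.put("DECIMAL", "\\d+(\\.\\d+)?");
        System.out.print(generate("Regexes", regexes));
    }
}
```

Running main writes an interface Regexes whose DECIMAL constant is the literal "\\d+(\\.\\d+)?", i.e. the regex \d+(\.\d+)? with its backslashes doubled by the generator rather than by hand.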
Third (tricky and probably a bad idea): You might be able to create a custom doclet that will extract strings from your comments into a text file or a header file at compile time, then use one of the other two methods above. This keeps your strings in the same file in which they are being used. This could be really hard to do correctly, and the penalties of failure are extreme, so I wouldn't even consider it unless I had an overwhelming need and some pretty impressive talent.
I only suggest this because comments are free-form and things within a "pre" tag are pretty safe from formatters and other system uglies. The doclet could extract this before printing the javadocs, and could even add some of the generated javadocs indicating your use of regex strings.
Before downvoting and telling me this is a stupid idea--I KNOW, I just thought I'd suggest it because it's interesting, but my preference as I stated above is a simple text file...
Answer by Thorbjørn Ravn Andersen
Put the raw text file on your classpath and read it in with getResourceAsStream(...).
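A minimal sketch of that approach, assuming the regex sits alone in a UTF-8 text file on the classpath (the resource name /regex/decimal.txt in the comment is hypothetical). Class.getResourceAsStream returns null when the resource is missing, so the sketch turns that into an exception. The file's content is used verbatim apart from one trailing line terminator. InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class ClasspathRegex {
    // Reads the whole stream as UTF-8 and drops one trailing line terminator,
    // since most editors end files with a newline. Nothing else is unescaped.
    static String readAll(InputStream in) throws IOException {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        return text.replaceFirst("\\r?\\n\\z", "");
    }

    // Loads and compiles a regex stored in a classpath resource,
    // e.g. compileResource("/regex/decimal.txt") (hypothetical path).
    static Pattern compileResource(String name) throws IOException {
        try (InputStream in = ClasspathRegex.class.getResourceAsStream(name)) {
            if (in == null) {
                throw new FileNotFoundException("no classpath resource: " + name);
            }
            return Pattern.compile(readAll(in));
        }
    }
}
```

The resource file would then contain the regex exactly as written, e.g. \d+(\.\d+)? with single backslashes.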
Answer by mk.
(Properties files are common, but messy. I treat most regex as code, and keep it where I can refer to it, and you should too. As for the actual question:)
Yes, there are ways to get around the poor readability. You might try:
String s = "crazy escaped garbage"; //readable version//
though this requires care when updating. Eclipse has an option that lets you paste text in between quotes, and the escape sequences are applied for you. The tactic would be to edit the readable versions first, and then delete the garbage, and paste them in between the empty quotes "".
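A concrete instance of that convention, using a hypothetical regex for a double-quoted string with backslash escapes: the comment carries the readable form, and the literal carries the escaped form that the compiler actually sees.

```java
import java.util.regex.Pattern;

class CommentedRegex {
    // Readable version: "([^"\\]|\\.)*"
    // (a double-quoted string in which a backslash escapes the next character)
    static final Pattern QUOTED = Pattern.compile("\"([^\"\\\\]|\\\\.)*\"");

    public static void main(String[] args) {
        System.out.println(QUOTED.matcher("\"say \\\"hi\\\"\"").matches()); // true
        System.out.println(QUOTED.matcher("\"unterminated").matches());     // false
    }
}
```

When the regex changes, the comment is edited first and the literal regenerated from it, which is where Eclipse's escape-on-paste option helps.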
Idea time:
Hack your editor to convert them; release as a plugin. I checked around for plugins, but found none (try searching though). There's a one-to-one correspondence between escaped source strings and textbox text (discounting \n, \r\n). Perhaps highlighted text with two quotes on the ends could be used.
String s = "##########
#####";
where # is any character and the whole span is highlighted; the line break is treated as a newline. Text typed or pasted within the highlighted area is escaped in the 'real' source but displayed as if it were not. (In the same way that Eclipse escapes pasted text, this would escape typed text, and also display it without the backslashes.) If you want to edit normally, delete one of the quotes to cause a syntax error. Hmm.
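The one-to-one correspondence such a plugin would rely on can be sketched as a function that turns the body of a Java string literal back into the text it denotes, i.e. what the highlighted area would display. This is a hypothetical helper, not part of any existing plugin. It handles only the common escapes \\, \", \', \n, \r, \t, \b and \f, and rejects anything else, including octal and Unicode escapes.

```java
class LiteralText {
    // Converts the text between the quotes of a Java string literal
    // into the characters it denotes (the "display" form).
    static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c); // ordinary character, shown as-is
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("dangling backslash at end of literal");
            }
            char next = body.charAt(++i);
            switch (next) {
                case '\\': out.append('\\'); break;
                case '"':  out.append('"');  break;
                case '\'': out.append('\''); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                default:
                    throw new IllegalArgumentException("unsupported escape: \\" + next);
            }
        }
        return out.toString();
    }
}
```

Pairing this with an escaping function in the other direction gives the round trip the idea describes: the editor would display unescape(body) and write the escaped form back on every keystroke.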
Answer by Dread
This is a workaround if you are using Eclipse: long blocks of text are automatically split into correctly multilined string literals, with special characters escaped, when you paste text into a string literal
"-paste here-";
if you enable the option Window → Preferences → Java → Editor → Typing → "Escape text when pasting into a string literal".
Answer by ismailsunni
I use Pattern.quote, which solves the problem in the question. Thusly:
Pattern pattern = Pattern.compile(Pattern.quote("\r\n?|\n"));
The quote method returns a literal pattern String for the provided argument: metacharacters and escape sequences in it are given no special meaning, so the resulting pattern matches exactly that string.
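For reference, per its Javadoc Pattern.quote produces a literal pattern, so what it matches is the exact input text rather than the regex that text spells out. The sketch below (QuoteDemo is a hypothetical name) compiles the string from the answer both directly and after quoting, which shows that quote suits matching literal text rather than writing a regex without escapes.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // The string from the answer above: CR, LF, '?', '|', LF.
    static final String TEXT = "\r\n?|\n";

    // Compiled as a regex: a CR optionally followed by LF, or a lone LF.
    static final Pattern AS_REGEX = Pattern.compile(TEXT);

    // Compiled after Pattern.quote: only the exact five characters of TEXT.
    static final Pattern AS_LITERAL = Pattern.compile(Pattern.quote(TEXT));

    public static void main(String[] args) {
        System.out.println(AS_REGEX.matcher("\r\n").matches());   // true
        System.out.println(AS_LITERAL.matcher("\r\n").matches()); // false
        System.out.println(AS_LITERAL.matcher(TEXT).matches());   // true
    }
}
```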
Answer by Michael Scheper
No. But there's an IntelliJ plug-in that makes this easier to deal with, called String Manipulation.
IntelliJ will also automatically escape text pasted into a string literal. (As @Dread points out, Eclipse has an option to enable this.)