Original source: http://stackoverflow.com/questions/19075999/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What is the rationale for parenthesis in C++11's raw string literals R"(...)"?
Asked by Mikhail
There is a very convenient feature introduced in C++11 called raw string literals, which are strings with no escape characters. And instead of writing this:
regex mask("\\t[0-9]+\\.[0-9]+\\t\\SUB");
You can simply write this:
regex mask(R"(\t[0-9]+\.[0-9]+\t\SUB)");
Much more readable. However, note the extra parentheses that one has to place around the string to define a raw string literal.
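To make the comparison concrete, here is a minimal sketch that checks the two spellings denote the same pattern, assuming the escaped form simply doubles every backslash of the raw form. The function names (escaped_pattern, raw_pattern, matches_mask, check_regex_spellings) are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Escaped spelling: every backslash meant for the regex engine is doubled.
std::string escaped_pattern() { return "\\t[0-9]+\\.[0-9]+\\t\\SUB"; }

// Raw spelling: the characters between R"( and )" are taken verbatim.
std::string raw_pattern() { return R"(\t[0-9]+\.[0-9]+\t\SUB)"; }

// Hypothetical helper: true if the whole text matches the pattern.
bool matches_mask(const std::string& text) {
    const std::regex mask(raw_pattern());
    return std::regex_match(text, mask);
}

// Both spellings produce the same characters, so the same regex.
void check_regex_spellings() {
    assert(escaped_pattern() == raw_pattern());
    assert(matches_mask("\t12.5\txUB"));
}
```

Calling check_regex_spellings() from main is enough to exercise it; the sample input is an arbitrary string chosen to fit the pattern.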
My question is, why do we even need these? To me they look quite ugly and illogical. Here are the cons as I see them:
- Extra verbosity, while the whole feature is used to make literals more compact
- Hard to distinguish between the body of the literal and the defining symbols
Here is what I mean by hard to distinguish:
"good old usual string literal"
^- body inside quotes -^
R"(new strange raw string literal)"
^- body inside parenthesis -^
And here is the pro:
- More flexibility, more characters available in raw strings, especially when used with the delimiter:
R"delim( can use "()" here )delim"
But hey, if you need more flexibility, you have the good old escapable string literals. Why did the standard committee decide to pollute the content of every raw string literal with these absolutely unnecessary parentheses? What was the rationale behind that? What are the pros I didn't mention?
UPD: The answer by Kerrek is great, but unfortunately it is not an answer, since I already described that I understand how it works and what benefits it gives. Five years have passed since I asked this question, and there is still no answer. And I am still frustrated by this decision. One could say that this is a matter of taste, but I would disagree. How many spaces you use, how you name your variables, whether it is SomeFunction() or some_function(): that is a matter of taste. And I can really easily switch from one style to another.
But this?.. It still feels awkward and clumsy after so many years. No, this is not about taste. This is about how we want to cover all possible cases no matter what. We are doomed to write these ugly parens every time we need to write a Windows-specific path, or a regular expression, or a multi-line string literal. And for what?.. For those rare cases when we actually need to put " in a string? I wish I had been at the committee meeting where they decided to do it this way. I would have been strongly against this really bad decision. I wish. Now we are doomed.
Thank you for reading this far. Now I feel a little better.
UPD2: Here are my alternative proposals, both of which I think would be MUCH better than the existing syntax.
Proposal 1. Inspired by Python. Cannot support string literals with triple quotes: R"""Here is a string literal with any content, except for triple quotes, which you don't actually use that often."""
Proposal 2. Inspired by common sense. Supports all possible string literals, just like the current one: R"delim"content of string"delim". With empty delimiter: R""Looks better, doesn't it?"". Empty raw string: R"""". Raw string with double quotes: R"#"Here are double quotes: "", thanks"#".
Any problems with these proposals?
Accepted answer by Mikhail V
As the other answer explains, there must be something in addition to the quotation mark to avoid parsing ambiguity in cases where " or )", or actually any closing sequence, may appear in the string itself.
As for the syntax choice, well, I agree it is suboptimal, but it is OK in general (you could think of it as "things could be worse", lol). I think it is a good compromise between simplicity of use and simplicity of parsing.
Proposal 1. Inspired by python. Cannot support string literals with triple quotes:
R"""any content, except for triple quotes, which you don't actually use that often."""
There is indeed a problem with this: "quotes, which you don't actually use that often". Firstly, the very idea of raw strings is to represent raw strings, i.e. exactly as they would appear in a text file, without any modifications to the string, regardless of the string contents. Secondly, the syntax should be general, i.e. without adding variations like "almost raw string", etc.
How would you write one quote with this syntax? Two quotes? Note - those are very common cases, especially when your code is dealing with strings and parsing.
Proposal 2.
R"delim"content of string"delim".
R""Looks better, doesn't it?"".
R"#"Here are double quotes: "", thanks"#".
Well, this one might be a better candidate. One thing though: a common case (and I believe it was a motivating case for the accepted syntax) is that the double-quote character itself is very common, and raw strings should come in handy for these cases.
So, let's see. Normal string syntax:
s1 = "\"";
s2 = "\"quoted string\"";
Your syntax, e.g. with "x" as delim:
s1 = R"x"""x";
s2 = R"x""quoted string""x";
Accepted syntax:
s1 = R"(")";
s2 = R"("quoted string")";
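As a quick check that the accepted spellings denote the same strings as the escaped ones, here is a minimal sketch. It assumes C++17 (for constexpr std::string_view comparisons), and the variable and function names are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// Escaped spellings of a lone quote and of a quoted phrase.
constexpr std::string_view s1_escaped = "\"";
constexpr std::string_view s2_escaped = "\"quoted string\"";

// The same strings as raw literals with the empty delimiter.
constexpr std::string_view s1_raw = R"(")";
constexpr std::string_view s2_raw = R"("quoted string")";

// Compile-time checks: both spellings denote identical strings.
static_assert(s1_raw == s1_escaped);
static_assert(s2_raw == s2_escaped);
static_assert(s1_raw.size() == 1);  // just the quote character

// Runtime check: 13 characters of text plus the two quotes.
void check_quote_spellings() {
    assert(s2_raw.size() == 15);
}
```

The parentheses and the R prefix are not part of the string; only the characters between R"( and )" are.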
Yes, I agree that the brackets introduce an annoying visual effect. So I suspect the authors of the syntax reasoned that the additional "delim" will rarely be needed in this case, since )" does not appear very often inside a string. But OTOH, trailing/leading/isolated quotes are quite frequent, so e.g. your proposed syntax (#2) would require some delim more often, which in turn would mean more often changing R""..."" to R"delim"..."delim". Hope you get the idea.
Could the syntax be better? I personally would prefer an even simpler variant:
Rdelim"string contents"delim;
With the above examples:
s1 = Rx"""x;
s2 = Rx""quoted string""x;
However, to work correctly (if it is possible at all in the current grammar), this variant would require limiting the character set for the delim part, say to letters/digits only (because of existing operators), and maybe some further restrictions on the initial character to avoid clashes with possible future grammar.
So I believe a better choice could have been made, although nothing significantly better can be done in this case.
Answer by Kerrek SB
The purpose of the parentheses is to allow you to specify a custom delimiter:
R"foo(Hello World)foo" // the string "Hello World"
In your example, and in typical use, the delimiter is simply empty, so the raw string is enclosed by the sequences R"( and )".
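A minimal sketch of the point about delimiters: the delimiter and the parentheses frame the literal but are not part of its value. The function names here are hypothetical.

```cpp
#include <cassert>
#include <string>

// Raw literal with the custom delimiter "foo".
std::string with_delimiter() { return R"foo(Hello World)foo"; }

// Raw literal with the empty delimiter.
std::string with_empty_delimiter() { return R"(Hello World)"; }

// Ordinary literal for comparison.
std::string ordinary() { return "Hello World"; }

// All three spellings yield the same 11-character string.
void check_delimiters() {
    assert(with_delimiter() == ordinary());
    assert(with_empty_delimiter() == ordinary());
    assert(ordinary().size() == 11);
}
```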
Allowing for arbitrary delimiters is a design decision that reflects the desire to provide a complete solution without weird limitations or edge cases. You can pick any sequence of characters that does not occur in your string as the delimiter.
Without this, you would be in trouble if the string itself contained something like " (if you had just wanted R"..." as your raw string syntax) or )" (if the delimiter is empty). Both of those are perfectly common and frequent character sequences, especially in regular expressions, so it would be incredibly annoying if the decision whether or not to use a raw string depended on the specific content of your string.
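As an illustration of the problematic case, here is a sketch with content that itself contains the sequence )". The content string and the delimiter "d" are arbitrary, hypothetical choices.

```cpp
#include <cassert>
#include <string>

// The content contains )" right after (x, so with the empty delimiter
// the literal would end there and the remainder would not compile.
// A non-empty delimiter moves the terminator to )d" instead.
std::string delimited_call() { return R"d(call("(x)"))d"; }

// Escaped spelling of the same content, for comparison.
std::string escaped_call() { return "call(\"(x)\")"; }

void check_closing_sequence() {
    assert(delimited_call() == escaped_call());
    // The would-be terminator of the empty-delimiter form is in the content.
    assert(delimited_call().find(")\"") != std::string::npos);
}
```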
Remember that inside the raw string there's no other escape mechanism, so the best you could do otherwise would be to concatenate pieces of string literal, which would be very impractical. By allowing a custom delimiter, all you need to do is pick an unusual character sequence once, and maybe modify it in very rare cases when you make a future edit.
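To show what the concatenation workaround would look like next to a custom delimiter, here is a small sketch. The content a)"b, the delimiter "x", and the function names are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Workaround without a custom delimiter: split the content around the
// quote and rely on adjacent string literals being concatenated.
std::string by_concatenation() { return R"(a))" "\"" R"(b)"; }

// With a custom delimiter the content is written once, verbatim.
std::string by_delimiter() { return R"x(a)"b)x"; }

void check_workaround() {
    assert(by_concatenation() == by_delimiter());
    assert(by_delimiter() == "a)\"b");
}
```

The first form has to be re-split whenever the content changes, while the second only needs a delimiter that does not occur in the content.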
But to stress once again, even the empty delimiter is already useful, since the R"(...)" syntax allows you to place naked quotation marks in your string. That by itself is quite a gain.