C++ immutable strings vs. std::string
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/2916358/
immutable strings vs std::string
Asked by deft_code
I've recently been reading about immutable strings ("Why can't strings be mutable in Java and .NET?" and "Why .NET String is immutable?") as well as some material about why D chose immutable strings. There seem to be many advantages:
- trivially thread safe
- more secure
- more memory efficient in most use cases.
- cheap substrings (tokenizing and slicing)
Not to mention most new languages have immutable strings, D2.0, Java, C#, Python, etc.
Would C++ benefit from immutable strings?
Is it possible to implement an immutable string class in C++ (or C++0x) that would have all of these advantages?
Update:
There are two attempts at immutable strings, const_string and fix_str. Neither has been updated in half a decade. Are they even used? Why didn't const_string ever make it into Boost?
Accepted answer by yoco
As an opinion:
- Yes, I'd quite like an immutable string library for C++.
- No, I would not like std::string to be immutable.
Is it really worth doing (as a standard library feature)? I would say not. The use of const gives you locally immutable strings, and the basic nature of systems programming languages means that you really do need mutable strings.
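A minimal sketch of what "locally immutable" means here. The function names are hypothetical and only for illustration: a const reference stops this piece of code from modifying the string, but it does not stop the owner from doing so.

```cpp
#include <cstddef>
#include <string>

// Reads through a const reference: this function cannot modify the string.
std::size_t length_seen_by_reader(const std::string& s) {
    return s.size();
}

// Hypothetical demo: const restricts the alias, not the underlying object.
std::size_t demo_local_immutability() {
    std::string owner = "abc";
    const std::string& view = owner;  // read-only alias of owner
    // view += "d";                   // would not compile: view is const
    owner += "d";                     // the owner can still mutate the string
    return length_seen_by_reader(view);  // the alias observes the change
}
```

This is the difference from a truly immutable string type, where no alias could ever observe a change.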
Answer by yoco
I found that most people in this thread do not really understand what immutable_string is. It is not only about constness. The real power of immutable_string is performance (even in a single-threaded program) and memory usage.
Imagine that all strings are immutable and that every string is implemented like this:
class string {
    const char* _head;  // first character; never written through
    size_t _len;        // number of characters in this string
};
How can we implement a substring operation? We don't need to copy any characters. All we have to do is set _head and _len. The substring then shares the same memory segment with the source string.
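The substring operation described above can be sketched as follows. This is an illustration under stated assumptions, not the answer's actual code: the class name shared_slice and its members are hypothetical, and the sketch assumes the underlying characters outlive every slice and are never modified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical non-owning slice: a pointer plus a length, nothing else.
class shared_slice {
public:
    shared_slice(const char* head, std::size_t len) : head_(head), len_(len) {}

    // Taking a substring only adjusts the pointer and the length; no copy.
    shared_slice substr(std::size_t pos, std::size_t count) const {
        if (pos > len_) pos = len_;                  // clamp the start
        if (count > len_ - pos) count = len_ - pos;  // clamp the length
        return shared_slice(head_ + pos, count);
    }

    const char* data() const { return head_; }
    std::size_t size() const { return len_; }

    // Copies the characters out, only when an owning string is really needed.
    std::string to_string() const { return std::string(head_, len_); }

private:
    const char* head_;
    std::size_t len_;
};
```

For example, slicing characters 10 to 15 out of "immutable string" yields a slice whose data() points into the original buffer, which is why the operation costs the same regardless of the substring's length.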
Of course, we cannot really implement an immutable_string with only these two data members. A real implementation might need a reference-counted (or flyweight) memory block, like this:
class immutable_string {
    boost::flyweight<std::string> _s;  // shared storage (Boost.Flyweight)
    const char* _head;                 // start of this string inside _s
    size_t _len;                       // number of characters
};
Both memory usage and performance would be better than with the traditional string in most cases, especially when you know what you are doing.
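The reference-counted variant might look like the sketch below. It swaps in std::shared_ptr from the standard library in place of the Boost flyweight mentioned above, purely to stay self-contained. The class name rc_immutable_string and the owners() helper are assumptions made for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string: a shared, read-only block plus a window.
class rc_immutable_string {
public:
    explicit rc_immutable_string(std::string text)
        : block_(std::make_shared<const std::string>(std::move(text))),
          pos_(0), len_(block_->size()) {}

    // A substring shares the block and only narrows the window.
    rc_immutable_string substr(std::size_t pos, std::size_t count) const {
        if (pos > len_) pos = len_;
        if (count > len_ - pos) count = len_ - pos;
        return rc_immutable_string(block_, pos_ + pos, count);
    }

    std::size_t size() const { return len_; }
    std::string to_string() const { return block_->substr(pos_, len_); }

    // Number of strings currently sharing the block (for illustration only).
    long owners() const { return block_.use_count(); }

private:
    rc_immutable_string(std::shared_ptr<const std::string> block,
                        std::size_t pos, std::size_t len)
        : block_(std::move(block)), pos_(pos), len_(len) {}

    std::shared_ptr<const std::string> block_;  // kept alive by reference count
    std::size_t pos_;                           // window start inside the block
    std::size_t len_;                           // window length
};
```

The design trade-off is that a small substring keeps the whole block alive, so memory is only saved when substrings do not greatly outlive their sources.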
Of course C++ can benefit from an immutable string, and it would be nice to have one. I have checked the boost::const_string and the fix_str mentioned by Cubbi. Those look like what I am talking about.
Answer by Notinlist
My conclusion is that C++ does not require the immutable pattern because it has const semantics.
In Java, if you have a Person class and you return the person's String name from the getName() method, your only protection is the immutable pattern. Without it, you would have to clone() your strings all night and day (as you have to do with data members that are not typical value objects but still need to be protected).
In C++ you have const std::string& getName() const. So you can write SomeFunction(person.getName()), where the function is declared like void SomeFunction(const std::string& subject).
- No copy happens
- Anyone who wants a copy is free to make one
- The technique applies to all data types, not just strings
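A small sketch of the pattern described in this answer. Person, getName and SomeFunction follow the names used in the text, and the name_ member is an assumption. Here SomeFunction returns the address it received, which differs from the void signature in the text and is done only so that the absence of a copy can be checked.

```cpp
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}

    // Hands out read-only access to the member; no copy is made.
    const std::string& getName() const { return name_; }

private:
    std::string name_;
};

// Assumption for illustration: returns the address of the argument it saw.
const std::string* SomeFunction(const std::string& subject) {
    return &subject;
}
```

Calling SomeFunction(person.getName()) passes the member itself by reference, while writing std::string copy = person.getName(); makes an independent copy for callers who want one.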
Answer by peterchen
I don't think there's a definitive answer here. It's subjective: if not because of personal taste, then at least because of the type of code one most often deals with. (Still, a valuable question.)
Immutable strings are great when memory is cheap—this wasn't true when C++ was developed, and it isn't the case on all platforms targeted by C++. (OTOH on more limited platforms C seems much more common than C++, so that argument is weak.)
You can create an immutable string class in C++, and you can make it largely compatible with std::string, but you will still lose when compared to a built-in string class with dedicated optimizations and language features.
std::string is the best standard string we get, so I wouldn't like to see any messing with it. I use it very rarely, though; std::string has too many drawbacks from my point of view.
Answer by Cubbi
You're certainly not the only person who thought that. In fact, there is the const_string library by Maxim Yegorushkin, which seems to have been written with inclusion into Boost in mind. And here's a slightly newer library, fix_str, by Roland Pibinger. I'm not sure how tricky full string interning at run time would be, but most of the advantages are achievable when necessary.
Answer by Mark Ransom
const std::string
There you go. A string literal is also immutable, unless you want to get into undefined behavior.
Edit: Of course that's only half the story. A const string variable isn't very useful because you can't make it refer to a new string. A reference to a const string would do it, except that C++ won't let you reseat a reference the way other languages like Python do. The closest thing would be a smart pointer to a dynamically allocated string.
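The smart-pointer idea from the edit above could be sketched like this, assuming std::shared_ptr as the smart pointer (the answer does not name one). The function demo_rebind is hypothetical.

```cpp
#include <memory>
#include <string>

// Hypothetical demo: the string is immutable, the handle can be rebound.
std::string demo_rebind() {
    std::shared_ptr<const std::string> s =
        std::make_shared<const std::string>("first");
    std::shared_ptr<const std::string> other = s;  // shares the same string

    // *s += "x";  // would not compile: the pointee is const

    // Rebinding the handle does not affect other holders of the old string.
    s = std::make_shared<const std::string>("second");

    return *other + "," + *s;
}
```

This gives roughly the semantics of a string variable in Java or Python: assignment rebinds the handle, and the characters themselves never change.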
Answer by supercat
Immutable strings are great if, whenever it's necessary to create a new string, the memory manager will always be able to determine the whereabouts of every string reference. On most platforms, language support for such an ability could be provided at relatively modest cost, but on platforms without such language support built in, it's much harder.
If, for example, one wanted to design a Pascal implementation on x86 that supported immutable strings, the string allocator would need to be able to walk the stack to find all string references. The only execution-time cost of that would be requiring a consistent function-call approach (e.g. not using tail calls, and having every non-leaf function maintain a frame pointer). Each memory area allocated with new would need a bit to indicate whether it contained any strings, and those that do contain strings would need an index to a memory-layout descriptor, but those costs would be pretty slight.
If a GC wasn't able to walk the stack, then code would have to use handles rather than pointers, creating string handles when local variables come into scope and destroying them when they go out of scope. That is much greater overhead.
Answer by Martin Beckett
Qt also uses immutable strings with copy-on-write.
There is some debate about how much performance it really buys you with decent compilers.
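For readers unfamiliar with copy-on-write, here is a plain C++ sketch of the general idea. It is not Qt's implementation or API. The class cow_string and its members are hypothetical, and the sketch is deliberately not thread-safe.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string: copies share a buffer until one writes.
class cow_string {
public:
    explicit cow_string(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    const std::string& str() const { return *data_; }

    // True while both objects still point at the same buffer.
    bool shares_with(const cow_string& other) const {
        return data_ == other.data_;
    }

    // Detach first if the buffer is shared, then modify the private copy.
    // Note: checking use_count() like this is not safe across threads.
    void append(const std::string& tail) {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
        data_->append(tail);
    }

private:
    std::shared_ptr<std::string> data_;
};
```

Copying a cow_string only bumps a reference count; the character data is duplicated the first time one of the copies is modified.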
Answer by fredoverflow
Constant strings make little sense with value semantics, and sharing isn't one of C++'s greatest strengths...
Answer by Pierre Carrier
Strings are mutable in Ruby.
$ irb
>> foo="hello"
=> "hello"
>> bar=foo
=> "hello"
>> foo << "world"
=> "helloworld"
>> print bar
helloworld=> nil
- trivially thread safe
I would tend to disregard the safety arguments. If you want to be thread-safe, lock it or don't touch it. C++ is not a convenient language; have your own conventions.
- more secure
No. As soon as you have pointer arithmetic and unprotected access to the address space, forget about being secure. Safer against innocently bad coding, yes.
- more memory efficient in most use cases.
Unless you implement CPU-intensive mechanisms, I don't see how.
- cheap substrings (tokenizing and slicing)
That would be one very good point. It could be done by having substrings refer back to the source string, with any modification of a string causing a copy. Tokenizing and slicing become free; mutations become expensive.
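As an illustration of copy-free tokenizing, here is a sketch using std::string_view, which arrived in C++17 and so postdates this thread. The function name split_views is hypothetical. The views are only valid while the source text stays alive and unchanged.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits text on sep; every token is a view into the original characters.
std::vector<std::string_view> split_views(std::string_view text, char sep) {
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (true) {
        std::size_t end = text.find(sep, start);
        if (end == std::string_view::npos) {
            tokens.push_back(text.substr(start));  // last token
            break;
        }
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;  // skip the separator
    }
    return tokens;
}
```

No character data is copied here; only the vector of (pointer, length) pairs is allocated, which is the "free slicing" the answer refers to.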