C#: Why are .NET strings immutable?
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/2365272/
Why .NET String is immutable?
Asked by Nirajan Singh
As we all know, String is immutable. What are the reasons for String being immutable and for the introduction of the StringBuilder class as mutable?
Accepted answer by Jon Hanna
- Instances of immutable types are inherently thread-safe: since no thread can modify them, the risk of one thread modifying an instance in a way that interferes with another is removed (the reference itself is a different matter).
- Similarly, the fact that aliasing can't produce changes (if x and y both refer to the same object, a change to x entails a change to y) allows for considerable compiler optimisations.
- Memory-saving optimisations are also possible. Interning and atomising are the most obvious examples, though we can do other versions of the same principle. I once produced a memory saving of about half a GB by comparing immutable objects and replacing references to duplicates so that they all pointed to the same instance (time-consuming, but a minute's extra start-up to save a massive amount of memory was a performance win in the case in question). With mutable objects that can't be done.
- No side-effects can come from passing an immutable type as a parameter to a method unless it is out or ref (since that changes the reference, not the object). A programmer therefore knows that if string x = "abc" at the start of a method, and that doesn't change in the body of the method, then x == "abc" at the end of the method.
- Conceptually, the semantics are more like value types; in particular, equality is based on state rather than identity. This means that "abc" == "ab" + "c". While this doesn't require immutability, the fact that a reference to such a string will always equal "abc" throughout its lifetime (which does require immutability) makes it much easier to ensure correctness when strings are used as keys, where maintaining equality to previous values is vital (strings are indeed commonly used as keys).
- Conceptually, it can make more sense to be immutable. If we add a month onto Christmas, we haven't changed Christmas, we have produced a new date in late January. It makes sense therefore that Christmas.AddMonths(1) produces a new DateTime rather than changing a mutable one. (Another example: if I as a mutable object change my name, what has changed is which name I am using; "Jon" remains immutable and other Jons will be unaffected.)
- Copying is fast and simple: to create a clone just return this. Since the copy can't be changed anyway, pretending something is its own copy is safe.
- [Edit, I'd forgotten this one]. Internal state can be safely shared between objects. For example, if you were implementing a list which was backed by an array, a start index and a count, then the most expensive part of creating a sub-range would be copying the objects. However, if it was immutable then the sub-range object could reference the same array, with only the start index and count having to change, with a very considerable change to construction time.
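The points about state-based equality and interning can be tried out with a small sketch. The class and method names here (StringEqualityDemo, Join, SameContents, SameInstanceAfterInterning) are made up for illustration; only string.Intern, the == operator and object.ReferenceEquals come from .NET itself.

```csharp
using System;

public static class StringEqualityDemo
{
    // Concatenate at run time so the compiler cannot fold the result into a literal.
    public static string Join(string prefix, string suffix) => prefix + suffix;

    // == on strings compares contents, not object identity.
    public static bool SameContents(string a, string b) => a == b;

    // Interning maps equal strings to one shared instance. Sharing is only safe
    // because no holder of that instance can modify it.
    public static bool SameInstanceAfterInterning(string a, string b) =>
        object.ReferenceEquals(string.Intern(a), string.Intern(b));

    public static void Main()
    {
        string literal = "abc";
        string built = Join("ab", "c"); // typically a separate object from the literal

        Console.WriteLine(SameContents(literal, built));               // True
        Console.WriteLine(SameInstanceAfterInterning(literal, built)); // True
    }
}
```

Whether literal and built start out as the same object is a runtime detail, so the sketch only relies on what string.Intern guarantees.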
In all, for objects which don't have undergoing change as part of their purpose, there can be many advantages in being immutable. The main disadvantage is in requiring extra constructions, though even here it's often overstated (remember, you have to do several appends before StringBuilder becomes more efficient than the equivalent series of concatenations, with their inherent construction).
It would be a disadvantage if mutability was part of the purpose of an object (who'd want to be modeled by an Employee object whose salary could never ever change?), though sometimes even then it can be useful (in many web and other stateless applications, code doing read operations is separate from that doing updates, and using different objects may be natural - I wouldn't make an object immutable and then force that pattern, but if I already had that pattern I might make my "read" objects immutable for the performance and correctness-guarantee gain).
Copy-on-write is a middle ground. Here the "real" class holds a reference to a "state" class. State classes are shared on copy operations, but if you change the state, a new copy of the state class is created. This is more often used with C++ than C#, which is why its std::string (in library implementations that use copy-on-write) enjoys some, but not all, of the advantages of immutable types, while remaining mutable.
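As a rough illustration of the copy-on-write idea described above, here is a minimal single-threaded sketch in C#. CowText and its members are hypothetical names invented for this example, and a production version would more likely track sharing with a reference count than with a simple flag.

```csharp
using System;

// Hypothetical copy-on-write text buffer; a single-threaded sketch only.
public sealed class CowText
{
    private char[] _state; // the "state" object, possibly shared with copies
    private bool _shared;  // true while another CowText may hold the same array

    public CowText(string initial)
    {
        _state = initial.ToCharArray();
        _shared = false;
    }

    private CowText(char[] sharedState)
    {
        _state = sharedState;
        _shared = true;
    }

    // Copying is cheap: both objects point at the same state.
    public CowText Copy()
    {
        _shared = true;
        return new CowText(_state);
    }

    // Writing detaches from the shared state first, so other copies are unaffected.
    public void SetChar(int index, char value)
    {
        if (_shared)
        {
            _state = (char[])_state.Clone();
            _shared = false;
        }
        _state[index] = value;
    }

    public bool SharesStateWith(CowText other) =>
        object.ReferenceEquals(_state, other._state);

    public override string ToString() => new string(_state);
}

public static class CowDemo
{
    public static void Main()
    {
        var original = new CowText("hello");
        var copy = original.Copy();
        Console.WriteLine(original.SharesStateWith(copy)); // True

        copy.SetChar(0, 'j');
        Console.WriteLine(original);                       // hello
        Console.WriteLine(copy);                           // jello
        Console.WriteLine(original.SharesStateWith(copy)); // False
    }
}
```

The flag is deliberately conservative: after a copy detaches, the original still believes it is shared and will clone once more on its next write, which costs an allocation but never corrupts another object's state.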
Answer by kolosy
String management is an expensive process. Keeping strings immutable allows repeated strings to be reused, rather than re-created.
Answer by SQLMenace
Strings and other concrete objects are typically expressed as immutable objects to improve readability and runtime efficiency. Security is another reason: a process can't change your string and inject code into it.
Answer by dsimcha
You never have to defensively copy immutable data. Despite the fact that you need to copy it to mutate it, often the ability to freely alias and never have to worry about unintended consequences of this aliasing can lead to better performance because of the lack of defensive copying.
Answer by Ken Liu
Immutable Strings also prevent concurrency-related issues.
Answer by Reed Copsey
Making strings immutable has many advantages. It provides automatic thread safety, and makes strings behave like an intrinsic type in a simple, effective manner. It also allows for extra efficiencies at runtime (such as allowing effective string interning to reduce resource usage), and has huge security advantages, since it's impossible for a third-party API call to change your strings.
StringBuilder was added in order to address the one major disadvantage of immutable strings - runtime construction of immutable types causes a lot of GC pressure and is inherently slow. By making an explicit, mutable class to handle this, this issue is addressed without adding unneeded complication to the string class.
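To make the trade-off concrete, the sketch below builds the same text two ways. BuildDemo and its two methods are illustrative names, not anything from the answer; StringBuilder.Append and ToString are the real .NET calls. It shows the shape of each approach and makes no claim about where the performance crossover lies.

```csharp
using System;
using System.Text;

public static class BuildDemo
{
    // Each += allocates a new string holding everything built so far.
    public static string WithConcatenation(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into a mutable buffer and creates one string at the end.
    public static string WithStringBuilder(string[] parts)
    {
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string[] parts = { "im", "mut", "able" };
        Console.WriteLine(WithConcatenation(parts)); // immutable
        Console.WriteLine(WithStringBuilder(parts)); // immutable
    }
}
```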
Answer by NebuSoft
Why are string types immutable in C#
String is a reference type, so it is never copied, but passed by reference. Compare this to the C++ std::string object (which is not immutable), which is passed by value. This means that if you want to use a String as a key in a Hashtable, you're fine in C++, because C++ will copy the string to store the key in the hashtable (actually std::hash_map, but still) for later comparison. So even if you later modify the std::string instance, you're fine. But in .Net, when you use a String in a Hashtable, it will store a reference to that instance. Now assume for a moment that strings aren't immutable, and see what happens:
1. Somebody inserts a value x with key "hello" into a Hashtable.
2. The Hashtable computes the hash value for the String, and places a reference to the string and the value x in the appropriate bucket.
3. The user modifies the String instance to be "bye".
4. Now somebody wants the value in the hashtable associated with "hello". It ends up looking in the correct bucket, but when comparing the strings it says "bye"!="hello", so no value is returned.
5. Maybe somebody wants the value "bye"? "bye" probably has a different hash, so the hashtable would look in a different bucket. No "bye" keys in that bucket, so our entry still isn't found.
Making strings immutable means that step 3 is impossible. If somebody modifies the string he's creating a new string object, leaving the old one alone. Which means the key in the hashtable is still "hello", and thus still correct.
So, probably among other things, immutable strings are a way to enable strings that are passed by reference to be used as keys in a hashtable or similar dictionary object.
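The five steps above can be reproduced with a deliberately mutable key type. MutableKey and KeyDemo are hypothetical stand-ins written for this sketch (real .NET strings cannot be changed this way), and the hash is a simple character sum chosen so that the result does not depend on the runtime's string hashing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable stand-in for a string, used only to show the failure mode.
public sealed class MutableKey
{
    public string Text { get; set; }

    public MutableKey(string text) { Text = text; }

    public override bool Equals(object obj) =>
        obj is MutableKey other && other.Text == Text;

    // Deterministic content-based hash: the sum of the character codes.
    public override int GetHashCode()
    {
        int sum = 0;
        foreach (char c in Text) sum += c;
        return sum;
    }
}

public static class KeyDemo
{
    // Returns { found before mutation, found "hello" after, found "bye" after }.
    public static bool[] Run()
    {
        var table = new Dictionary<MutableKey, int>();
        var key = new MutableKey("hello");
        table[key] = 42; // steps 1 and 2: stored under the hash of "hello"

        bool foundBefore = table.ContainsKey(new MutableKey("hello"));

        key.Text = "bye"; // step 3: the stored key changes in place

        bool foundHello = table.ContainsKey(new MutableKey("hello")); // step 4
        bool foundBye = table.ContainsKey(new MutableKey("bye"));     // step 5
        return new[] { foundBefore, foundHello, foundBye };
    }

    public static void Main()
    {
        bool[] results = Run();
        Console.WriteLine(results[0]); // True
        Console.WriteLine(results[1]); // False
        Console.WriteLine(results[2]); // False
    }
}
```

The entry is still in the dictionary after the mutation; it simply can no longer be reached by either key.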
Answer by AndiDog
Imagine you pass a mutable string to a function but don't expect it to be changed. Then what if the function changes that string? In C++, for instance, you could simply do call-by-value (the difference between a std::string and a std::string& parameter), but in C# it's all about references, so if you passed mutable strings around, every function could change them and trigger unexpected side effects.
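A quick way to see the difference is to pass a StringBuilder (mutable) and a string (immutable) to look-alike methods. AliasDemo and the two Shout overloads are names invented for this sketch.

```csharp
using System;
using System.Text;

public static class AliasDemo
{
    // The callee mutates the object the caller still refers to.
    public static void Shout(StringBuilder text)
    {
        text.Append("!");
    }

    // The callee can only rebind its own local copy of the reference;
    // the caller's string is untouched.
    public static void Shout(string text)
    {
        text += "!";
    }

    public static void Main()
    {
        var mutable = new StringBuilder("hello");
        string immutable = "hello";

        Shout(mutable);
        Shout(immutable);

        Console.WriteLine(mutable);   // hello!
        Console.WriteLine(immutable); // hello
    }
}
```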
This is just one of various reasons. Performance is another one (interned strings, for example).
Answer by Nick Craver
Just to throw this in, an often forgotten view is of security, picture this scenario if strings were mutable:
string dir = @"C:\SomePlainFolder";
// Kick off another thread that also holds a reference to dir
GetDirectoryContents(dir);

// HasAccess and Contents are placeholders; Contents is assumed to return string[]
string[] GetDirectoryContents(string directory)
{
    if (HasAccess(directory))
    {
        // Here the other thread changed the string to "C:\AllYourPasswords\"
        return Contents(directory);
    }
    return null;
}
You see how it could be very, very bad if you were allowed to mutate strings once they were passed.
Answer by Kevin McKelvin
Strings are passed as reference types in .NET.
Reference types place a pointer on the stack, to the actual instance that resides on the managed heap. This is different to value types, which hold their data inline (for a local variable, that typically means the entire instance is on the stack).
When a value type is passed as a parameter, the runtime creates a copy of the value on the stack and passes that value into a method. This is why integers must be passed with a 'ref' keyword to return an updated value.
When a reference type is passed, the runtime creates a copy of the pointer on the stack. That copied pointer still points to the original instance of the reference type.
The string type does not overload the = operator (C# does not allow assignment to be overloaded), so assigning a string copies only the pointer. It nevertheless behaves more like a value type, because it is immutable and its == operator compares contents. If strings were mutable, then with only the pointer copied, a second string operation could accidentally overwrite the value of a private member of another class, causing some pretty nasty results.
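What assignment and += actually do to string references in C# can be checked with a short sketch. AssignmentDemo and its method names are made up for this example; object.ReferenceEquals is the standard .NET identity check.

```csharp
using System;

public static class AssignmentDemo
{
    // Assigning a string to another variable copies the reference, not the characters.
    public static bool AssignmentSharesInstance(string source)
    {
        string alias = source;
        return object.ReferenceEquals(source, alias);
    }

    // += builds a new string and rebinds only the alias; the source is unchanged.
    public static string SourceAfterAliasAppend(string source)
    {
        string alias = source;
        alias += "d";
        return source;
    }

    public static void Main()
    {
        Console.WriteLine(AssignmentSharesInstance("abc")); // True
        Console.WriteLine(SourceAfterAliasAppend("abc"));   // abc
    }
}
```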
As other posts have mentioned, the StringBuilder class allows for the creation of strings without the GC overhead of allocating a new string for every intermediate result.
Answer by Eton B.
Imagine being an OS working with a string that some other thread was modifying behind your back. How could you validate anything without making a copy?