C# .NET Framework 中的并发 HashSet<T>?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/18922985/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Concurrent HashSet<T> in .NET Framework?
提问by kukab
I have the following class.
我有以下课程。
class Test{
public HashSet<string> Data = new HashSet<string>();
}
I need to change the field "Data" from different threads, so I would like some opinions on my current thread-safe implementation.
我需要从不同的线程更改字段“数据”,所以我想对我当前的线程安全实现提出一些意见。
class Test{
public HashSet<string> Data = new HashSet<string>();
public void Add(string Val){
lock(Data) Data.Add(Val);
}
public void Remove(string Val){
lock(Data) Data.Remove(Val);
}
}
Is there a better solution, to go directly to field and protect it from concurrent access by multiple threads?
有没有更好的解决方案,直接进入字段并保护它免受多个线程的并发访问?
采纳答案by ZenLulz
Your implementation is correct. The .NET Framework does not provide a built-in concurrent hashset type, unfortunately. However, there are some workarounds.
您的实施是正确的。不幸的是,.NET Framework 不提供内置的并发散列集类型。但是,有一些解决方法。
ConcurrentDictionary (recommended)
ConcurrentDictionary(推荐)
This first one is to use the class ConcurrentDictionary<TKey, TValue>
in the namespace System.Collections.Concurrent
. In the case, the value is pointless, so we can use a simple byte
(1 byte in memory).
第一个是使用ConcurrentDictionary<TKey, TValue>
命名空间中的类System.Collections.Concurrent
。在这种情况下,该值毫无意义,因此我们可以使用简单的byte
(内存中的 1 个字节)。
private ConcurrentDictionary<string, byte> _data;
This is the recommended option because the type is thread-safe and provide you the same advantages than a HashSet<T>
except key and value are different objects.
这是推荐的选项,因为该类型是线程安全的,并且为您提供与HashSet<T>
除了键和值是不同对象之外的相同优势。
Source: Social MSDN
来源:社交 MSDN
ConcurrentBag
并发包
If you don't mind about the duplicate entries, you can use the class ConcurrentBag<T>
in the same namespace of the previous class.
如果您不介意重复条目,则可以ConcurrentBag<T>
在上一个类的相同命名空间中使用该类。
private ConcurrentBag<string> _data;
Self-implementation
自行实现
Finally, as you did, you can implement your own data type, using lock or other ways that the .NET provides you to be thread-safe. Here is a great example: How to implement ConcurrentHashSet in .Net
最后,正如您所做的那样,您可以使用锁或 .NET 为您提供的其他线程安全方式来实现您自己的数据类型。这是一个很好的例子:如何在 .Net 中实现 ConcurrentHashSet
The only drawback of this solution is that the type HashSet<T>
doesn't officially concurrent access, even for reading operations.
这种解决方案的唯一缺点是该类型HashSet<T>
没有正式的并发访问,即使对于读取操作也是如此。
I quote the code of the linked post (originally written by Ben Mosher).
我引用了链接帖子的代码(最初由Ben Mosher编写)。
using System.Collections.Generic;
using System.Threading;
namespace BlahBlah.Utilities
{
public class ConcurrentHashSet<T> : IDisposable
{
private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
private readonly HashSet<T> _hashSet = new HashSet<T>();
#region Implementation of ICollection<T> ...ish
public bool Add(T item)
{
_lock.EnterWriteLock();
try
{
return _hashSet.Add(item);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public void Clear()
{
_lock.EnterWriteLock();
try
{
_hashSet.Clear();
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool Contains(T item)
{
_lock.EnterReadLock();
try
{
return _hashSet.Contains(item);
}
finally
{
if (_lock.IsReadLockHeld) _lock.ExitReadLock();
}
}
public bool Remove(T item)
{
_lock.EnterWriteLock();
try
{
return _hashSet.Remove(item);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public int Count
{
get
{
_lock.EnterReadLock();
try
{
return _hashSet.Count;
}
finally
{
if (_lock.IsReadLockHeld) _lock.ExitReadLock();
}
}
}
#endregion
#region Dispose
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}
protected virtual void Dispose(bool disposing)
{
if (disposing)
if (_lock != null)
_lock.Dispose();
}
~ConcurrentHashSet()
{
Dispose(false);
}
#endregion
}
}
EDIT:Move the entrance lock methods ouside the try
blocks, as they could throw an exception and execute the instructions contained in the finally
blocks.
编辑:将入口锁定方法移到try
块之外,因为它们可能引发异常并执行finally
块中包含的指令。
回答by Dbl
I prefer complete solutions so i did this: Mind you my Count is implemented in a different fashion because i don't see why one should be forbidden to read the hashset while attempting to count its values.
我更喜欢完整的解决方案,所以我这样做了:请注意,我的 Count 以不同的方式实现,因为我不明白为什么在尝试计算其值时应该禁止读取哈希集。
@Zen, Thanks for getting it started.
@Zen,感谢您开始使用它。
[DebuggerDisplay("Count = {Count}")]
[Serializable]
public class ConcurrentHashSet<T> : ICollection<T>, ISet<T>, ISerializable, IDeserializationCallback
{
private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
private readonly HashSet<T> _hashSet = new HashSet<T>();
public ConcurrentHashSet()
{
}
public ConcurrentHashSet(IEqualityComparer<T> comparer)
{
_hashSet = new HashSet<T>(comparer);
}
public ConcurrentHashSet(IEnumerable<T> collection)
{
_hashSet = new HashSet<T>(collection);
}
public ConcurrentHashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer)
{
_hashSet = new HashSet<T>(collection, comparer);
}
protected ConcurrentHashSet(SerializationInfo info, StreamingContext context)
{
_hashSet = new HashSet<T>();
// not sure about this one really...
var iSerializable = _hashSet as ISerializable;
iSerializable.GetObjectData(info, context);
}
#region Dispose
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}
protected virtual void Dispose(bool disposing)
{
if (disposing)
if (_lock != null)
_lock.Dispose();
}
public IEnumerator<T> GetEnumerator()
{
return _hashSet.GetEnumerator();
}
~ConcurrentHashSet()
{
Dispose(false);
}
public void OnDeserialization(object sender)
{
_hashSet.OnDeserialization(sender);
}
public void GetObjectData(SerializationInfo info, StreamingContext context)
{
_hashSet.GetObjectData(info, context);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
public void Add(T item)
{
_lock.EnterWriteLock();
try
{
_hashSet.Add(item);
}
finally
{
if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public void UnionWith(IEnumerable<T> other)
{
_lock.EnterWriteLock();
_lock.EnterReadLock();
try
{
_hashSet.UnionWith(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
if (_lock.IsReadLockHeld) _lock.ExitReadLock();
}
}
public void IntersectWith(IEnumerable<T> other)
{
_lock.EnterWriteLock();
_lock.EnterReadLock();
try
{
_hashSet.IntersectWith(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
if (_lock.IsReadLockHeld) _lock.ExitReadLock();
}
}
public void ExceptWith(IEnumerable<T> other)
{
_lock.EnterWriteLock();
_lock.EnterReadLock();
try
{
_hashSet.ExceptWith(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
if (_lock.IsReadLockHeld) _lock.ExitReadLock();
}
}
public void SymmetricExceptWith(IEnumerable<T> other)
{
_lock.EnterWriteLock();
try
{
_hashSet.SymmetricExceptWith(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool IsSubsetOf(IEnumerable<T> other)
{
_lock.EnterWriteLock();
try
{
return _hashSet.IsSubsetOf(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool IsSupersetOf(IEnumerable<T> other)
{
_lock.EnterWriteLock();
try
{
return _hashSet.IsSupersetOf(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool IsProperSupersetOf(IEnumerable<T> other)
{
_lock.EnterWriteLock();
try
{
return _hashSet.IsProperSupersetOf(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool IsProperSubsetOf(IEnumerable<T> other)
{
_lock.EnterWriteLock();
try
{
return _hashSet.IsProperSubsetOf(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool Overlaps(IEnumerable<T> other)
{
_lock.EnterWriteLock();
try
{
return _hashSet.Overlaps(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool SetEquals(IEnumerable<T> other)
{
_lock.EnterWriteLock();
try
{
return _hashSet.SetEquals(other);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
bool ISet<T>.Add(T item)
{
_lock.EnterWriteLock();
try
{
return _hashSet.Add(item);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public void Clear()
{
_lock.EnterWriteLock();
try
{
_hashSet.Clear();
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool Contains(T item)
{
_lock.EnterWriteLock();
try
{
return _hashSet.Contains(item);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public void CopyTo(T[] array, int arrayIndex)
{
_lock.EnterWriteLock();
try
{
_hashSet.CopyTo(array, arrayIndex);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public bool Remove(T item)
{
_lock.EnterWriteLock();
try
{
return _hashSet.Remove(item);
}
finally
{
if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
public int Count
{
get
{
_lock.EnterWriteLock();
try
{
return _hashSet.Count;
}
finally
{
if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
}
}
}
public bool IsReadOnly
{
get { return false; }
}
}
回答by pugby
The tricky part about making an ISet<T>
concurrent is that the set methods (union, intersection, difference) are iterative in nature. At the very least you have to iterate over all n members of one of the sets involved in the operation, while locking both sets.
ISet<T>
并发的棘手部分是集合方法(联合、交集、差异)本质上是迭代的。至少您必须遍历操作中涉及的集合之一的所有 n 个成员,同时锁定两个集合。
You lose the advantages of a ConcurrentDictionary<T,byte>
when you have to lock the entire set during iteration. Without locking, these operations are not thread safe.
ConcurrentDictionary<T,byte>
当您必须在迭代期间锁定整个集合时,您就失去了 a 的优势。如果没有锁定,这些操作就不是线程安全的。
Given the added overhead of ConcurrentDictionary<T,byte>
, it's probably wiser just to use the lighter weight HashSet<T>
and just surround everything in locks.
考虑到 的额外开销ConcurrentDictionary<T,byte>
,使用更轻的重量HashSet<T>
并将所有内容都放在锁中可能更明智。
If you don't need the set operations, use ConcurrentDictionary<T,byte>
and just use default(byte)
as the value when you are adding keys.
如果您不需要设置操作,请在添加键时使用ConcurrentDictionary<T,byte>
并仅用default(byte)
作值。
回答by S?ren Boisen
Since noone else mentioned it, I will offer an alternative approach that may or may not be appropriate for your particular purpose:
由于没有其他人提到它,我将提供一种替代方法,它可能适合也可能不适合您的特定目的:
Microsoft Immutable Collections
Microsoft 不可变集合
From a blog postby the MS team behind:
来自背后 MS 团队的一篇博客文章:
While creating and running concurrently is easier than ever, one of the fundamental problems still exists: mutable shared state. Reading from multiple threads is typically very easy, but once the state needs to be updated, it gets a lot harder, especially in designs that require locking.
An alternative to locking is making use of immutable state. Immutable data structures are guaranteed to never change and can thus be passed freely between different threads without worrying about stepping on somebody else's toes.
This design creates a new problem though: How do you manage changes in state without copying the entire state each time? This is especially tricky when collections are involved.
This is where immutable collections come in.
虽然并发创建和运行比以往任何时候都容易,但仍然存在一个基本问题:可变共享状态。从多个线程读取通常很容易,但是一旦需要更新状态,就会变得更加困难,尤其是在需要锁定的设计中。
锁定的替代方法是使用不可变状态。不可变数据结构保证永远不会改变,因此可以在不同线程之间自由传递,而不必担心踩到别人的脚趾。
但是,这种设计产生了一个新问题:如何在不每次都复制整个状态的情况下管理状态更改?当涉及集合时,这尤其棘手。
这就是不可变集合的用武之地。
These collections include ImmutableHashSet<T>and ImmutableList<T>.
这些集合包括ImmutableHashSet<T>和ImmutableList<T>。
Performance
表现
Since the immutable collections use tree data structures underneath to enable structural sharing, their performance characteristics are different from mutable collections. When comparing to a locking mutable collection, the results will depend on lock contention and access patterns. However, taken from another blog postabout the immutable collections:
由于不可变集合在底层使用树数据结构来实现结构共享,因此它们的性能特征与可变集合不同。与锁定可变集合相比,结果将取决于锁争用和访问模式。但是,摘自另一篇关于不可变集合的博客文章:
Q: I've heard that immutable collections are slow. Are these any different? Can I use them when performance or memory is important?
A: These immutable collections have been highly tuned to have competitive performance characteristics to the mutable collections while balancing memory sharing. In some cases they are very nearly as fast as the mutable collections both algorithmically and in actual time, sometimes even faster, while in other cases they are algorithmically more complex. In many cases however the difference will be negligible. Generally you should use the simplest code to get the job done and then tune for performance as necessary. The immutable collections help you write simple code, especially when thread-safety has to be considered.
问:我听说不可变集合很慢。这些有什么不同吗?当性能或内存很重要时,我可以使用它们吗?
答:这些不可变集合已经过高度调整,在平衡内存共享的同时,具有与可变集合竞争的性能特征。在某些情况下,它们在算法上和实际时间上几乎与可变集合一样快,有时甚至更快,而在其他情况下,它们在算法上更复杂。然而,在许多情况下,差异可以忽略不计。通常,您应该使用最简单的代码来完成工作,然后根据需要调整性能。不可变集合可帮助您编写简单的代码,尤其是在必须考虑线程安全时。
In other words, in many cases the difference won't be noticeable and you should go with the simpler choice - which for concurrent sets would be to use ImmutableHashSet<T>
, since you don't have an existing locking mutable implementation! :-)
换句话说,在许多情况下差异不会很明显,您应该选择更简单的选择 - 对于并发集将使用ImmutableHashSet<T>
,因为您没有现有的锁定可变实现!:-)
回答by i3arnon
Instead of wrapping a ConcurrentDictionary
or locking over a HashSet
I created an actual ConcurrentHashSet
based on ConcurrentDictionary
.
我没有包装 aConcurrentDictionary
或锁定 aHashSet
我创建了一个ConcurrentHashSet
基于ConcurrentDictionary
.
This implementation supports basic operations per item without HashSet
's set operations as they make less sense in concurrent scenarios IMO:
此实现支持每个项目的基本操作,而无需HashSet
's set 操作,因为它们在 IMO 并发场景中意义不大:
var concurrentHashSet = new ConcurrentHashSet<string>(
new[]
{
"hamster",
"HAMster",
"bar",
},
StringComparer.OrdinalIgnoreCase);
concurrentHashSet.TryRemove("foo");
if (concurrentHashSet.Contains("BAR"))
{
Console.WriteLine(concurrentHashSet.Count);
}
Output: 2
输出:2
You can get it from NuGet hereand see the source on GitHub here.