C# 用于快速插入/删除的 .net 集合

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/576176/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-04 08:55:01  来源:igfitidea点击:

.net collection for fast insert/delete

c#.netcollections

提问by spender

I need to maintain a roster of connected clients that are very shortlived and frequently go up and down. Due to the potential number of clients I need a collection that supports fast insert/delete. Suggestions?

我需要维护一个连接客户名册,这些客户的生命周期很短,而且经常上下波动。由于潜在的客户端数量,我需要一个支持快速插入/删除的集合。建议?

采纳答案by Henrik

C5 Generic Collection Library

C5 通用集合库

The best implementations I have found in C# and C++ are these -- for C#/CLI:

我在 C# 和 C++ 中找到的最好的实现是这些——对于 C#/CLI:

It's well researched, has extensible unit tests, and since February they also have implemented the common interfaces in .Net which makes it a lot easier to work with the collections. They were featured on Channel9and they've done extensive performance testing on the collections.

它经过充分研究,具有可扩展的单元测试,并且自 2 月以来,他们还在 .Net 中实现了通用接口,这使得使用集合变得更加容易。它们在Channel9上有特色,并且已经对这些集合进行了广泛的性能测试。

If you are using data-structures anyway these researchers have a red-black-treeimplementation in their library, similar to what you find if you fire up Lütz reflector and have a look in System.Data's internal structures :p. Insert-complexity: O(log(n)).

如果您无论如何都在使用数据结构,那么这些研究人员在他们的库中有一个红黑树实现,类似于您启动 Lütz 反射器并查看 System.Data 的内部结构时所发现的:p。插入复杂度:O(log(n))。

Lock-free C++ collections

无锁 C++ 集合

Then, if you can allow for some C++ interopand you absolutely need the speed and want as little overhead as possible, then these lock-free ADTs from Dmitriy V'jukov are probably the best you can get in this world, outperforming Intel's concurrent library of ADTs.

然后,如果您可以允许一些 C++ 互操作,并且您绝对需要速度并希望尽可能少的开销,那么来自 Dmitriy V'jukov 的这些无锁 ADT 可能是您在这个世界上可以获得的最好的,其性能优于英特尔的并发库的 ADT。

I've read some of the code and it's really the makings of someone well versed in how these things are put together. VC++ can do native C++ interop without annoying boundaries. http://www.swig.org/can otherwise help you wrap C++ interfaces for consumption in .Net, or you can do it yourself through P/Invoke.

我已经阅读了一些代码,它确实是精通这些东西如何组合在一起的人的气质。VC++ 可以在没有烦人的边界的情况下进行本地 C++ 互操作。http://www.swig.org/可以帮助您包装 C++ 接口以在 .Net 中使用,或者您可以通过 P/Invoke 自己完成。

Microsoft's Take

微软采取

They have written tutorials, this one implementing a rather unpolished skip-list in C#, and discussing other types of data-structures. (There's a better SkipList at CodeProject, which is very polished and implement the interfaces in a well-behaved manner.) They also have a few data-structures bundled with .Net, namely the HashTable/Dictionary<,>and HashSet. Of course there's the "ResizeArray"/List type as well together with a stack and queue, but they are all "linear" on search.

他们编写了教程,这个教程在 C# 中实现了一个相当粗糙的跳过列表,并讨论了其他类型的数据结构。(在 CodeProject有一个更好的SkipList,它非常优美并且以良好的方式实现了接口。)它们还有一些与 .Net 捆绑在一起的数据结构,即HashTable/Dictionary<,>HashSet。当然还有“ResizeArray”/List 类型以及堆栈和队列,但它们在搜索中都是“线性的”。

Google's perf-tools

谷歌的性能工具

If you wish to speed up the time it takes for memory-allocation you can use google's perf-tools. They are available at google code and they contain a very interesting multi-threaded malloc-implementation (TCMalloc)which shows much more consistent timing than the normal malloc does. You could use this together with the lock-free structures above to really go crazy with performance.

如果您希望加快内存分配所需的时间,您可以使用 google 的 perf-tools。它们可在 google 代码中找到,它们包含一个非常有趣的多线程 malloc 实现 (TCMalloc),它显示出比普通 malloc 更一致的时序。您可以将它与上面的无锁结构一起使用,以真正为性能而疯狂。

Improving response times with memoization

通过记忆提高响应时间

You can also use memoization on functions to improve performance through caching, something interesting if you're using e.g. F#. F# also allows C++ interop, so you're OK there.

您还可以对函数使用记忆功能,通过缓存来提高性能,如果您使用的是F# 等,这会很有趣。F# 还允许 C++ 互操作,所以你没问题。

O(k)

好的)

There's also the possibility of doing something on your own using the research which has been done on bloom-filters, which allow O(k) lookup complexity where k is a constant that depends on the number of hash-functions you have implemented. This is how google's BigTable has been implemented. These filter will get you the element if it's in the set or possibly with a very low likeliness an element which is not the one you're looking for (see the graph at wikipedia -- it's approaching P(wrong_key) -> 0.01 as size is around 10000 elements, but you can go around this by implementing further hash-functions/reducing the set.

也有可能使用在bloom-filters上完成的研究自己做一些事情,这允许 O(k) 查找复杂度,其中 k 是一个常数,取决于您已实现的散列函数的数量。这就是谷歌的 BigTable 是如何实现的。如果元素在集合中,或者可能具有非常低的可能性,这些过滤器将为您获取不是您要查找的元素的元素(请参阅维基百科上的图表 - 它接近 P(wrong_key) -> 0.01 作为大小大约有 10000 个元素,但您可以通过实现进一步的哈希函数/减少集合来解决这个问题。

I haven't searched for .Net implementations of this, but since the hashing calculations are independent you can use MS's performance team's implementation of Tasks to speed that up.

我还没有搜索过这个的 .Net 实现,但是由于散列计算是独立的,您可以使用MS 的性能团队的 Tasks 实现来加快速度。

"My" take -- randomize to reach average O(log n)

“我的”采取——随机化以达到平均 O(log n)

As it happens I just did a coursework involving data-structures. In this case we used C++, but it's very easy to translate to C#. We built three different data-structures; a bloom-filter, a skip-list and random binary search tree.

碰巧我刚刚做了一个涉及数据结构的课程。在这种情况下,我们使用了 C++,但很容易转换为 C#。我们构建了三种不同的数据结构;布隆过滤器、跳过列表和随机二叉搜索树

See the code and analysis after the last paragraph.

见最后一段后的代码和分析。

Hardware-based "collections"

基于硬件的“集合”

Finally, to make my answer "complete", if you truly need speed you can use something like Routing-tablesor Content-addressable memory. This allows you to very quickly O(1) in principle get a "hash"-to-value lookup of your data.

最后,为了使我的答案“完整”,如果您确实需要速度,您可以使用Routing-tablesContent-addressable memory 之类的东西。这使您原则上可以非常快速地进行 O(1) 的“散列”到值的数据查找。

Random Binary Search Tree/Bloom Filter C++ code

随机二叉搜索树/布隆过滤器 C++ 代码

I would really appreciate feedback if you find mistakes in the code, or just pointers on how I can do it better (or with better usage of templates). Note that the bloom filter isn't like it would be in real life; normally you don't have to be able to delete from it and then it much much more space efficient than the hack I did to allow the deleteto be tested.

如果您在代码中发现错误,或者只是指出我如何做得更好(或更好地使用模板),我将非常感谢您的反馈。请注意,布隆过滤器与现实生活中的不同;通常,您不必能够从中删除,然后它比我为测试删除所做的 hack 更节省空间。

DataStructure.h

数据结构.h

#ifndef DATASTRUCTURE_H_
#define DATASTRUCTURE_H_

class DataStructure
{
public:
    DataStructure() {countAdd=0; countDelete=0;countFind=0;}
    virtual ~DataStructure() {}

    void resetCountAdd() {countAdd=0;}
    void resetCountFind() {countFind=0;}
    void resetCountDelete() {countDelete=0;}

    unsigned int getCountAdd(){return countAdd;}
    unsigned int getCountDelete(){return countDelete;}
    unsigned int getCountFind(){return countFind;}

protected:
    unsigned int countAdd;
    unsigned int countDelete;
    unsigned int countFind;
};

#endif /*DATASTRUCTURE_H_*/

Key.h

密钥.h

#ifndef KEY_H_
#define KEY_H_

#include <string>
using namespace std;

const int keyLength = 128;

class Key : public string
{
public:
    Key():string(keyLength, ' ') {}
    Key(const char in[]): string(in){}
    Key(const string& in): string(in){}

    bool operator<(const string& other);
    bool operator>(const string& other);
    bool operator==(const string& other);

    virtual ~Key() {}
};

#endif /*KEY_H_*/

Key.cpp

密钥文件

#include "Key.h"

bool Key::operator<(const string& other)
{
    return compare(other) < 0;
};

bool Key::operator>(const string& other)
{
    return compare(other) > 0;
};

bool Key::operator==(const string& other)
{
    return compare(other) == 0;
}

BloomFilter.h

布隆过滤器.h

#ifndef BLOOMFILTER_H_
#define BLOOMFILTER_H_

#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include "Key.h"
#include "DataStructure.h"

#define LONG_BIT 32
#define bitmask(val) (unsigned long)(1 << (LONG_BIT - (val % LONG_BIT) - 1))

// TODO: Implement RW-locking on the reads/writes to the bitmap.

class BloomFilter : public DataStructure
{
public:
    BloomFilter(){}
    BloomFilter(unsigned long length){init(length);}
    virtual ~BloomFilter(){}

    void init(unsigned long length);
    void dump();

    void add(const Key& key);
    void del(const Key& key);

    /**
     * Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't.
     */
    bool testExist(const Key& key, bool v = false);

private:
    unsigned long hash1(const Key& key);
    unsigned long hash2(const Key& key);
    bool exist(const Key& key);
    void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key);
    void getCountIndicies(const int i1, const unsigned long h1,
        const int i2, const unsigned long h2, int& i1_c, int& i2_c);

    vector<unsigned long> m_tickBook;
    vector<unsigned int> m_useCounts;
    unsigned long m_length; // number of bits in the bloom filter
    unsigned long m_pockets; //the number of pockets

    static const unsigned long m_pocketSize; //bits in each pocket
};

#endif /*BLOOMFILTER_H_*/

BloomFilter.cpp

布隆过滤器.cpp

#include "BloomFilter.h"

const unsigned long BloomFilter::m_pocketSize = LONG_BIT;

void BloomFilter::init(unsigned long length)
{
    //m_length = length;
    m_length = (unsigned long)((2.0*length)/log(2))+1;
    m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize));
    m_tickBook.resize(m_pockets);

    // my own (allocate nr bits possible to store in the other vector)
    m_useCounts.resize(m_pockets * m_pocketSize);

    unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0;
    for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own
}

unsigned long BloomFilter::hash1(const Key& key)
{
    unsigned long hash = 5381;
    unsigned int i=0; for (i=0; i< key.length(); i++){
        hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */
    }

    double d_hash = (double) hash;

    d_hash *= (0.5*(sqrt(5)-1));
    d_hash -= floor(d_hash);
    d_hash *= (double)m_length;

    return (unsigned long)floor(d_hash);
}

unsigned long BloomFilter::hash2(const Key& key)
{
    unsigned long hash = 0;
    unsigned int i=0; for (i=0; i< key.length(); i++){
        hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash;
    }
    double d_hash = (double) hash;

    d_hash *= (0.5*(sqrt(5)-1));
    d_hash -= floor(d_hash);
    d_hash *= (double)m_length;

    return (unsigned long)floor(d_hash);
}

bool BloomFilter::testExist(const Key& key, bool v){
    if(exist(key)) {
        if(v) cout<<"Key "<< key<<" is in the set"<<endl;
        return true;
    }else {
        if(v) cout<<"Key "<< key<<" is not in the set"<<endl;
        return false;
    }
}

void BloomFilter::dump()
{
    cout<<m_pockets<<" Pockets: ";

    // I changed u to %p because I wanted it printed in hex.
    unsigned long i; for(i=0; i< m_pockets; i++) printf("%p ", (void*)m_tickBook[i]);
    cout<<endl;
}

void BloomFilter::add(const Key& key)
{
    unsigned long h1, h2;
    int i1, i2;
    int i1_c, i2_c;

    // tested!

    getHashAndIndicies(h1, h2, i1, i2, key);
    getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);

    m_tickBook[i1] = m_tickBook[i1] | bitmask(h1);
    m_tickBook[i2] = m_tickBook[i2] | bitmask(h2);

    m_useCounts[i1_c] = m_useCounts[i1_c] + 1;
    m_useCounts[i2_c] = m_useCounts[i2_c] + 1;

    countAdd++;
}

void BloomFilter::del(const Key& key)
{
    unsigned long h1, h2;
    int i1, i2;
    int i1_c, i2_c;

    if (!exist(key)) throw "You can't delete keys which are not in the bloom filter!";

    // First we need the indicies into m_tickBook and the
    // hashes.
    getHashAndIndicies(h1, h2, i1, i2, key);

    // The index of the counter is the index into the bitvector
    // times the number of bits per vector item plus the offset into
    // that same vector item.
    getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);

    // We need to update the value in the bitvector in order to
    // delete the key.
    m_useCounts[i1_c] = (m_useCounts[i1_c] == 1 ? 0 : m_useCounts[i1_c] - 1);
    m_useCounts[i2_c] = (m_useCounts[i2_c] == 1 ? 0 : m_useCounts[i2_c] - 1);

    // Now, if we depleted the count for a specific bit, then set it to
    // zero, by anding the complete unsigned long with the notted bitmask
    // of the hash value
    if (m_useCounts[i1_c] == 0)
        m_tickBook[i1] = m_tickBook[i1] & ~(bitmask(h1));
    if (m_useCounts[i2_c] == 0)
        m_tickBook[i2] = m_tickBook[i2] & ~(bitmask(h2));

    countDelete++;
}

bool BloomFilter::exist(const Key& key)
{
    unsigned long h1, h2;
    int i1, i2;

    countFind++;

    getHashAndIndicies(h1, h2, i1, i2, key);

    return  ((m_tickBook[i1] & bitmask(h1)) > 0) &&
            ((m_tickBook[i2] & bitmask(h2)) > 0);
}

/*
 * Gets the values of the indicies for two hashes and places them in
 * the passed parameters. The index is into m_tickBook.
 */
void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1,
    int& i2, const Key& key)
{
    h1 = hash1(key);
    h2 = hash2(key);
    i1 = (int) h1/m_pocketSize;
    i2 = (int) h2/m_pocketSize;
}

/*
 * Gets the values of the indicies into the count vector, which keeps
 * track of how many times a specific bit-position has been used.
 */
void BloomFilter::getCountIndicies(const int i1, const unsigned long h1,
    const int i2, const unsigned long h2, int& i1_c, int& i2_c)
{
    i1_c = i1*m_pocketSize + h1%m_pocketSize;
    i2_c = i2*m_pocketSize + h2%m_pocketSize;
}

** RBST.h **

** RBST.h **

#ifndef RBST_H_
#define RBST_H_

#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include "Key.h"
#include "DataStructure.h"

#define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str);

using namespace std;

class RBSTNode;
class RBSTNode: public Key
{
public:
    RBSTNode(const Key& key):Key(key)
    {
        m_left =NULL;
        m_right = NULL;
        m_size = 1U; // the size of one node is 1.
    }
    virtual ~RBSTNode(){}

    string setKey(const Key& key){return Key(key);}

    RBSTNode* left(){return m_left; }
    RBSTNode* right(){return m_right;}

    RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; }
    RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; }

#ifdef DEBUG
    ostream& print(ostream& out)
    {
        out << "Key(" << *this << ", m_size: " << m_size << ")";
        return out;
    }
#endif

    unsigned int size() { return m_size; }

    void setSize(unsigned int val)
    {
#ifdef DEBUG
        this->print(cout);
        cout << "::setSize(" << val << ") called." << endl;
#endif

        if (val == 0) throw "Cannot set the size below 1, then just delete this node.";
        m_size = val;
    }

    void incSize() {
#ifdef DEBUG
        this->print(cout);
        cout << "::incSize() called" << endl;
#endif

        m_size++;
    }

    void decrSize()
    {
#ifdef DEBUG
        this->print(cout);
        cout << "::decrSize() called" << endl;
#endif

        if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node.";
        m_size--;
    }

#ifdef DEBUG
    unsigned int size(RBSTNode* x);
#endif

private:
    RBSTNode(){}
    RBSTNode* m_left;
    RBSTNode* m_right;
    unsigned int m_size;
};

class RBST : public DataStructure
{
public:
    RBST() {
        m_size = 0;
        m_head = NULL;
        srand(time(0));
    };

    virtual ~RBST() {};

    /**
     * Tries to add key into the tree and will return
     *      true  for a new item added
     *      false if the key already is in the tree.
     *
     * Will also have the side-effect of printing to the console if v=true.
     */
    bool add(const Key& key, bool v=false);

    /**
     * Same semantics as other add function, but takes a string,
     * but diff name, because that'll cause an ambiguity because of inheritance.
     */
    bool addString(const string& key);

    /**
     * Deletes a key from the tree if that key is in the tree.
     * Will return
     *      true  for success and
     *      false for failure.
     *
     * Will also have the side-effect of printing to the console if v=true.
     */
    bool del(const Key& key, bool v=false);

    /**
     * Tries to find the key in the tree and will return
     *      true if the key is in the tree and
     *      false if the key is not.
     *
     * Will also have the side-effect of printing to the console if v=true.
     */
    bool find(const Key& key, bool v = false);

    unsigned int count() { return m_size; }

#ifdef DEBUG
    int dump(char sep = ' ');
    int dump(RBSTNode* target, char sep);
    unsigned int size(RBSTNode* x);
#endif

private:
    RBSTNode* randomAdd(RBSTNode* target, const Key& key);
    RBSTNode* addRoot(RBSTNode* target, const Key& key);
    RBSTNode* rightRotate(RBSTNode* target);
    RBSTNode* leftRotate(RBSTNode* target);

    RBSTNode* del(RBSTNode* target, const Key& key);
    RBSTNode* join(RBSTNode* left, RBSTNode* right);

    RBSTNode* find(RBSTNode* target, const Key& key);

    RBSTNode* m_head;
    unsigned int m_size;
};

#endif /*RBST_H_*/

** RBST.cpp **

** RBST.cpp **

#include "RBST.h"

bool RBST::add(const Key& key, bool v){
    unsigned int oldSize = m_size;
    m_head = randomAdd(m_head, key);
    if (m_size > oldSize){
        if(v) cout<<"Node "<<key<< " is added into the tree."<<endl;
        return true;
    }else {
        if(v) cout<<"Node "<<key<< " is already in the tree."<<endl;
        return false;
    }
    if(v) cout<<endl;
};

bool RBST::addString(const string& key) {
    return add(Key(key), false);
}

bool RBST::del(const Key& key, bool v){
    unsigned oldSize= m_size;
    m_head = del(m_head, key);
    if (m_size < oldSize) {
        if(v) cout<<"Node "<<key<< " is deleted from the tree."<<endl;
        return true;
    }
    else {
        if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
        return false;
    }
};

bool RBST::find(const Key& key, bool v){
    RBSTNode* ret = find(m_head, key);
    if (ret == NULL){
        if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
        return false;
    }else {
        if(v) cout<<"Node "<<key<< " is in the tree."<<endl;
        return true;
    }
};

#ifdef DEBUG
int RBST::dump(char sep){
    int ret = dump(m_head, sep);
    cout<<"SIZE: " <<ret<<endl;
    return ret;
};

int RBST::dump(RBSTNode* target, char sep){
    if (target == NULL) return 0;
    int ret = dump(target->left(), sep);
    cout<< *target<<sep;
    ret ++;
    ret += dump(target->right(), sep);
    return ret;
};
#endif

/**
 * Rotates the tree around target, so that target's left
 * is the new root of the tree/subtree and updates the subtree sizes.
 *
 *(target)  b               (l) a
 *         / \      right      / \
 *        a   ?     ---->     ?   b
 *       / \                     / \
 *      ?   x                   x   ?
 *
 */
RBSTNode* RBST::rightRotate(RBSTNode* target) // private
{
    if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested.
    if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!";

#ifdef DEBUG
    cout    <<"Right-rotating b-node ";
    target->print(cout);
    cout    << " for a-node ";
    target->left()->print(cout);
    cout    << "." << endl;
#endif

    RBSTNode* l = target->left();
    int as0 = l->size();

    // re-order the sizes
    l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null.
    target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null.

    // swap b's left (for a)
    target->setLeft(l->right());

    // and a's right (for b's left)
    l->setRight(target);

#ifdef DEBUG
    cout    << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl;
#endif

    // return the new root, a.
    return l;
};

/**
 * Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*)
 */
RBSTNode* RBST::leftRotate(RBSTNode* target)
{
    if (target == NULL) throw "Invariant failure, target is null";
    if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!";

#ifdef DEBUG
    cout    <<"Left-rotating a-node ";
    target->print(cout);
    cout    << " for b-node ";
    target->right()->print(cout);
    cout    << "." << endl;
#endif

    RBSTNode* r = target->right();
    int bs0 = r->size();

    // re-roder the sizes
    r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1);
    target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size()));

    // swap a's right (for b's left)
    target->setRight(r->left());

    // swap b's left (for a)
    r->setLeft(target);

#ifdef DEBUG
    cout    << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl;
#endif

    return r;
};

//
/**
 * Adds a key to the tree and returns the new root of the tree.
 * If the key already exists doesn't add anything.
 * Increments m_size if the key didn't already exist and hence was added.
 *
 * This function is not called from public methods, it's a helper function.
 */
RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key)
{
    countAdd++;

    if (target == NULL) return new RBSTNode(key);

#ifdef DEBUG
    cout << "addRoot(";
    cout.flush();
    target->print(cout) << "," << key << ") called." << endl;
#endif

    if (*target < key)
    {
        target->setRight( addRoot(target->right(), key) );
        target->incSize(); // Should I?
        RBSTNode* res = leftRotate(target);
#ifdef DEBUG
        if (target->size() != size(target))
            BUG("in addRoot 1");
#endif
        return res;
    }

    target->setLeft( addRoot(target->left(), key) );
    target->incSize(); // Should I?
    RBSTNode* res = rightRotate(target);
#ifdef DEBUG
    if (target->size() != size(target))
        BUG("in addRoot 2");
#endif
    return res;
};

/**
 * This function is called from the public add(key) function,
 * and returns the new root node.
 */
RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key)
{
    countAdd++;

    if (target == NULL)
    {
        m_size++;
        return new RBSTNode(key);
    }

#ifdef DEBUG
    cout << "randomAdd(";
    target->print(cout) << ", \"" << key << "\") called." << endl;
#endif

    int r = (rand() % target->size()) + 1;

    // here is where we add the target as root!
    if (r == 1)
    {
        m_size++;   // TODO: Need to lock.
        return addRoot(target, key);
    }

#ifdef DEBUG
    printf("randomAdd recursion part, ");
#endif

    // otherwise, continue recursing!
    if (*target <= key)
    {
#ifdef DEBUG
    printf("target <= key\n");
#endif
        target->setRight( randomAdd(target->right(), key) );
        target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
        if (target->right()->size() != size(target->right()))
            BUG("in randomAdd 1");
#endif
    }
    else
    {
#ifdef DEBUG
    printf("target > key\n");
#endif
        target->setLeft( randomAdd(target->left(), key) );
        target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
        if (target->left()->size() != size(target->left()))
            BUG("in randomAdd 2");
#endif
    }

#ifdef DEBUG
    printf("randomAdd return part\n");
#endif

    m_size++;       // TODO: Need to lock.
    return target;
};

/////////////////////////////////////////////////////////////
/////////////////////  DEL FUNCTIONS ////////////////////////
/////////////////////////////////////////////////////////////

/**
 * Deletes a node with the passed key.
 * Returns the root node.
 * Decrements m_size if something was deleted.
 */
RBSTNode* RBST::del(RBSTNode* target, const Key& key)
{
    countDelete++;

    if (target == NULL) return NULL;

#ifdef DEBUG
    cout << "del(";
    target->print(cout) << ", \"" << key << "\") called." << endl;
#endif

    RBSTNode* ret = NULL;

    // found the node to delete
    if (*target == key)
    {
        ret = join(target->left(), target->right());

        m_size--;
        delete target;

        return ret; // return the newly built joined subtree!
    }

    // store a temporary size before recursive deletion.
    unsigned int size = m_size;

    if (*target < key)  target->setRight( del(target->right(), key) );
    else                target->setLeft( del(target->left(), key) );

    // if the previous recursion changed the size, we need to decrement the size of this target too.
    if (m_size < size) target->decrSize();

#ifdef DEBUG
    if (RBST::size(target) != target->size())
        BUG("in del");
#endif

    return target;
};

/**
 * Joins the two subtrees represented by left and right
 * by randomly choosing which to make the root, weighted on the
 * size of the sub-tree.
 */
RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right)
{
    if (left == NULL) return right;
    if (right == NULL) return left;

#ifdef DEBUG
    cout << "join(";
    left->print(cout);
    cout << ",";
    right->print(cout) << ") called." << endl;
#endif

    // Find the chance that we use the left tree, based on its size over the total tree size.
    // 3 s.d. randomness :-p e.g. 60.3% chance.
    bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0));

    RBSTNode* subtree = NULL;

    if (useLeft)
    {
        subtree = join(left->right(), right);

        left->setRight(subtree)
            ->setSize((left->left() == NULL ? 0 : left->left()->size())
                        + subtree->size() + 1 );

#ifdef DEBUG
        if (size(left) != left->size())
            BUG("in join 1");
#endif

        return left;
    }

    subtree = join(right->left(), left);

    right->setLeft(subtree)
         ->setSize((right->right() == NULL ? 0 : right->right()->size())
                    + subtree->size() + 1);

#ifdef DEBUG
    if (size(right) != right->size())
        BUG("in join 2");
#endif

    return right;
};

/////////////////////////////////////////////////////////////
/////////////////////  FIND FUNCTIONS ///////////////////////
/////////////////////////////////////////////////////////////

/**
 * Tries to find the key in the tree starting
 * search from target.
 *
 * Returns NULL if it was not found.
 */
RBSTNode* RBST::find(RBSTNode* target, const Key& key)
{
    countFind++; // Could use private method only counting the first call.
    if (target == NULL) return NULL; // not found.
    if (*target == key) return target; // found (does string override ==?)
    if (*target < key) return find(target->right(), key); // search for gt to the right.
    return find(target->left(), key); // search for lt to the left.
};

#ifdef DEBUG

unsigned int RBST::size(RBSTNode* x)
{
    if (x == NULL) return 0;
    return 1 + size(x->left()) + size(x->right());
}

#endif

I'll save the SkipList for another time since it's already possible to find good implementations of a SkipList from the links and my version wasn't much different.

我将再次保存 SkipList,因为已经可以从链接中找到 SkipList 的良好实现,而且我的版本没有太大不同。

The graphs generated from the test-file are as follows:

从测试文件生成的图形如下:

Graph showing time taken to add new items for BloomFilter, RBST and SkipList.graph http://haf.se/content/dl/addtimer.png

图表显示为 BloomFilter、RBST 和 SkipList 添加新项目所花费的时间。图 http://haf.se/content/dl/addtimer.png

Graph showing time taken to find items for BloomFilter, RBST and SkipListgraph http://haf.se/content/dl/findtimer.png

图表显示为 BloomFilter、RBST 和 SkipList图表查找项目所花费的时间http://haf.se/content/dl/findtimer.png

Graph showing time taken to delete items for BloomFilter, RBST and SkipListgraph http://haf.se/content/dl/deltimer.png

图表显示删除 BloomFilter、RBST 和 SkipList 项目所花费的时间图表 http://haf.se/content/dl/deltimer.png

So as you can see, the random binary search tree was rather a lot better than the SkipList. The bloom filter lives up to its O(k).

正如你所看到的,随机二叉搜索树比 SkipList 好很多。布隆过滤器符合其 O(k)。

回答by Zach Scrivena

Consider the hash-based collections for this, e.g. HashSet, Dictionary, HashTable, which provide constant time performance for adding and removing elements.

为此考虑基于散列的集合,例如HashSetDictionaryHashTable,它们为添加和删除元素提供恒定的时间性能。

More information from the .NET Framework Developer's Guide:

.NET Framework 开发人员指南中的更多信息:

回答by baretta

You could use a Hashtable or strongly typed Dictionary<Client>. The client class might override GetHashCode to provide a faster hash code generation, or if using Hashtable you can optionally use an IHashCodeProvider.

您可以使用哈希表或强类型的 Dictionary<Client>。客户端类可能会覆盖 GetHashCode 以提供更快的哈希代码生成,或者如果使用 Hashtable,您可以选择使用 IHashCodeProvider。

回答by Chris S

How do you need to find the clients? Is a Tuple/Dictionary necessary? You're more than likely to find something that solves your problem in the Jeffrey Richter's Power Collectionslibrary which has lists, trees, most data structures you can think of.

您需要如何找到客户?是否需要元组/字典?您很可能会在 Jeffrey Richter 的Power Collections库中找到可以解决您问题的东西,该库包含列表、树和您能想到的大多数数据结构。

回答by Marc Gravell

Well, how much do you need to query it? A linked-list has fast insert/delete (at any position), but isn't as quick to search as (for example) a dictionary / sorted-list. Alternatively, a straight list with a bit/value pair in each - i.e. "still has value". Just re-use logicallyempty cells before appending. Delete just clears the cell.

那么,你需要多少钱来查询呢?链接列表具有快速插入/删除(在任何位置),但搜索速度不如(例如)字典/排序列表。或者,一个直接列表,每个列表中都有一个位/值对 - 即“仍然有价值”。只需在追加之前重新使用逻辑上为空的单元格。删除只是清除单元格。

For reference types, "null" would do here. For value-types, Nullable<T>.

对于引用类型,“null”可以在这里完成。对于值类型,Nullable<T>.

回答by Mike Bonnell

I was very impressed by the Channel9 interview with Peter Sestoft:

Channel9 对 Peter Setoft 的采访给我留下了深刻的印象:

channel9.msdn.com/shows/Going+Deep/Peter-Sestoft-C5-Generic-Collection-Library-for-C-and-CLI/

channel9.msdn.com/shows/Going+Deep/Peter-Sestoft-C5-Generic-Collection-Library-for-C-and-CLI/

He is a professor at the Copenhagen IT University who helped to create the The C5 Generic Collection Library:

他是哥本哈根 IT 大学的教授,他帮助创建了 C5 通用收藏库:

www.itu.dk/research/c5/

www.itu.dk/research/c5/

It might be overkill or it might be just the speedy collection you were looking for ...

这可能是矫枉过正,也可能只是您正在寻找的快速收藏...

hth,

嗯,

-Mike

-麦克风