在 C# 中寻找后缀树实现?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/171196/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-03 16:34:02  来源:igfitidea点击:

Looking for the suffix tree implementation in C#?

c#data-structuressuffix-tree

提问by Goran

I've implemented a basic search for a research project. I'm trying to make the search more efficient by building a suffix tree. I'm interested in a C# implementation of the Ukkonenalgorith. I don't want to waste time rolling my own if such implementation exists.

我已经实现了一个研究项目的基本搜索。我试图通过构建后缀树来提高搜索效率。我对Ukkonen算法的 C# 实现感兴趣。如果存在这样的实现,我不想浪费时间自己动手。

采纳答案by George Mamaladze

Hei, just finished implementing .NET (c#) library containing different trie implementations. Among them:

嘿,刚刚完成了包含不同 trie 实现的 .NET (c#) 库的实现。他们之中:

  • Classical trie
  • Patricia trie
  • Suffix trie
  • A trie using Ukkonen'salgorithm
  • 经典树
  • 帕特里夏·特里
  • 后缀树
  • 使用Ukkonen算法的尝试

I tried to make source code easy readable. Usage is also very straight forward:

我试图使源代码易于阅读。用法也很简单:

using Gma.DataStructures.StringSearch;

...

var trie = new UkkonenTrie<int>(3);
//var trie = new SuffixTrie<int>(3);

trie.Add("hello", 1);
trie.Add("world", 2);
trie.Add("hell", 3);

var result = trie.Retrieve("hel");

The library is well tested and also published as TrieNetNuGet package.

该库经过充分测试,也作为TrieNetNuGet 包发布。

See github.com/gmamaladze/trienet

github.com/gmamaladze/trienet

回答by torial

Hard question. Here's the closest to match I could find: http://www.codeproject.com/KB/recipes/ahocorasick.aspx, which is an implementation of the Aho-Corasick string matching algorithm. Now, the algorithm uses a suffix-tree-like structure per: http://en.wikipedia.org/wiki/Aho-Corasick_algorithm

难以回答的问题。这是我能找到的最接近的匹配:http: //www.codeproject.com/KB/recipes/ahocorasick.aspx,它是 Aho-Corasick 字符串匹配算法的实现。现在,该算法使用后缀树状结构:http: //en.wikipedia.org/wiki/Aho-Corasick_algorithm

Now, if you want a prefix tree, this article claims to have an implementation for you: http://www.codeproject.com/KB/recipes/prefixtree.aspx

现在,如果你想要一个前缀树,这篇文章声称为你提供了一个实现:http: //www.codeproject.com/KB/recipes/prefixtree.aspx

<HUMOR> Now that I did your homework, how about you mow my lawn. (Reference: http://flyingmoose.org/tolksarc/homework.htm) </HUMOR>

< HUMOR> 现在我做了你的功课,你来修剪我的草坪怎么样。(参考:http: //flyingmoose.org/tolksarc/homework.htm)< /HUMOR>

Edit: I found a C# suffix tree implementation that was a port of a C++ one posted on a blog: http://code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree

编辑:我发现了一个 C# 后缀树实现,它是在博客上发布的 C++ 的一个端口:http: //code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree

Edit: There is a new project at Codeplex that is focused on suffix trees: http://suffixtree.codeplex.com/

编辑:Codeplex 有一个新项目专注于后缀树:http: //suffixtree.codeplex.com/

回答by cdiggins

Here is an implementation of a suffix tree that is reasonably efficient. I haven't studied Ukkonen's implementation, but the running time of this algorithm I believe is quite reasonable, approximately O(N Log N). Note the number of internal nodes in the tree created is equal to the number of letters in the parent string.

这是一个相当高效的后缀树的实现。我没有研究过 Ukkonen 的实现,但我认为这个算法的运行时间是相当合理的,大约O(N Log N). 请注意,创建的树中的内部节点数等于父字符串中的字母数。

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using NUnit.Framework;

namespace FunStuff
{
    public class SuffixTree
    {
        public class Node
        {
            public int Index = -1;
            public Dictionary<char, Node> Children = new Dictionary<char, Node>();
        }

        public Node Root = new Node();
        public String Text;

        public void InsertSuffix(string s, int from)
        {             
            var cur = Root;
            for (int i = from; i < s.Length; ++i)
            {
                var c = s[i];
                if (!cur.Children.ContainsKey(c))
                {
                    var n = new Node() {Index = from};
                    cur.Children.Add(c, n);

                    // Very slow assertion. 
                    Debug.Assert(Find(s.Substring(from)).Any());

                    return;
                }
                cur = cur.Children[c];
            }
            Debug.Assert(false, "It should never be possible to arrive at this case");
            throw new Exception("Suffix tree corruption");
        }

        private static IEnumerable<Node> VisitTree(Node n)
        {
            foreach (var n1 in n.Children.Values)
                foreach (var n2 in VisitTree(n1))
                    yield return n2;
            yield return n;
        }

        public IEnumerable<int> Find(string s)
        {
            var n = FindNode(s);
            if (n == null) yield break;
            foreach (var n2 in VisitTree(n))
                yield return n2.Index;
        }

        private Node FindNode(string s)
        {
            var cur = Root;
            for (int i = 0; i < s.Length; ++i)
            {
                var c = s[i];
                if (!cur.Children.ContainsKey(c))
                {
                    // We are at a leaf-node.
                    // What we do here is check to see if the rest of the string is at this location. 
                    for (var j=i; j < s.Length; ++j)
                        if (cur.Index + j >= Text.Length || Text[cur.Index + j] != s[j])
                            return null;
                    return cur;
                }
                cur = cur.Children[c];
            }
            return cur;
        }

        public SuffixTree(string s)
        {
            Text = s;
            for (var i = s.Length - 1; i >= 0; --i)
                InsertSuffix(s, i);
            Debug.Assert(VisitTree(Root).Count() - 1 == s.Length);
        }
    }

    [TestFixture]
    public class TestSuffixTree
    {
        [Test]
        public void TestBasics()
        {
            var s = "banana";
            var t = new SuffixTree(s);
            var results = t.Find("an").ToArray();
            Assert.AreEqual(2, results.Length);
            Assert.AreEqual(1, results[0]);
            Assert.AreEqual(3, results[1]);
        }
    } 
}