Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/623892/

Time: 2020-08-11 17:04:37  Source: igfitidea

Where do I find a standard Trie based map implementation in Java?

java algorithm optimization trie

Asked by Uri

I have a Java program that stores a lot of mappings from Strings to various objects.

Right now, my options are either to rely on hashing (via HashMap) or on binary searches (via TreeMap). I am wondering if there is an efficient and standard trie-based map implementation in a popular and quality collections library?

I've written my own in the past, but I'd rather go with something standard, if available.

Quick clarification: While my question is general, in the current project I am dealing with a lot of data that is indexed by fully-qualified class name or method signature. Thus, there are many shared prefixes.

Accepted answer by David Schlosnagle

You might want to look at the Trie implementation that Limewire is contributing to Google Guava.

Answer by andrii

What you need is org.apache.commons.collections.FastTreeMap, I think.

Answer by TofuBeer

You might look at this TopCoder one as well (registration required...).

Answer by erickson

There is no trie data structure in the core Java libraries.

This may be because tries are usually designed to store character strings, while Java data structures are more general, usually holding any Object (defining equality and a hash operation), though they are sometimes limited to Comparable objects (defining an order). There's no common abstraction for "a sequence of symbols," although CharSequence is suitable for character strings, and I suppose you could do something with Iterable for other types of symbols.
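
To illustrate the Iterable idea, here is a minimal sketch of a trie keyed by arbitrary symbol sequences. The class name SymbolTrie and its put/get methods are made up for this example and do not come from any library; symbols only need equals() and hashCode(), and a null value is assumed to mean "nothing stored here".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a trie whose keys are sequences of any symbol type S. */
class SymbolTrie<S, V> {
    private static final class Node<S, V> {
        // Children are looked up by symbol equality/hash, not by array index.
        final Map<S, Node<S, V>> children = new HashMap<>();
        V value; // null means no value is stored at this node
    }

    private final Node<S, V> root = new Node<>();

    /** Associates the symbol sequence with a value, replacing any previous one. */
    public void put(Iterable<? extends S> key, V value) {
        Node<S, V> node = root;
        for (S symbol : key) {
            node = node.children.computeIfAbsent(symbol, s -> new Node<>());
        }
        node.value = value;
    }

    /** Returns the stored value, or null if the sequence is not a key. */
    public V get(Iterable<? extends S> key) {
        Node<S, V> node = root;
        for (S symbol : key) {
            node = node.children.get(symbol);
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    public static void main(String[] args) {
        // Here each symbol is one segment of a qualified name.
        SymbolTrie<String, Integer> trie = new SymbolTrie<>();
        trie.put(Arrays.asList("java", "util", "Map"), 1);
        trie.put(Arrays.asList("java", "util", "List"), 2);
        System.out.println(trie.get(Arrays.asList("java", "util", "Map")));
        System.out.println(trie.get(Arrays.asList("java", "util")));
    }
}
```

Treating each package segment as one symbol is just one possible choice; a List<Character> or any other Iterable would work the same way.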

Here's another point to consider: when trying to implement a conventional trie in Java, you are quickly confronted with the fact that Java supports Unicode. To have any sort of space efficiency, you have to restrict the strings in your trie to some subset of symbols, or abandon the conventional approach of storing child nodes in an array indexed by symbol. This might be another reason why tries are not considered general-purpose enough for inclusion in the core library, and something to watch out for if you implement your own or use a third-party library.
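
As a rough illustration of the second option (giving up the symbol-indexed array), the sketch below keeps each node's children in a HashMap, so a node only pays for the characters that actually follow it. CharTrieMap and its method names are hypothetical, not from any library. It works on UTF-16 char units, so a supplementary code point simply occupies two levels of the trie.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a String-keyed trie map with sparse child tables. */
class CharTrieMap<V> {
    private static final class Node<V> {
        // An array indexed by char would need 65,536 slots per node;
        // a map stores only the children that exist.
        final Map<Character, Node<V>> children = new HashMap<>();
        V value;
        boolean terminal; // true if a key ends at this node
    }

    private final Node<V> root = new Node<>();

    public void put(String key, V value) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node<>());
        }
        node.value = value;
        node.terminal = true;
    }

    public V get(String key) {
        Node<V> node = find(key);
        return (node != null && node.terminal) ? node.value : null;
    }

    /** Returns all keys starting with the prefix, in no particular order. */
    public List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node<V> start = find(prefix);
        if (start != null) {
            collect(start, new StringBuilder(prefix), out);
        }
        return out;
    }

    // Walks down from the root; returns null if the path does not exist.
    private Node<V> find(String s) {
        Node<V> node = root;
        for (int i = 0; i < s.length() && node != null; i++) {
            node = node.children.get(s.charAt(i));
        }
        return node;
    }

    // Depth-first traversal that rebuilds each key from the path taken.
    private void collect(Node<V> node, StringBuilder path, List<String> out) {
        if (node.terminal) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node<V>> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```

The trade-off is a hash lookup and a boxed Character per step instead of a direct array index, so this favors space over raw speed.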

Answer by RokL

If you require a sorted map, then tries are worthwhile. If you don't, then a hash map is better. A hash map with string keys can be improved over the standard Java implementation: Array hash map
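
For comparison, this is roughly what the sorted-map side looks like with the standard TreeMap: a prefix lookup becomes a range view. It is a minimal sketch, and the PrefixQuery helper is a made-up name. It assumes no key contains the character U+FFFF immediately after the prefix, since such keys fall outside the range.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper: prefix lookup on a plain TreeMap via a range view. */
class PrefixQuery {
    /** Returns a live view of the entries whose keys start with the prefix. */
    static <V> SortedMap<String, V> withPrefix(TreeMap<String, V> map, String prefix) {
        // Strings compare char by char, so keys starting with the prefix
        // sort inside the half-open range [prefix, prefix + '\uffff').
        return map.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}
```

For example, on a map of fully-qualified class names, PrefixQuery.withPrefix(map, "java.util.") gives the entries for that package in sorted order. Each lookup costs O(log n) string comparisons, which is the cost a trie avoids when many keys share long prefixes.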

Answer by Melinda Green

I wrote and published a simple and fast implementation here.

Answer by Alex Beardsley

Also check out concurrent-trees. They support both Radix and Suffix trees and are designed for high concurrency environments.

Answer by Nate

If you're not worried about pulling in the Scala library, you can use this space-efficient burst trie implementation I wrote.

https://github.com/nbauernfeind/scala-burst-trie

Answer by coderz

Here is my implementation, enjoy it via: GitHub - MyTrie.java

/* usage:
    MyTrie trie = new MyTrie();
    trie.insert("abcde");
    trie.insert("abc");
    trie.insert("sadas");
    trie.insert("abc");
    trie.insert("wqwqd");
    System.out.println(trie.contains("abc"));
    System.out.println(trie.contains("abcd"));
    System.out.println(trie.contains("abcdefg"));
    System.out.println(trie.contains("ab"));
    System.out.println(trie.getWordCount("abc"));
    System.out.println(trie.getAllDistinctWords());
*/

import java.util.*;

public class MyTrie {
  private class Node {
    public int[] next = new int[26];
    public int wordCount;
    public Node() {
      for(int i=0;i<26;i++) {
        next[i] = NULL;
      }
      wordCount = 0;
    }
  }

  private int curr;
  private Node[] nodes;
  private List<String> allDistinctWords;
  public final static int NULL = -1;

  public MyTrie() {
    nodes = new Node[100000];
    nodes[0] = new Node();
    curr = 1;
  }

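  // Maps 'a'..'z' to 0..25; this trie only supports lowercase ASCII letters.
  private int getIndex(char c) {
    if(c < 'a' || c > 'z') {
      throw new IllegalArgumentException("Only 'a'-'z' are supported: " + c);
    }
    return c - 'a';
  }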

  private void depthSearchWord(int x, String currWord) {
    for(int i=0;i<26;i++) {
      int p = nodes[x].next[i];
      if(p != NULL) {
        String word = currWord + (char)(i + 'a');
        if(nodes[p].wordCount > 0) {
          allDistinctWords.add(word);
        }
        depthSearchWord(p, word);
      }
    }
  }

  public List<String> getAllDistinctWords() {
    allDistinctWords = new ArrayList<String>();
    depthSearchWord(0, "");
    return allDistinctWords;
  }

  public int getWordCount(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        return 0;
      }
      p = nodes[p].next[j];
    }
    return nodes[p].wordCount;
  }

  public boolean contains(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        return false;
      }
      p = nodes[p].next[j];
    }
    return nodes[p].wordCount > 0;
  }

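  public void insert(String str) {
    int len = str.length();
    int p = 0;
    for(int i=0;i<len;i++) {
      int j = getIndex(str.charAt(i));
      if(nodes[p].next[j] == NULL) {
        // Grow the node pool instead of overflowing the fixed-size array.
        if(curr == nodes.length) {
          nodes = Arrays.copyOf(nodes, nodes.length * 2);
        }
        nodes[curr] = new Node();
        nodes[p].next[j] = curr;
        curr++;
      }
      p = nodes[p].next[j];
    }
    // Count duplicates so getWordCount can report how often a word was inserted.
    nodes[p].wordCount++;
  }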
}

Answer by Duncan Jones

Apache Commons Collections v4.0 now supports trie structures.

See the org.apache.commons.collections4.trie package info for more information. In particular, check the PatriciaTrie class:

Implementation of a PATRICIA Trie (Practical Algorithm to Retrieve Information Coded in Alphanumeric).

A PATRICIA Trie is a compressed Trie. Instead of storing all data at the edges of the Trie (and having empty internal nodes), PATRICIA stores data in every node. This allows for very efficient traversal, insert, delete, predecessor, successor, prefix, range, and select(Object) operations. All operations are performed at worst in O(K) time, where K is the number of bits in the largest item in the tree. In practice, operations actually take O(A(K)) time, where A(K) is the average number of bits of all items in the tree.
