Java: best algorithm to find the anagrams of a word in a dictionary

Question

Asked by chiru

I was given a problem like this:

I have a List that serves as a dictionary containing millions of words, and I am given an input word such as OSPT. Only two words can be formed from it: STOP and POST. I want to find all the anagrams of the input word in the dictionary in an optimized way.

What I tried:

I gave the solution below. I take the word, permute it, and check whether each permutation exists in the dictionary. But this is n*n and not optimized. Is there any way to solve this?

Answer 1

Accepted answer by Nick Holt

You sort the characters in each word alphabetically to form a key in a map whose values are the lists of words for that key.

When you're given a word to find the anagrams for, you sort the characters in that word alphabetically and do a lookup in the map.

From your example, and adding the word POOL and its anagrams, you'd get:

LOOP -> [LOOP, POOL, POLO]
OPST -> [STOP, POST]

The Java code would be something like this (it uses Guava's Multimap):

import com.google.common.collect.HashMultimap;
import com.google.common.collect.Multimap;

import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class AnagramGenerator
{
  // Maps each sorted-letter key to the dictionary words that share it.
  private final Map<String, Collection<String>> indexedDictionary;

  public AnagramGenerator(List<String> dictionary)
  {
    this.indexedDictionary = index(dictionary);
  }

  public Collection<String> getAnagrams(String word)
  {
    // Returns null when no dictionary word shares the key.
    return indexedDictionary.get(sort(word));
  }

  private Map<String, Collection<String>> index(List<String> dictionary)
  {
    // Guava multimap: one key can hold many words.
    Multimap<String, String> indexedDictionary = HashMultimap.create();

    for (String word : dictionary)
    {
      indexedDictionary.put(sort(word), word);
    }

    return indexedDictionary.asMap();
  }

  private String sort(String word)
  {
    // Sort the characters so that all anagrams produce the same key.
    char[] characters = word.toCharArray();
    Arrays.sort(characters);
    return new String(characters);
  }
}

Answer 2

Answer by Peter Lawrey

You can do this:

Sort the letters of each word and add it to a MultiMap of sorted word to actual word.
Look up the anagrams of a given word by sorting its letters first.

The indexing cost is paid once and is O(N), where N is the number of words (each word's letters are sorted as it is added).

After that, the cost of a lookup is O(M log M) to sort the letters, where M is the number of letters. This is very cheap compared to the cost of calculating permutations.

BTW, with this approach the dictionary words are only scanned once, in advance.
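
The two steps above can be sketched with nothing but java.util, assuming Guava is not available. The class name SortedKeyIndex and its method names are made up for this illustration, and a plain HashMap of lists stands in for the MultiMap:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical name; a HashMap of lists plays the role of the MultiMap.
public class SortedKeyIndex {
    // Key: the letters of a word in sorted order. Value: every dictionary word with those letters.
    private final Map<String, List<String>> index = new HashMap<>();

    // One pass over the dictionary, done once in advance.
    public SortedKeyIndex(List<String> dictionary) {
        for (String word : dictionary) {
            index.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
        }
    }

    // Sorting the M letters of the query costs O(M log M); the lookup itself is a hash access.
    public List<String> anagramsOf(String word) {
        return index.getOrDefault(key(word), Collections.emptyList());
    }

    private static String key(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public static void main(String[] args) {
        SortedKeyIndex idx = new SortedKeyIndex(
                Arrays.asList("STOP", "POST", "LOOP", "POOL", "POLO"));
        System.out.println(idx.anagramsOf("OSPT")); // prints [STOP, POST]
        System.out.println(idx.anagramsOf("JAVA")); // prints []
    }
}
```

This sketch returns an empty list rather than null when nothing matches, which is a design choice of the illustration and not something the answer specifies.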

Answer 3

Answer by Abhishek Bansal

This can be done in the following way:

For the given word, keep a count of each character in it. For example, for OSTP:

count['O'] = 1;
count['S'] = 1;
count['T'] = 1;
count['P'] = 1;

You can form an array of 26 elements like this.

Then, while iterating through the dictionary, just check which words have the same character counts.
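
A minimal sketch of this counting approach might look like the following. The class and method names are invented for the example, and it assumes every word consists only of the ASCII letters A-Z (any other character would index outside the 26-element array):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical names; assumes words contain only ASCII letters.
public class LetterCountMatcher {
    // Counts how often each letter A-Z occurs in the word.
    static int[] letterCounts(String word) {
        int[] counts = new int[26];
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            counts[c - 'A']++;
        }
        return counts;
    }

    // Scans the whole dictionary and keeps the words whose counts equal the query's counts.
    static List<String> findAnagrams(String query, List<String> dictionary) {
        int[] target = letterCounts(query);
        List<String> matches = new ArrayList<>();
        for (String word : dictionary) {
            // A word of a different length cannot be an anagram, so its counts are never computed.
            if (word.length() == query.length()
                    && Arrays.equals(target, letterCounts(word))) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("STOP", "POST", "LOOP", "POOL", "POLO");
        System.out.println(findAnagrams("OSTP", dictionary)); // prints [STOP, POST]
    }
}
```

Note that, as written, every query walks the entire dictionary, so the work per query grows with the number of words.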

Answer 4

Answer by Ivan Smirnov

You may preprocess your list: replace every word in it with its sorted form (e.g. abacaba becomes aaaabbc). This string uniquely represents every word that is an anagram of the original dictionary word.

Then, when you receive a query, sort the letters in it and check whether the result is in the preprocessed dictionary.
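
As a rough illustration of this preprocessing, the sketch below stores only the sorted forms in a HashSet. The names SortedKeySet, hasAnagram and sortLetters are assumptions made for the example, as is the choice of a HashSet for the preprocessed dictionary:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names; the preprocessed dictionary is kept as a set of sorted strings.
public class SortedKeySet {
    private final Set<String> keys = new HashSet<>();

    // Preprocessing: replace each word with its letters in sorted order.
    public SortedKeySet(List<String> dictionary) {
        for (String word : dictionary) {
            keys.add(sortLetters(word));
        }
    }

    // Query: sort the letters and check membership.
    public boolean hasAnagram(String query) {
        return keys.contains(sortLetters(query));
    }

    static String sortLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public static void main(String[] args) {
        System.out.println(sortLetters("abacaba")); // prints aaaabbc
        SortedKeySet set = new SortedKeySet(Arrays.asList("STOP", "POST"));
        System.out.println(set.hasAnagram("OSPT")); // prints true
    }
}
```

Because the original words are replaced, a structure like this can only say whether some anagram exists. Listing the matching words requires keeping them next to each key, as in the accepted answer.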

Answer 5

Answer by Leeor

For best speed, you can map the characters to unique prime values, multiply them (make sure your number type is large enough to hold the product), and use the product as a numerical key for storing the valid permutations. Each product is unique to a given set of letters, including repeats, because every number has a unique prime decomposition.

Given an input word, repeat the process to get the value, and access the dictionary directly with that. This is similar to the sorted-strings solution, but it saves the overhead of sorting and simplifies the key comparisons.

See also the related solution in C: "Generate same unique hash code for all anagrams".
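
One possible sketch of the prime-product key is shown below. The class name, the use of the first 26 primes for A-Z, and the choice of BigInteger (so the product cannot overflow, at some cost in speed) are all assumptions of this illustration. It also assumes words contain only ASCII letters:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical names; assumes words contain only the ASCII letters A-Z.
public class PrimeProductIndex {
    // The first 26 primes, one per letter A-Z.
    private static final int[] PRIMES = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };

    private final Map<BigInteger, List<String>> index = new HashMap<>();

    public PrimeProductIndex(List<String> dictionary) {
        for (String word : dictionary) {
            index.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
        }
    }

    public List<String> anagramsOf(String word) {
        return index.getOrDefault(key(word), Collections.emptyList());
    }

    // Multiplies the primes of all letters; the order of the letters does not affect the product.
    static BigInteger key(String word) {
        BigInteger product = BigInteger.ONE;
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            product = product.multiply(BigInteger.valueOf(PRIMES[c - 'A']));
        }
        return product;
    }

    public static void main(String[] args) {
        PrimeProductIndex idx = new PrimeProductIndex(
                Arrays.asList("STOP", "POST", "LOOP", "POOL", "POLO"));
        System.out.println(key("STOP").equals(key("POST"))); // prints true
        System.out.println(idx.anagramsOf("OSPT"));          // prints [STOP, POST]
    }
}
```

For short words, a long combined with Math.multiplyExact (which throws on overflow) could replace BigInteger. Whether either variant actually beats sorting would need to be measured.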
