Java: finding the unique words in a text file

Question

Asked by pratnala

I'm writing a Java program to find the unique words in a text file. I want to know whether this code is correct, because it reports even spaces as words.

String[] words;
List<String> uniqueWords = new ArrayList<String>();
words = str1.split("[!-~]* ");
for (int i = 0; i < words.length; i++)
{
    if (!(uniqueWords.contains (words[i])))
    {
        uniqueWords.add(words[i]);
    }
}

For example, if my input is "Hello world! How is the world?", my output array/set/list should contain hello, world, how, is, the.

Answer 1

Answered by Mariusz Jamro

You can find unique words by using a Set. A Set is a Collection that contains no duplicate elements.

String[] words;
Set<String> uniqueWords = new HashSet<String>();
words = str1.split("[\\W]+"); // split on runs of non-word characters
for (int i = 0; i < words.length; i++)
{
    uniqueWords.add(words[i]);
}
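As a fuller sketch of the Set approach, the class below lowercases the input before splitting, so that Hello and hello count as the same word, and skips the empty string that split can return when the text starts with punctuation. The names UniqueWordsDemo and uniqueWords are made up for illustration, and a LinkedHashSet is used instead of a HashSet only to keep the words in order of first appearance.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class UniqueWordsDemo {
    // Hypothetical helper: returns the distinct words of text, lowercased, in first-seen order.
    static Set<String> uniqueWords(String text) {
        Set<String> unique = new LinkedHashSet<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            // split yields "" as the first element when the text starts with a non-word character
            if (!word.isEmpty()) {
                unique.add(word);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("Hello world! How is the world?"));
        // prints [hello, world, how, is, the]
    }
}
```

Note that \W treats digits and underscores as word characters, so a token such as abc_1 stays in one piece.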

Answer 2

Answered by assylias

A slightly modified version of the other answers (I like it short and simple):

String[] words = str1.split("[!-~]* ");
Set<String> uniqueWords = new HashSet<String>();

for (String word : words) {
    uniqueWords.add(word);
}

Note: if you want to split on !, -, ~ or whitespace, you should use this:

String[] words = str1.split("[-!~\\s]+");

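(the dash must be first or last in the character class; otherwise [!-~] is read as a range from ! to ~, which covers every printable ASCII character except the space)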
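To see why the position of the dash matters, the sketch below runs both patterns on the sample sentence from the question. The class and method names (DashDemo, splitWithRange, splitOnSeparators) are hypothetical.

```java
import java.util.Arrays;

class DashDemo {
    static String[] splitWithRange(String s) {
        // [!-~] is a range, so this pattern consumes whole words plus the
        // following space and leaves mostly empty strings behind
        return s.split("[!-~]* ");
    }

    static String[] splitOnSeparators(String s) {
        // dash first: it is a literal, so the class means '-', '!', '~' or whitespace
        return s.split("[-!~\\s]+");
    }

    public static void main(String[] args) {
        String input = "Hello world! How is the world?";
        System.out.println(Arrays.toString(splitWithRange(input)));
        // prints [, , , , , world?]
        System.out.println(Arrays.toString(splitOnSeparators(input)));
        // prints [Hello, world, How, is, the, world?]
    }
}
```

The trailing ? survives in the second result because it is not part of the character class; add it to the class if it should be treated as a separator too.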

Answer 3

Answered by OldCurmudgeon

If we're really going for compactness:

Set<String> unique = new HashSet<String>(Arrays.asList(str.toLowerCase().split("[-.,:;?!~\\s]+")));
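Since the question is about a text file rather than a single string, here is a minimal sketch of how the same split pattern might be applied line by line. It assumes Java 8 or later and a UTF-8 file; the class name FileUniqueWords, the helper uniqueWords and the use of a TreeSet (for sorted output) are illustrative choices, not part of the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class FileUniqueWords {
    // Hypothetical helper: collects the distinct lowercased words from all lines of the reader.
    static Set<String> uniqueWords(BufferedReader reader) {
        return reader.lines()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("[-.,:;?!~\\s]+")))
                .filter(word -> !word.isEmpty()) // blank lines and leading separators produce ""
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FileUniqueWords <path-to-text-file>");
            return;
        }
        // args[0] is assumed to be the path of the text file to scan
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.println(uniqueWords(reader));
        }
    }
}
```

Taking a BufferedReader rather than a path keeps the helper easy to try out on an in-memory string via a StringReader.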

Answer 4

Answered by PSR

A Set does not allow duplicates, whereas a List does.

String[] words;
Set<String> uniqueWords = new HashSet<String>();
words = str1.split("[!-~]* ");
for (int i = 0; i < words.length; i++)
    uniqueWords.add(words[i]); // no contains() check is needed: a Set silently ignores duplicates

Answer 5

Answered by Deepak Bala

I'd suggest you use a Pattern and a Matcher and collect the results in a Set.

public void getWords()
{
    Set<String> words = new HashSet<String>();
    String pattern = "[a-zA-Z]+\\s"; // a run of letters followed by one whitespace character
    String match = "hello world how* are. you! world hello";
    Pattern compile = Pattern.compile(pattern);
    Matcher matcher = compile.matcher(match);
    while(matcher.find())
    {
        String group = matcher.group();
        boolean add = words.add(group);
        if(add)
        {
            System.out.println(group);
        }
    }
}

Output (each match keeps the trailing whitespace character captured by \s):

hello 
world

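Only two words are printed because the pattern requires whitespace right after the letters: how*, are. and you! are followed by punctuation, the final hello is followed by the end of the input, and the second world is a duplicate. To change what counts as a 'word', change the pattern.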
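As one possible way of changing the pattern, the sketch below drops the trailing \s so that a word is simply a run of ASCII letters, which also picks up words followed by punctuation or by the end of the input. The class and method names (MatcherWords, words) are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherWords {
    // A word here is assumed to be one or more ASCII letters
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    static Set<String> words(String text) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group()); // duplicates are ignored by the Set
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("hello world how* are. you! world hello"));
        // prints [hello, world, how, are, you]
    }
}
```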

Answer 6

Answered by Sam Abdul

If you would like to get only the words that are not duplicated in the sentence (or in any other text), you can try this:

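// requires java.util.HashMap and java.util.Map
public static Map<String, Integer> getUniqueWords(String sentence) {
    String[] words = sentence.split("[\\W]+");
    Map<String, Integer> uniqueWords = new HashMap<>();
    for (String word : words) {
        // Count every occurrence; removing a word on its second sighting
        // would let it back in on a third one
        uniqueWords.merge(word, 1, Integer::sum);
    }
    // Keep only the words that occurred exactly once
    uniqueWords.values().removeIf(count -> count > 1);
    return uniqueWords;
}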
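The same idea, keeping only the words that occur exactly once, can also be sketched with streams by counting occurrences first and filtering afterwards. The class and method names (SingleOccurrence, wordsSeenOnce) are hypothetical, and the lowercasing step is an added assumption so that World and world are treated as the same word.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SingleOccurrence {
    static List<String> wordsSeenOnce(String sentence) {
        // Count how often each word occurs, keeping first-seen order
        Map<String, Long> counts = Arrays.stream(sentence.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, LinkedHashMap::new, Collectors.counting()));
        // Keep only the words whose count is exactly one
        return counts.entrySet().stream()
                .filter(entry -> entry.getValue() == 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(wordsSeenOnce("Hello world! How is the world?"));
        // prints [hello, how, is, the]
    }
}
```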
