Original question: http://stackoverflow.com/questions/15522321/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Find the unique words in a text file
Asked by pratnala
I'm writing a Java program to find the unique words in a text file. I want to know whether this code is correct, because it even shows spaces as words.
String[] words;
List<String> uniqueWords = new ArrayList<String>();
words = str1.split("[!-~]* ");
for (int i = 0; i < words.length; i++)
{
if (!(uniqueWords.contains (words[i])))
{
uniqueWords.add(words[i]);
}
}
For example, if my input is "Hello world! How is the world?", my output array/set/list should contain hello, world, how, is, the.
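The "spaces" come from the pattern itself: inside [!-~] the dash builds a range from ! to ~, which covers every printable ASCII character except the space. So "[!-~]* " matches each whole word together with the space after it, and split() treats the words as separators. The sketch below (the class name SplitDemo is illustrative) prints what split() returns for the example input, then splits on runs of non-letters instead, assuming a word is simply a run of ASCII letters:

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on one or more non-letters, so spaces and punctuation are both separators.
    // Assumption: a "word" is a run of ASCII letters. A leading separator would
    // still produce one empty first element.
    static String[] words(String text) {
        return text.split("[^A-Za-z]+");
    }

    public static void main(String[] args) {
        String str1 = "Hello world! How is the world?";

        // The question's pattern: each word plus its trailing space is a delimiter,
        // leaving empty strings and the last word (which has no space after it).
        System.out.println(Arrays.toString(str1.split("[!-~]* ")));
        // prints [, , , , , world?]

        System.out.println(Arrays.toString(words(str1)));
        // prints [Hello, world, How, is, the, world]
    }
}
```

Lower-casing each of those words and adding it to a Set would then give the expected hello, world, how, is, the.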
Answered by Mariusz Jamro
Answered by assylias
A slightly modified version of the other answers (I like it short and simple):
String[] words = str1.split("[!-~]* ");
Set<String> uniqueWords = new HashSet<String>();
for (String word : words) {
uniqueWords.add(word);
}
Note: if you want to split on !, -, ~ or whitespace, you should use this:
String[] words = str1.split("[-!~\\s]+");
(the dash must come first or last inside the brackets; otherwise, as in [!-~], it defines a range)
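To see why the position of the dash matters, the sketch below (class and method names are illustrative) splits the same string once with the dash first and once with the dash between ! and ~. In the second pattern !-~ is again a range that includes the letters, so the whole string counts as one separator and split() returns an empty array:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class DashPlacement {
    // Collects the tokens produced by splitting on the given regex.
    // LinkedHashSet keeps first-seen order so the printed result is predictable.
    static Set<String> uniqueTokens(String text, String regex) {
        return new LinkedHashSet<>(Arrays.asList(text.split(regex)));
    }

    public static void main(String[] args) {
        String text = "well-known! well-known~ story";

        // Dash first: a literal '-', so this splits on -, !, ~ and whitespace
        System.out.println(uniqueTokens(text, "[-!~\\s]+"));
        // prints [well, known, story]

        // Dash in the middle: !-~ is a range covering every printable ASCII character
        System.out.println(uniqueTokens(text, "[!-~\\s]+"));
        // prints []
    }
}
```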
Answered by OldCurmudgeon
If we're really going for compact:
Set<String> unique = new HashSet<String>(Arrays.asList(str.toLowerCase().split("[-.,:;?!~\\s]+")));
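The question is about a text file, while the answers work on a String already in memory. A minimal sketch that applies the same one-liner to a file's contents follows. The class and method names are hypothetical, the file is assumed to be UTF-8, and Locale.ROOT is used so lower-casing does not depend on the default locale:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class UniqueWordsInFile {
    // Same idea as the one-liner: lower-case, split on punctuation/whitespace, dedupe via a Set.
    static Set<String> uniqueWords(String text) {
        Set<String> unique = new HashSet<>(Arrays.asList(
                text.toLowerCase(Locale.ROOT).split("[-.,:;?!~\\s]+")));
        unique.remove(""); // a leading separator (or an empty file) would add an empty string
        return unique;
    }

    // Reads the whole file into memory, assuming UTF-8 encoding.
    static Set<String> uniqueWordsInFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return uniqueWords(text);
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // TreeSet only to print the words in sorted order
            System.out.println(new TreeSet<>(uniqueWordsInFile(Paths.get(args[0]))));
        } else {
            System.out.println(new TreeSet<>(uniqueWords("Hello world! How is the world?")));
            // prints [hello, how, is, the, world]
        }
    }
}
```

Reading the whole file at once keeps the sketch short; for very large files, processing it line by line would use less memory.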
Answered by PSR
A Set does not allow duplicates, whereas a List does.
String[] words;
Set<String> uniqueWords = new HashSet<String>();
words = str1.split("[!-~]* ");
for (int i = 0; i < words.length; i++)
    uniqueWords.add(words[i]); // no contains() check needed: a Set ignores duplicates
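A small sketch of that difference (names are illustrative): a List keeps every occurrence, a Set keeps one, and Set.add() itself reports whether the element was new:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetVsList {
    // Copies the words into a Set; duplicates are silently dropped.
    static Set<String> dedupe(List<String> words) {
        return new LinkedHashSet<>(words); // LinkedHashSet keeps first-seen order
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "world", "is", "the", "world");

        List<String> list = new ArrayList<>(words); // a List keeps every occurrence
        Set<String> set = dedupe(words);

        System.out.println(list); // prints [the, world, is, the, world]
        System.out.println(set);  // prints [the, world, is]

        // add() returns false when the element is already present,
        // which is why no contains() check is needed before adding
        System.out.println(set.add("the"));   // prints false
        System.out.println(set.add("hello")); // prints true
    }
}
```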
Answered by Deepak Bala
I'd suggest using Pattern and Matcher and putting the results in a Set.
public void getWords()
{
Set<String> words = new HashSet<String>();
String pattern = "[a-zA-Z]+\\s"; // letters followed by one whitespace character
String match = "hello world how* are. you! world hello";
Pattern compile = Pattern.compile(pattern);
Matcher matcher = compile.matcher(match);
while(matcher.find())
{
String group = matcher.group();
boolean add = words.add(group);
if(add)
{
System.out.println(group);
}
}
}
Output:
hello
world
You can change what counts as a 'word' by changing the pattern.
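With the regex [a-zA-Z]+\s, a word only matches when whitespace follows it, which is why how, are and you (each followed by punctuation) are missing from the output above. One possible change, sketched here with illustrative names, matches the letters alone and lower-cases them; a LinkedHashSet keeps the first-seen order:

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherWords {
    // Assumption: a "word" is a run of ASCII letters; change WORD to redefine it.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    static Set<String> uniqueWords(String text) {
        Set<String> words = new LinkedHashSet<>(); // keeps first-seen order
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            // group() is only the letters, with no trailing whitespace attached
            words.add(matcher.group().toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("hello world how* are. you! world hello"));
        // prints [hello, world, how, are, you]
    }
}
```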
Answered by Sam Abdul
If you want only the words that are not repeated anywhere in a sentence (or any other text), you can try this:
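public static Map<String, Integer> getUniqueWords(String sentence) {
    // Split on runs of non-word characters ("\\W" needs a doubled backslash in a Java string)
    String[] words = sentence.split("\\W+");
    Map<String, Integer> counts = new HashMap<>();
    for (String e : words) {
        if (!e.isEmpty()) {
            counts.merge(e, 1, Integer::sum); // count every occurrence
        }
    }
    // Keep only the words that occur exactly once (a put/remove toggle
    // would wrongly keep a word that appears three times)
    counts.values().removeIf(c -> c != 1);
    return counts;
}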
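On Java 8 or later, the same "words that occur exactly once" idea can also be written with streams: group the tokens, count each group, and keep the words whose count is 1. A sketch with illustrative names; like the code above, it splits on \W+ and is case-sensitive:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordsSeenOnce {
    // Counts every word, then keeps only those that occur exactly once.
    static Set<String> wordsSeenOnce(String text) {
        Map<String, Long> counts = Arrays.stream(text.split("\\W+"))
                .filter(w -> !w.isEmpty()) // skip the empty token a leading separator produces
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == 1L)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new)); // sorted for stable output
    }

    public static void main(String[] args) {
        System.out.println(wordsSeenOnce("Hello world! How is the world?"));
        // prints [Hello, How, is, the]
    }
}
```

Because world appears twice in the example, it is left out; a word that appears three times is left out as well.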