Original question: http://stackoverflow.com/questions/4338450/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Creating a Java Program to Search a File for a Specific Word
Asked by Dom Minic
I am just learning the language and was wondering what a more experienced Java programmer would do in the following situation.
I would like to create a Java program that will search a specified file for all instances of a specific word.
How would you go about this? Does the Java API come with a class that provides file-scanning capabilities, or would I have to write my own class to do this?
Thanks for any input,
Dom.
Answered by Reese Moore
The Java API does offer the java.util.Scanner class, which will allow you to scan across an input file.
Depending on how you intend to use this, however, Scanner might not be the best choice. Is the file very large? Are you searching only one file, or are you trying to keep a database of many files and search within that? In the latter case, you might want to use a more fully featured engine such as Lucene.
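As a rough illustration of the Scanner approach mentioned above, here is a minimal sketch that reads whitespace-separated tokens and counts those equal to the search word. The class name WordCount, the method countWord, and the file name data.txt are all made up for this example, and stripping leading and trailing punctuation is only one assumption about what counts as "the same word".

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class WordCount {
    // Counts tokens equal to the word (ignoring case) after stripping any
    // leading and trailing characters that are not letters or digits.
    static int countWord(Scanner in, String word) {
        int count = 0;
        while (in.hasNext()) {
            String token = in.next()
                    .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (token.equalsIgnoreCase(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "data.txt" is a placeholder file name for this sketch.
        try (Scanner in = new Scanner(new File("data.txt"))) {
            System.out.println("Occurrences: " + countWord(in, "java"));
        }
    }
}
```

Because Scanner reads the input token by token, this version does not need to hold the whole file in memory as a single string.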
Answered by Peter Lawrey
Unless the file is very large, I would read it into a string and search that:
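// Requires Apache Commons IO (IOUtils) and java.util.regex.Pattern.
String text = IOUtils.toString(new FileReader(filename));
// find() looks for the word anywhere in the text; matches() would require the whole text to match.
boolean foundWord = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();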
To find the text between occurrences of your word, you can use split() and use the lengths of the resulting strings to determine the position of each occurrence.
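The split() idea can be sketched roughly as follows. This is only one possible reading of the suggestion: the class name SplitPositions and the method positions are hypothetical, the match is case-sensitive, and the word is assumed to be non-empty and made of ordinary word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitPositions {
    // Returns the start index of every whole-word occurrence of word in text.
    static List<Integer> positions(String text, String word) {
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        // A limit of -1 keeps trailing empty strings, so no occurrence is lost.
        String[] parts = text.split(regex, -1);
        List<Integer> result = new ArrayList<>();
        int index = 0;
        // There is one occurrence between each pair of adjacent parts.
        for (int i = 0; i < parts.length - 1; i++) {
            index += parts[i].length(); // text before this occurrence
            result.add(index);
            index += word.length();     // skip over the occurrence itself
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("java is java", "java"));
    }
}
```

Adding up the lengths works here because a case-sensitive match always has the same length as the word; a case-insensitive or more general pattern would be easier to handle with Matcher.find() and Matcher.start().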
Answered by aioobe
As others have pointed out, you could use the Scanner class.
I put your question in a file, data.txt, and ran the following program:
import java.io.*;
import java.util.Scanner;
import java.util.regex.MatchResult;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner s = new Scanner(new File("data.txt"));
        // A horizon of 0 means the search is not bounded.
        while (null != s.findWithinHorizon("(?i)\\bjava\\b", 0)) {
            MatchResult mr = s.match();
            System.out.printf("Word found: %s at index %d to %d.%n", mr.group(),
                    mr.start(), mr.end());
        }
        s.close();
    }
}
The output is:
Word found: Java at index 74 to 78.
Word found: java at index 153 to 157.
Word found: Java at index 279 to 283.
The pattern searched for, (?i)\bjava\b, means the following:
(?i) turns on the case-insensitive switch
\b means a word boundary
java is the string searched for
\b is a word boundary again
If the search term comes from the user, or if for some other reason it may contain special characters, I suggest you use \Q and \E around the string, as they quote all characters in between (and if you're really picky, make sure the input doesn't contain \E itself).
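One way to get the \Q...\E quoting without assembling it by hand is Pattern.quote, which wraps the term for you and also copes with a \E inside the input. The sketch below is an illustration under that assumption rather than part of the original answer; the class name QuotedSearch and its methods are invented for the example.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class QuotedSearch {
    // Builds a case-insensitive whole-word pattern in which the search term
    // is treated literally, even if it contains regex metacharacters.
    static Pattern wordPattern(String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
    }

    // Counts how many times the term occurs in the scanner's input.
    static int countMatches(Scanner s, String term) {
        Pattern p = wordPattern(term);
        int count = 0;
        // A horizon of 0 means the search is not bounded.
        while (s.findWithinHorizon(p, 0) != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The '.' in the term is matched literally, so "abc" is not a hit.
        Scanner s = new Scanner("a.c abc A.C");
        System.out.println(countMatches(s, "a.c"));
        s.close();
    }
}
```

Note that \b only behaves as a word boundary when the term starts and ends with a word character, so terms that begin or end with punctuation may need a different boundary check.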