Creating a Java program to search a file for a specific word

Question

Asked by Dom Minic

I am just learning the language and was wondering what a more experienced Java programmer would do in the following situation.

I would like to create a Java program that will search a specified file for all instances of a specific word.

How would you go about this? Does the Java API come with a class that provides file-scanning capabilities, or would I have to write my own class to do this?

Thanks for any input,
Dom.

Answer 1

Answered by Reese Moore

The Java API does offer the java.util.Scanner class, which will allow you to scan across an input file.

Depending on how you intend to use this, however, it might not be the best idea. Is the file very large? Are you searching only one file, or are you trying to keep a database of many files and search within those? In the latter case, you might want to use a more fleshed-out search engine such as Lucene.
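
For a single large file, one option that stays within the standard library is to read the file one line at a time so that the whole text never has to sit in memory. The following is only a rough sketch under some assumptions: the class name LineSearch and the helper countInLine are made up for illustration, the file is assumed to be UTF-8 text, and a word is assumed never to span a line break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineSearch {
    // Counts how many times the pattern matches within one line of text.
    static int countInLine(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java LineSearch <file> <word>");
            return;
        }
        // \\b is a word boundary; Pattern.quote treats the word as literal text.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(args[1]) + "\\b");
        long total = 0;
        // Files.newBufferedReader decodes the file as UTF-8 (an assumption about the input).
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) {
                total += countInLine(pattern, line);
            }
        }
        System.out.println(total + " occurrence(s) found");
    }
}
```

Only one line is held in memory at a time, so the file size matters much less than with a read-everything approach. Searching across many files is a different problem, and that is where an index such as Lucene becomes worth considering.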

Answer 2

Answered by Peter Lawrey

Unless the file is very large, I would read it all into a string and search that:

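// Requires Apache Commons IO (IOUtils) and java.util.regex.Pattern.
String text = IOUtils.toString(new FileReader(filename));
// find() looks for the word anywhere in the text; matches() would need the whole text to match.
boolean foundWord = Pattern.compile("\\b" + word + "\\b").matcher(text).find();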

To find all the text between occurrences of your word, you can use split() and use the lengths of the resulting strings to determine the position of each occurrence.
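
As a rough illustration of the split() idea, the sketch below adds up the lengths of the pieces to recover where each occurrence starts. The class name SplitPositions and the method positions are hypothetical, and the search is assumed to be case-sensitive so that every match has the same length as the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitPositions {
    // Returns the start index of every whole-word occurrence of word in text.
    static List<Integer> positions(String text, String word) {
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        // A limit of -1 keeps trailing empty strings, so there is
        // exactly one more piece than there are matches.
        String[] parts = text.split(regex, -1);
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < parts.length - 1; i++) {
            offset += parts[i].length(); // skip the text before this match
            result.add(offset);
            offset += word.length();     // skip the match itself
        }
        return result;
    }

    public static void main(String[] args) {
        // "javascript" is skipped because of the word boundaries.
        System.out.println(positions("java is not javascript, but java is fun", "java"));
        // prints [0, 28]
    }
}
```

If all you need is the positions, a Matcher with find() and start() is usually simpler; split() is mostly useful when you also want the text in between.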

Answer 3

Answered by aioobe

As others have pointed out, you could use the Scanner class.

I put your question in a file, data.txt, and ran the following program:

import java.io.*;
import java.util.Scanner;
import java.util.regex.MatchResult;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner s = new Scanner(new File("data.txt"));
        // Each regex backslash must be doubled in a Java string literal ("\b" alone is a backspace).
        while (null != s.findWithinHorizon("(?i)\\bjava\\b", 0)) {
            MatchResult mr = s.match();
            System.out.printf("Word found: %s at index %d to %d.%n", mr.group(),
                    mr.start(), mr.end());
        }
        s.close();
    }
}

The output is:

Word found: Java at index 74 to 78.
Word found: java at index 153 to 157.
Word found: Java at index 279 to 283.

The pattern searched for, (?i)\bjava\b, means the following:

(?i) turns on the case-insensitive switch
\b means a word boundary
java is the string searched for
\b is a word boundary again.
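
To see the pieces of the pattern at work, here is a small sketch (the class name BoundaryDemo and the method containsWord are made up for this example). Note that in Java source each regex backslash has to be written as two backslashes.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // True if text contains "java" as a whole word, ignoring case.
    static boolean containsWord(String text) {
        return Pattern.compile("(?i)\\bjava\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("I like JAVA."));      // true: (?i) ignores case
        System.out.println(containsWord("I like javascript")); // false: no boundary after "java"
    }
}
```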

If the search term comes from the user, or if for some other reason it may contain special characters, I suggest you put \Q and \E around the string, as they quote all characters in between (and if you're really picky, make sure the input doesn't itself contain \E).
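
One way to do the quoting without writing \Q and \E by hand is the standard Pattern.quote method, which returns a literal pattern for its argument and also copes with a term that itself contains \E. A minimal sketch, with the class name QuotedSearch and the method count invented for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSearch {
    // Counts whole-word, case-insensitive occurrences of term, treated as literal text.
    static int count(String text, String term) {
        Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(term) + "\\b");
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Unquoted, the '.' in "a.b" would match any character, including the 'x' in "axb".
        System.out.println(count("a.b is not axb, but A.B is", "a.b")); // prints 2
    }
}
```

Keep in mind that \b only marks a boundary next to a word character, so a term that starts or ends with punctuation may not match the way you expect.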
