Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14545730/

Date: 2020-10-31 16:37:10  Source: igfitidea

How to search for a word in a file using Java

Tags: java, java-me

Asked by Jevison7x

I am writing a Java program to search for a word in a text file that contains a list of dictionary words. As you may know, this file contains about 300,000 words. I came up with a program that iterates through the words, comparing each one with the input word (the word I am searching for). The problem is that this process takes a long time to find a word, especially if the word starts with one of the last letters of the alphabet, such as x, y, or z. I want something more efficient that can find a word almost instantly. Here is my code:

import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile
{
public static void main(String[] args) throws IOException
{
    ReadFile rf = new ReadFile();
    rf.searchWord(args[0]);
}

private void searchWord(String token) throws IOException
{
    InputStreamReader reader = new InputStreamReader(
            getClass().getResourceAsStream("sowpods.txt"));
    String line = null;
    // Read a single line from the file. null represents the EOF.
    while((line = readLine(reader)) != null && !line.equals(token))
    {
        System.out.println(line);
    }

    if(line != null && line.equals(token))
    {
        System.out.println(token + " WAS FOUND.");
    }
    else if(line != null && !line.equals(token))
    {
        System.out.println(token + " WAS NOT FOUND.");
    }
    else
    {
        System.out.println(token + " WAS NOT FOUND.");
    }
    reader.close();
}

private String readLine(InputStreamReader reader) throws IOException
{
    // Test whether the end of file has been reached. If so, return null.
    int readChar = reader.read();
    if(readChar == -1)
    {
        return null;
    }
    StringBuffer string = new StringBuffer("");
    // Read until end of file or new line
    while(readChar != -1 && readChar != '\n')
    {
        // Append the read character to the string. Some operating systems
        // such as Microsoft Windows prepend newline character ('\n') with
        // carriage return ('\r'). This is part of the newline character
        // and therefore an exception that should not be appended to the
        // string.
        if(readChar != '\r')
        {
            string.append((char) readChar);
        }
        // Read the next character
        readChar = reader.read();
    }
    return string.toString();
}

}

Please also note that I would like to use this program in a Java ME environment. Any help would be highly appreciated. Thanks, Jevison7x.

Answered by nhahtdh

You can use fgrep (fgrep is activated by passing -F to grep; see the Linux man page of fgrep):

grep -F -f dictionary.txt inputfile.txt

The dictionary file should contain one word per line.

Not sure if it is still accurate, but the Wikipedia article on grep mentions the use of the Aho-Corasick algorithm in fgrep, which is an algorithm that builds an automaton based on a fixed dictionary for quick string matching.
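To make the idea of an automaton built from a fixed dictionary more concrete, here is a minimal sketch of Aho-Corasick in Java. It is not taken from grep's source and makes no claim about how fgrep implements it. The class name AhoCorasick and the method findAll are made up for this illustration, and the code assumes standard Java SE 8 or later (generics, collections, lambdas), so it would need adapting for Java ME.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Illustrative Aho-Corasick automaton (hypothetical class, not a library API). */
class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;     // longest proper suffix of this node's path that is also in the trie
        Node dictLink; // nearest node on the fail chain where a dictionary word ends
        String word;   // non-null if a dictionary word ends exactly here
    }

    private final Node root = new Node();

    AhoCorasick(String[] words) {
        // Step 1: insert every word into a trie.
        for (String w : words) {
            if (w.isEmpty()) continue; // empty patterns are ignored in this sketch
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.word = w;
        }
        // Step 2: breadth-first traversal to compute failure and dictionary links.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node cur = queue.remove();
            for (Map.Entry<Character, Node> e : cur.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = cur.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                child.dictLink = (child.fail.word != null) ? child.fail : child.fail.dictLink;
                queue.add(child);
            }
        }
    }

    /** Returns every dictionary word occurring in text, ordered by end position. */
    List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Node state = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Follow failure links until some state has a transition on c.
            while (state != root && !state.next.containsKey(c)) {
                state = state.fail;
            }
            Node n = state.next.get(c);
            state = (n != null) ? n : root;
            // Report the word ending here, plus any shorter words that are suffixes of it.
            for (Node m = state; m != null; m = m.dictLink) {
                if (m.word != null) hits.add(m.word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AhoCorasick ac = new AhoCorasick(new String[] {"he", "she", "his", "hers"});
        System.out.println(ac.findAll("ushers")); // prints [she, he, hers]
    }
}
```

The automaton is built once from the dictionary, after which the text is scanned in a single pass no matter how many words the dictionary holds. This pays off when searching a text for many words at once, which is the case the grep command above handles.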

Anyway, you can have a look at the list of string searching algorithms on a finite set of patterns on Wikipedia. These are the more efficient ones to work with when searching for words in a dictionary.
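For the narrower case in the question, checking whether one whole word is in the list, a multi-pattern matcher may be more than is needed. As a simpler alternative that this answer does not mention, the sketch below loads the words into a sorted array once and answers each lookup with a binary search. The class name SortedWordList is hypothetical, and the code assumes the whole list fits in memory. It is written against standard Java SE and has not been checked on a Java ME profile.

```java
import java.util.Arrays;

/** Illustrative exact-word lookup over a sorted array (hypothetical class). */
class SortedWordList {
    private final String[] words; // sorted ascending by String.compareTo

    SortedWordList(String[] unsorted) {
        // Copy and sort once up front; a dictionary file may already be sorted,
        // but this sketch does not rely on that.
        words = unsorted.clone();
        Arrays.sort(words);
    }

    /** Returns true if token is in the list, using O(log n) comparisons. */
    boolean contains(String token) {
        int lo = 0;
        int hi = words.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            int cmp = words[mid].compareTo(token);
            if (cmp == 0) {
                return true;
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SortedWordList list = new SortedWordList(new String[] {"zebra", "apple", "mango"});
        System.out.println(list.contains("zebra")); // prints true
        System.out.println(list.contains("pear"));  // prints false
    }
}
```

With about 300,000 words, each lookup needs at most about 19 comparisons (2^19 is 524,288), so a word starting with z costs no more than one starting with a.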
