Java: find all occurrences of the string "the" in a .txt file

Disclaimer: this page is based on a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3697833/

Time: 2020-08-14 03:36:56  Source: igfitidea

Find all string "the" in .txt file

java

Asked by Giffary

Here is my code:

// Import io so we can use file objects
import java.io.*;

public class SearchThe {
    public static void main(String args[]) {
        try {
            String stringSearch = "the";
            // Open the file test.txt (relative to the working directory) as a buffered reader
            BufferedReader bf = new BufferedReader(new FileReader("test.txt"));

            // Start a line count and declare a string to hold our current line.
            int linecount = 0;
            String line;

            // Let the user know what we are searching for
            System.out.println("Searching for " + stringSearch + " in file...");

            // Loop through each line, stashing the line into our line variable.
            while (( line = bf.readLine()) != null){
                // Increment the count and find the index of the word
                linecount++;
                int indexfound = line.indexOf(stringSearch);

                // If greater than -1, means we found the word
                if (indexfound > -1) {
                    System.out.println("Word was found at position " + indexfound + " on line " + linecount);
                }
            }

            // Close the file after done searching
            bf.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred: " + e.toString());
        }
    }
}

I want to find every occurrence of the word "the" in the file test.txt. The problem is that after it finds the first "the", my program stops looking for more.

Also, when the text contains a word like "then", my program treats it as the word "the".

Accepted answer by Chadwick

Use a case-insensitive regex with word boundaries to find all instances and variations of "the".

indexOf("the") cannot distinguish between "the" and "then", since each starts with "the". Likewise, "the" is found in the middle of "anathema".

To avoid this, use a regex and search for "the" with a word boundary (\b) on either side. Prefer word boundaries to splitting on " " or using indexOf(" the ") (spaces on either side), which would not find "the." and other instances next to punctuation. You can also make the search case-insensitive to find "The" as well.

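// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
// In a Java string literal the backslash must be doubled: "\b" alone is a backspace character.
Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);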

while ( (line = bf.readLine()) != null) {
    linecount++;

    Matcher m = p.matcher(line);

    // indicate all matches on the line
    while (m.find()) {
        System.out.println("Word was found at position " + 
                       m.start() + " on line " + linecount);
    }
}
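
As a rough illustration of the word-boundary approach above, here is a minimal self-contained sketch. The class name WordBoundarySearch, the helper findWord, and the sample sentence are made up for this example and are not part of the original answer; Pattern.quote is added only so the search word is treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not from the original answer.
class WordBoundarySearch {

    // Returns the start index of every whole-word, case-insensitive match of word in line.
    static List<Integer> findWord(String line, String word) {
        // \\b is a word boundary; Pattern.quote makes the word match literally.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // "then" and "Anathema" are skipped; "The", "the" and "the." are found.
        System.out.println(findWord("The cat, then the dog. Anathema to the.", "the"));
        // prints [0, 14, 35]
    }
}
```

Calling findWord once per line inside the readLine loop would give the same per-line positions as the snippet above.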

Answer by vodkhang

You shouldn't use indexOf, because it matches any substring of your string. Because "then" contains the string "the", it is reported as a match as well.

More about indexOf

indexOf

public int indexOf(String str, int fromIndex) Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which:

Instead, split each line into words, loop over the words, and compare each one to "the".

String[] words = line.split(" ");
for (String word : words) {
  if (word.equals("the")) {
    System.out.println("Found the word");
  }
}

The above code snippet will also loop over every possible "the" in the line for you, whereas indexOf always returns only the first occurrence.
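
One limitation of splitting on a single space is that tokens such as "the." or "The" no longer equal "the". A possible variation, offered only as a sketch, is to split on runs of non-word characters and compare case-insensitively. The class name SplitWordCount and the method countWord are invented for this example, and the "\\W+" delimiter is an assumption about what should count as a word separator.

```java
// Hypothetical helper class for illustration; not from the original answer.
class SplitWordCount {

    // Counts how many tokens in line equal word, ignoring case.
    static int countWord(String line, String word) {
        int count = 0;
        // "\\W+" splits on any run of characters other than letters, digits and underscore,
        // so punctuation next to a word does not end up inside the token.
        for (String token : line.split("\\W+")) {
            if (token.equalsIgnoreCase(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWord("The cat, then the dog. Anathema to the.", "the"));
        // prints 3
    }
}
```

This variation only counts matches; it does not report their positions in the line.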

Answer by flash

It is best to use regular expressions for this kind of search. As a quick and dirty workaround, you could change your stringSearch from

String stringSearch = "the";

to

String stringSearch = " the ";

Answer by Michael Shimmins

Your current implementation will only find the first instance of 'the' per line.

Consider splitting each line into words, iterating over the list of words, and comparing each word to 'the' instead:

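while ((line = bf.readLine()) != null)
{
    linecount++;
    String[] words = line.split(" ");
    int position = 0; // index in the line where the current word starts

    for (String word : words)
    {
        if (word.equals(stringSearch))
            System.out.println("Word was found at position " + position + " on line " + linecount);

        // Move past this word and the single space that followed it
        position += word.length() + 1;
    }
}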

Answer by CurtainDog

It doesn't sound like the point of the exercise is to skill you up in regular expressions (I don't know, it may be, but it seems a little basic for that), even though regexes would indeed be the real-world solution to things like this.

My advice is to focus on the basics: use indexOf and substring to test the string. Think about how you could account for the naturally case-sensitive nature of strings. Also, does your reader always get closed (i.e. is there a way bf.close() wouldn't be executed)?
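
One way those hints could fit together is sketched below, under stated assumptions. It uses indexOf(String, int) to keep searching after each hit, checks the characters on either side by hand instead of using a regex, lower-cases both strings before comparing (assuming plain ASCII-like text where lower-casing does not change the string length), and uses try-with-resources so the reader is closed even if readLine throws. The names IndexOfSearch, findWord and countInReader are invented here, and the demo reads from a StringReader rather than test.txt so it runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not from the original answer.
class IndexOfSearch {

    // Returns the start index of every whole-word, case-insensitive match of word in line.
    static List<Integer> findWord(String line, String word) {
        List<Integer> positions = new ArrayList<>();
        if (word.isEmpty()) {
            return positions; // an empty search word would otherwise loop forever
        }
        String haystack = line.toLowerCase(Locale.ROOT);
        String needle = word.toLowerCase(Locale.ROOT);
        int from = 0;
        int index;
        // indexOf(str, fromIndex) resumes the search after the previous hit.
        while ((index = haystack.indexOf(needle, from)) != -1) {
            int end = index + needle.length();
            boolean startOk = index == 0
                    || !Character.isLetterOrDigit(haystack.charAt(index - 1));
            boolean endOk = end == haystack.length()
                    || !Character.isLetterOrDigit(haystack.charAt(end));
            if (startOk && endOk) {
                positions.add(index);
            }
            from = index + 1;
        }
        return positions;
    }

    // Counts matches over all lines; try-with-resources closes the reader on every path.
    static int countInReader(Reader source, String word) throws IOException {
        int total = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                total += findWord(line, word).size();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(findWord("The cat, then the dog. Anathema to the.", "the"));
        // prints [0, 14, 35]
        System.out.println(countInReader(new StringReader("The cat\nthen the dog"), "the"));
        // prints 2
    }
}
```

Passing new FileReader("test.txt") to countInReader instead of the StringReader would apply the same logic to the file from the question.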