Java: find all occurrences of the string "the" in a .txt file

Question

Asked by Giffary

Here is my code:

// Import io so we can use file objects
import java.io.*;

public class SearchThe {
    public static void main(String args[]) {
        try {
            String stringSearch = "the";
            // Open the file test.txt as a buffered reader
            BufferedReader bf = new BufferedReader(new FileReader("test.txt"));

            // Start a line count and declare a string to hold our current line.
            int linecount = 0;
            String line;

            // Let the user know what we are searching for
            System.out.println("Searching for " + stringSearch + " in file...");

            // Loop through each line, stashing the line into our line variable.
            while (( line = bf.readLine()) != null){
                // Increment the count and find the index of the word
                linecount++;
                int indexfound = line.indexOf(stringSearch);

                // If greater than -1, means we found the word
                if (indexfound > -1) {
                    System.out.println("Word was found at position " + indexfound + " on line " + linecount);
                }
            }

            // Close the file after done searching
            bf.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred: " + e.toString());
        }
    }
}

I want to find every occurrence of the word "the" in the file test.txt. The problem is that once my program finds the first "the", it stops looking for more.

Also, when it meets a word like "then", my program treats it as the word "the".

Answer 1

Accepted answer by Chadwick

Use regular expressions, matched case-insensitively and with word boundaries, to find every instance and variation of "the".

indexOf("the") cannot distinguish between "the" and "then", since each starts with "the". Likewise, it finds "the" in the middle of "anathema".

To avoid this, use a regex and search for "the" with a word boundary (\b) on either side. Word boundaries are preferable to splitting on " " or to using indexOf(" the ") (spaces on either side), neither of which would find "the." or other instances next to punctuation. You can also make the search case-insensitive so that it finds "The" as well.

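// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
// The backslashes must be doubled inside a Java string literal ("\b" alone is a backspace).
Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);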

while ( (line = bf.readLine()) != null) {
    linecount++;

    Matcher m = p.matcher(line);

    // indicate all matches on the line
    while (m.find()) {
        System.out.println("Word was found at position " + 
                       m.start() + " on line " + linecount);
    }
}
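
For reference, here is a minimal, self-contained sketch of how the loop above might fit into a complete program. The class name SearchTheRegex and the helper findPositions are made up for illustration and are not part of the original answer; the file name test.txt is taken from the question, and try-with-resources is used so the reader is closed even if reading fails.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, chosen only for this sketch.
class SearchTheRegex {
    // \b is a word boundary; the backslash is doubled inside a Java string literal.
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the start index of every whole-word match on one line.
    static List<Integer> findPositions(String line) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = THE.matcher(line);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) throws IOException {
        // "test.txt" is the file name used in the question.
        try (BufferedReader bf = new BufferedReader(new FileReader("test.txt"))) {
            String line;
            int linecount = 0;
            while ((line = bf.readLine()) != null) {
                linecount++;
                for (int pos : findPositions(line)) {
                    System.out.println("Word was found at position " + pos
                            + " on line " + linecount);
                }
            }
        }
    }
}
```

Keeping the matching in its own method makes it easy to check on a single string: for the line "The cat, then the dog; anathema the." it should report "The", "the" and "the." but skip "then" and "anathema".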

Answer 2

Answered by vodkhang

You shouldn't rely on a plain indexOf call for this, because it matches any substring of the line. "then" contains the string "the", so indexOf treats it as a match too.

More about indexOf

indexOf
public int indexOf(String str, int fromIndex) Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which:

You should split each line into words, loop over the words, and compare each one to "the".

String [] words = line.split(" ");
for (String word : words) {
  if (word.equals("the")) {
    System.out.println("Found the word");
  }
}

The code snippet above also loops over every "the" on the line for you, whereas a single call to indexOf only ever returns the first occurrence.
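
As a side note, the fromIndex overload quoted above can be used to keep searching past the first hit. The following is only a sketch, with a made-up class IndexOfAll and helper allIndexesOf; it still matches substrings, so "then" is reported as well, which is why comparing whole words is suggested here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, chosen only for this sketch.
class IndexOfAll {
    // Collects every index at which target occurs in line.
    // These are substring matches, so "the" inside "then" counts too.
    static List<Integer> allIndexesOf(String line, String target) {
        List<Integer> hits = new ArrayList<>();
        if (target.isEmpty()) {
            return hits; // an empty target would otherwise never stop matching
        }
        int from = 0;
        int found;
        while ((found = line.indexOf(target, from)) != -1) {
            hits.add(found);
            from = found + 1; // resume just past the start of the previous hit
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(allIndexesOf("the then the", "the"));
    }
}
```

For the sample string in main, the hits are at indexes 0, 4 and 9, where index 4 is the "the" inside "then".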

Answer 3

Answered by flash

It is best to use regular expressions for this kind of search. As a quick and dirty workaround, you could change your stringSearch from

String stringSearch = "the";

to

String stringSearch = " the ";

Answer 4

Answered by Michael Shimmins

Your current implementation will only find the first instance of 'the' per line.

Consider splitting each line into words, iterating over the list of words, and comparing each word to 'the' instead:

while (( line = bf.readLine()) != null)
{
    linecount++;
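    String[] words = line.split(" ");
    int position = 0; // character offset of the current word within the line

    for (String word : words)
    {
        if (word.equals(stringSearch))
            System.out.println("Word was found at position " + position + " on line " + linecount);
        position += word.length() + 1; // skip the word and the single space after it
    }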
}

Answer 5

Answered by CurtainDog

It doesn't sound like the point of the exercise is to build up your skills in regular expressions (it may be, I don't know, but it seems a little basic for that), even though regexes would indeed be the real-world solution to things like this.

My advice is to focus on the basics: use indexOf and substring to test the string. Think about how you could account for the naturally case-sensitive nature of strings. Also, does your reader always get closed (i.e. is there a way bf.close() wouldn't be executed)?
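
One possible way to follow these hints without regexes is sketched below. Everything here is an assumption made for illustration: the class SearchTheBasics and the helpers isWholeWord and countWord are invented names, the neighbouring characters are checked with charAt rather than substring, and "word" is taken to mean a run of letters or digits. The try-with-resources block is one way to make sure the reader is closed even when readLine throws.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Locale;

// Hypothetical class name, chosen only for this sketch.
class SearchTheBasics {
    // True when the match at [start, start + length) has no letter or digit
    // directly before or after it.
    static boolean isWholeWord(String line, int start, int length) {
        int end = start + length;
        boolean leftOk = start == 0
                || !Character.isLetterOrDigit(line.charAt(start - 1));
        boolean rightOk = end == line.length()
                || !Character.isLetterOrDigit(line.charAt(end));
        return leftOk && rightOk;
    }

    // Counts whole-word, case-insensitive occurrences of word in line.
    static int countWord(String line, String word) {
        if (word.isEmpty()) {
            return 0;
        }
        // Lower-casing both sides is a simple way around case sensitivity.
        String haystack = line.toLowerCase(Locale.ROOT);
        String needle = word.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        int found;
        while ((found = haystack.indexOf(needle, from)) != -1) {
            if (isWholeWord(haystack, found, needle.length())) {
                count++;
            }
            from = found + 1; // keep searching after this hit
        }
        return count;
    }

    public static void main(String[] args) {
        // The reader is closed automatically, even if readLine() throws.
        try (BufferedReader bf = new BufferedReader(new FileReader("test.txt"))) {
            String line;
            int linecount = 0;
            while ((line = bf.readLine()) != null) {
                linecount++;
                int n = countWord(line, "the");
                if (n > 0) {
                    System.out.println("Found " + n + " match(es) on line " + linecount);
                }
            }
        } catch (IOException e) {
            System.out.println("IO Error Occurred: " + e);
        }
    }
}
```

In the question's code, bf.close() is skipped whenever readLine() throws, because control jumps straight to the catch block. That is the gap the try-with-resources form closes.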
