Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, state the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/6804713/

Time: 2020-10-30 17:20:47  Source: igfitidea

How to tokenize an input file in java

Tags: java, tokenize

Question by syakirah ibrahim

I'm tokenizing a text file in Java. I want to read an input file, tokenize it, and write selected tokens into an output file. This is what I've done so far:

package org.apache.lucene.analysis;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;

class StringProcessing {
    public static void main(String[] args) throws IOException {
        // Create a BufferedReader on System.in to read the file name from the keyboard
        InputStreamReader input = new InputStreamReader(System.in);
        BufferedReader keyboardInput = new BufferedReader(input);
        System.out.print("Please enter a java file name: ");
        String filename = keyboardInput.readLine();
        if (!filename.endsWith(".DAT")) {
            System.out.println("This is not a DAT file.");
            System.exit(0);
        }
        File File = new File(filename);
        if (File.exists()) {
            FileReader file = new FileReader(filename);
            StreamTokenizer streamTokenizer = new StreamTokenizer(file);
            int i = 0;
            int numberOfTokensGenerated = 0;
            while (i != StreamTokenizer.TT_EOF) {
                i = streamTokenizer.nextToken();
                numberOfTokensGenerated++;
            }
            // Output the number of tokens (this count also includes the final TT_EOF)
            System.out.println("Number of tokens = " + numberOfTokensGenerated);
            // Output tokens
            for (int counter = 0; counter < numberOfTokensGenerated; counter++) {
                char character = file.toString().charAt(counter);
                if (character == ' ') { System.out.println(); } else { System.out.print(character); }
            }
        } else {
            System.out.println("File does not exist!");
            System.exit(0);
        }

        System.out.println("\n");
    }//end main
}//end class

When I run this code, this is what I get:

Please enter a java file name: D://eclipse-java-helios-SR1-win32/LexractData.DAT
Number of tokens = 129
java.io.FileReader@19821f
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25
    at java.lang.String.charAt(Unknown Source)
    at org.apache.lucene.analysis.StringProcessing.main(StringProcessing.java:40)

The input file will look like this:

-K1 Account 
--Op1 withdraw
---Param1 an
----Type Int
---Param2 amount
----Type Int
--Op2 deposit
---Param1 an
----Type Int
---Param2 Amount
----Type Int
--CA1 acNo
---Type Int
-K2 CheckAccount 
--SC Account
--CA1 credit_limit
---Type Int
-K3 Customer
--CA1 name
---Type String
-K4 Transaction
--CA1 date
---Type Date
--CA2 time
---Type Time
-K5 CheckBook
-K6 Check
-K7 BalanceAccount
--SC Account

I just want to read the strings that start with -K1, -K2, -K3, and so on. Can anyone help me?

Answer by Kal

The problem is with this line:

char character = file.toString().charAt(counter);

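file is a reference to a FileReader, which does not override toString(), so the call falls back to Object.toString(). That returns the class name, an @ and a hash code (here java.io.FileReader@19821f), which is only 25 characters long. Your loop runs up to the token count (129), so charAt(25) asks for the 26th character and throws StringIndexOutOfBoundsException. That is also why java.io.FileReader@19821f is printed right before the exception.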
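
Separately from the toString() problem, the question's code never takes the tokens out of the StreamTokenizer; it only counts them. After each nextToken() call, the token type is in ttype, a word is in sval and a number is in nval. The sketch below collects them as strings; the class and method names are hypothetical, and a StringReader stands in for the .DAT file. Note that with the default settings parseNumbers() is in effect, so '-' counts as a numeric character, and a '-' that is not followed by a digit comes back as an ordinary character: -K1 is split into - and K1, which matters if you want to match whole -K1 tokens.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerDemo {
    // Collects every token produced by a StreamTokenizer (default settings) as text.
    static List<String> tokens(Reader reader) throws IOException {
        StreamTokenizer st = new StreamTokenizer(reader);
        List<String> result = new ArrayList<>();
        // nextToken() returns the type of the token it just read; TT_EOF ends the input
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:   // the word itself is in sval
                    result.add(st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER: // the numeric value is in nval
                    result.add(String.valueOf(st.nval));
                    break;
                default:                        // anything else, e.g. an ordinary char such as '-'
                    result.add(String.valueOf((char) st.ttype));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the FileReader on the .DAT file
        System.out.println(tokens(new StringReader("-K1 Account\n--Op1 withdraw\n")));
        // prints [-, K1, Account, -, -, Op1, withdraw]
    }
}
```

For the real file, pass new FileReader(filename) instead of the StringReader.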

To read the file's contents, wrap your FileReader in a BufferedReader and append each line returned by readLine() to a StringBuilder.

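FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
StringBuilder sb = new StringBuilder();
String s;
while ((s = br.readLine()) != null) {
    // readLine() strips the line terminator, so add it back to keep lines apart
    sb.append(s).append('\n');
}
br.close(); // also closes the wrapped FileReader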

// Now use sb.toString() instead of file.toString()
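
Instead of collecting everything into one string, the same BufferedReader loop can answer the question directly by checking each line as it is read. A minimal sketch, assuming that "starts with -K1, -K2, -K3" means a line beginning with -K followed by digits; the class and method names are hypothetical, and a StringReader stands in for the .DAT file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class KLineFilter {
    // Returns the lines that start with "-K" followed by digits (e.g. "-K1 Account").
    // Lines such as "--Op1 withdraw" start with "--", so they are skipped.
    static List<String> kLines(BufferedReader br) throws IOException {
        List<String> result = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            if (line.matches("-K\\d+.*")) {
                result.add(line.trim()); // drop trailing spaces such as in "-K1 Account "
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "-K1 Account \n--Op1 withdraw\n---Param1 an\n-K2 CheckAccount \n--SC Account\n";
        System.out.println(kLines(new BufferedReader(new StringReader(sample))));
        // prints [-K1 Account, -K2 CheckAccount]
    }
}
```

For the actual file, kLines(new BufferedReader(new FileReader(filename))) should work the same way; kLines does not close the reader, so the caller still has to.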

Answer by lpreams

If you want to tokenize the input file, the obvious choice is a Scanner. The Scanner class reads a given input source, splits it on whitespace by default, and can return either tokens (scanner.next()) or other scanned types (scanner.nextInt(), scanner.nextLine(), etc.).

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class TokenizeFile {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the Scanner (and the file) when done
        try (Scanner in = new Scanner(new File("filename.dat"))) {
            while (in.hasNext()) {
                String s = in.next(); // get the next token in the file
                // Now s contains a token from the file
            }
        }
    }
}

Check out Oracle's documentation of the Scanner class for more info.
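
Applied to the question's input, the same token-by-token approach can also cover the output-file part of the question. A sketch, assuming the goal is to write the name that follows each -K<number> token (Account, CheckAccount, ...) one per line; the class and method names are hypothetical, a String stands in for the .DAT file and a StringWriter for the output file:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Scanner;

public class KTokenWriter {
    // Reads whitespace-separated tokens; whenever a token looks like "-K<digits>",
    // the token after it (the class name) is written to the output, one per line.
    static void writeKNames(Scanner in, PrintWriter out) {
        while (in.hasNext()) {
            String token = in.next();
            if (token.matches("-K\\d+") && in.hasNext()) {
                out.println(in.next());
            }
        }
        out.flush();
    }

    public static void main(String[] args) {
        String sample = "-K1 Account \n--Op1 withdraw\n-K2 CheckAccount \n--SC Account\n-K5 CheckBook\n";
        StringWriter buffer = new StringWriter();
        writeKNames(new Scanner(sample), new PrintWriter(buffer));
        System.out.print(buffer);
        // prints Account, CheckAccount and CheckBook, one per line
    }
}
```

To work with real files, something like new Scanner(new File("LexractData.DAT")) and new PrintWriter("output.txt") could be passed instead (output.txt is only a placeholder name); both should be closed when done.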
