Java: Reading an ASCII file with FileChannel and ByteArrays

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/93423/

Tags: java, file-io, io, bytearray, filechannel

Asked by Jake

I have the following code:

        String inputFile = "somefile.txt";
        FileInputStream in = new FileInputStream(inputFile);
        FileChannel ch = in.getChannel();
        ByteBuffer buf = ByteBuffer.allocateDirect(BUFSIZE);  // BUFSIZE = 256

        /* read the file into a buffer, 256 bytes at a time */
        int rd;
        while ( (rd = ch.read( buf )) != -1 ) {
            buf.rewind();
            for ( int i = 0; i < rd/2; i++ ) {
                /* print each character */
                System.out.print(buf.getChar());
            }
            buf.clear();
        }

But the characters get displayed as ?'s. Does this have something to do with Java using Unicode characters? How do I correct this?

Answered by jliszka

You have to know what the encoding of the file is, and then decode the ByteBuffer into a CharBuffer using that encoding. Assuming the file is ASCII:

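import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Buffer
{
    private static final int BUFSIZE = 256;

    public static void main(String args[]) throws Exception
    {
        String inputFile = "somefile";
        // Or whatever encoding the file uses. Decoding chunk by chunk like this
        // is only safe for single-byte encodings such as ASCII.
        Charset cs = StandardCharsets.US_ASCII;

        // try-with-resources closes the stream (and its channel) when done
        try (FileInputStream in = new FileInputStream(inputFile)) {
            FileChannel ch = in.getChannel();
            ByteBuffer buf = ByteBuffer.allocateDirect(BUFSIZE);

            /* read the file into a buffer, up to 256 bytes at a time */
            while ( ch.read( buf ) != -1 ) {
                buf.flip();  // limit = bytes just read, position = 0
                CharBuffer chbuf = cs.decode(buf);
                /* print each character; length() shrinks with every get(),
                   so loop on hasRemaining() instead of an index */
                while ( chbuf.hasRemaining() ) {
                    System.out.print(chbuf.get());
                }
                buf.clear();
            }
        }
    }
}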
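
If the file might use a multi-byte encoding such as UTF-8, a character can straddle two reads, and decoding each chunk on its own would garble it. Below is a rough sketch of one way to handle that with a CharsetDecoder that carries leftover bytes over to the next read. The class name ChunkedDecoder, the decode helper, and the minimum buffer size of 8 are my own assumptions for illustration, not part of the answer above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ChunkedDecoder {
    /** Decodes everything readable from ch, bufSize bytes at a time (hypothetical helper). */
    static String decode(ReadableByteChannel ch, Charset cs, int bufSize) throws IOException {
        if (bufSize < 8) {
            // Assumption: the buffer must be able to hold at least one whole encoded character.
            throw new IllegalArgumentException("bufSize must be at least 8");
        }
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.allocate(bufSize);
        CharBuffer out = CharBuffer.allocate(bufSize);
        StringBuilder sb = new StringBuilder();

        while (ch.read(in) != -1) {
            in.flip();
            CoderResult result;
            do {
                result = decoder.decode(in, out, false); // false: more input may follow
                drain(out, sb);
            } while (result.isOverflow());
            in.compact(); // keep the bytes of an incomplete character for the next read
        }

        // Tell the decoder that the input has ended, then flush any internal state.
        in.flip();
        CoderResult result;
        do {
            result = decoder.decode(in, out, true);
            drain(out, sb);
        } while (result.isOverflow());
        do {
            result = decoder.flush(out);
            drain(out, sb);
        } while (result.isOverflow());
        return sb.toString();
    }

    /** Moves the decoded characters from out into sb and resets out for reuse. */
    private static void drain(CharBuffer out, StringBuilder sb) {
        out.flip();
        sb.append(out);
        out.clear();
    }

    public static void main(String[] args) throws IOException {
        try (FileInputStream in = new FileInputStream("somefile")) {
            System.out.print(decode(in.getChannel(), StandardCharsets.UTF_8, 256));
        }
    }
}
```

For a plain ASCII file this is overkill, and the simpler loop above is enough.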

Answered by Craig Day

buf.getChar() is expecting 2 bytes per character but you are only storing 1. Use:

 System.out.print((char) buf.get());
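
To make the difference concrete, here is a small sketch (the class GetCharDemo and the sample text "Hi" are made up for illustration). getChar() consumes two bytes and combines them into a single char, so two ASCII bytes turn into one unrelated character that many consoles cannot display, while get() consumes one byte, which for ASCII maps directly to the char you expect.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class GetCharDemo {
    /** Reads the buffer two bytes at a time, the way the question's loop does. */
    static String readWithGetChar(byte[] ascii) {
        ByteBuffer buf = ByteBuffer.wrap(ascii);
        StringBuilder sb = new StringBuilder();
        while (buf.remaining() >= 2) {
            sb.append(buf.getChar()); // two bytes -> one char (big-endian by default)
        }
        return sb.toString();
    }

    /** Reads one byte per character, as suggested above. */
    static String readWithGet(byte[] ascii) {
        ByteBuffer buf = ByteBuffer.wrap(ascii);
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            sb.append((char) buf.get()); // one byte -> one char; valid for ASCII
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ascii = "Hi".getBytes(StandardCharsets.US_ASCII); // bytes 0x48 0x69
        System.out.println((int) readWithGetChar(ascii).charAt(0)); // prints 18537 (0x4869)
        System.out.println(readWithGet(ascii));                     // prints Hi
    }
}
```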

Answered by Robert J. Walker

Depending on the encoding of somefile.txt, a character may not actually be composed of two bytes. This page gives more information about how to read streams with the proper encoding.

The bummer is, the file system doesn't tell you the encoding of the file, because it doesn't know. As far as it's concerned, it's just a bunch of bytes. You must either find some way to communicate the encoding to the program, detect it somehow, or (if possible) always ensure that the encoding is the same (such as UTF-8).
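
As one possible way of "communicating the encoding to the program", the sketch below takes the charset name as a second command-line argument and falls back to UTF-8 when none is given. The class name ReadWithCharset, the readAll helper, and the argument layout are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ReadWithCharset {
    /** Decodes the whole file using the encoding supplied by the caller. */
    static String readAll(Path file, Charset cs) throws IOException {
        return new String(Files.readAllBytes(file), cs);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ReadWithCharset somefile.txt ISO-8859-1
        Path file = Paths.get(args.length > 0 ? args[0] : "somefile.txt");
        Charset cs = args.length > 1 ? Charset.forName(args[1]) : StandardCharsets.UTF_8;
        System.out.print(readAll(file, cs));
    }
}
```

Reading the whole file into memory keeps the example short; for large files a Reader or a chunked decoder would be a better fit.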

Answered by jjnguy

Changing your print statement to:

System.out.print((char)buf.get());

Seems to help.

Answered by jjnguy

Is there a particular reason why you are reading the file in the way that you do?

If you're reading in an ASCII file you should really be using a Reader.

I would do it something like:

File inputFile = new File("somefile.txt");
BufferedReader reader = new BufferedReader(new FileReader(inputFile));

And then use readLine or something similar to actually read in the data!
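
A fuller sketch of that idea might look like the following. The class name ReadLines and the readLines helper are made up for the example, and it swaps FileReader for an InputStreamReader so the encoding can be stated explicitly instead of relying on the platform default.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ReadLines {
    /** Reads every line of an ASCII text file into a list. */
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of file
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines("somefile.txt")) {
            System.out.println(line);
        }
    }
}
```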

Answered by Burkhard

Yes, it is Unicode.

If you have 14 chars in your file, you only get 7 '?', because each getChar() call consumes two bytes.

Solution pending. Still thinking.
