Note: this page is a Chinese/English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1575087/

Time: 2020-10-29 17:08:44. Source: igfitidea

Efficient way of handling file pointers in Java? (Using BufferedReader with file pointer)

Tags: java, file, pointers, buffered

Asked by Sudheer

I have a log file which gets updated every second. I need to read the log file periodically, and once I do a read, I need to store the file pointer position at the end of the last line I read and in the next periodic read I should start from that point.

Currently, I am using a RandomAccessFile in Java, with the getFilePointer() method to get the offset value and the seek() method to go to the offset position.

However, most articles I have read, and even the Java documentation, recommend using BufferedReader for efficient reading of a file. How can I achieve this (getting the file pointer and moving to the last line) using a BufferedReader, or is there any other efficient way to achieve this task?
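
For reference, the seek()/getFilePointer() approach described in the question could look roughly like the sketch below. The class name `OffsetLogReader` and its methods are hypothetical and not taken from the original post. The sketch also assumes that every line is already terminated when it is read; a partially written last line would be returned as-is.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the seek()/getFilePointer() pattern.
final class OffsetLogReader {
    private long offset; // byte position just past the last data read

    OffsetLogReader(long startOffset) {
        this.offset = startOffset;
    }

    long offset() {
        return offset;
    }

    // Reads every line available after the stored offset, then remembers
    // the new file pointer for the next periodic read.
    List<String> readNewLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                lines.add(line);
            }
            offset = raf.getFilePointer();
        }
        return lines;
    }
}
```

Each periodic read would call `readNewLines` on the same `OffsetLogReader` instance, so the offset carries over between reads.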

Answered by Neil Coffey

A couple of ways that should work:

  • open the file using a FileInputStream, skip() the relevant number of bytes, then wrap the BufferedReader around the stream (via an InputStreamReader);
  • open the file (with either FileInputStream or RandomAccessFile), call getChannel() on the stream/RandomAccessFile to get an underlying FileChannel, call position() on the channel, then call Channels.newInputStream() to get an input stream from the channel, which you can pass to InputStreamReader -> BufferedReader.
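
The two options above can be sketched as follows. This is only an illustration: the class name `ReaderAtOffset`, the method names, and the choice of UTF-8 are assumptions, not part of the answer. The caller is expected to close the returned reader.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing both ways of starting a BufferedReader at a byte offset.
final class ReaderAtOffset {

    // Option 1: skip() the relevant number of bytes, then wrap the stream.
    static BufferedReader viaSkip(String path, long offset) throws IOException {
        FileInputStream in = new FileInputStream(path);
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining); // skip() may skip fewer bytes than requested
            if (skipped <= 0) {
                break; // nothing more could be skipped
            }
            remaining -= skipped;
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Option 2: position the underlying FileChannel, then build a stream on top of it.
    static BufferedReader viaChannel(String path, long offset) throws IOException {
        FileChannel channel = new FileInputStream(path).getChannel();
        channel.position(offset);
        return new BufferedReader(
                new InputStreamReader(Channels.newInputStream(channel), StandardCharsets.UTF_8));
    }
}
```

Note that once a BufferedReader is wrapped around the stream, it reads ahead, so the channel position no longer tells you where the last returned line ended.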

I honestly haven't profiled these to see which performs better, so you should test which one works better in your situation.

The problem with RandomAccessFile is essentially that its readLine() method is very inefficient. If it's convenient for you to read from the RAF and do your own buffering to split the lines, then there's nothing wrong with RAF per se; it's just that its readLine() is poorly implemented.
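
As a rough illustration of "do your own buffering", the sketch below reads from the RandomAccessFile in bulk and splits lines by hand. The class name `RafLineSplitter`, the 8192-byte chunk size, and the UTF-8 decoding are assumptions made for the example. Only lines terminated by '\n' are returned; an unterminated trailing line is left for the next call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: bulk reads from a RandomAccessFile with manual line splitting.
final class RafLineSplitter {

    // Appends each complete line found after 'offset' to 'out' and returns the
    // byte offset just past the last complete line.
    static long readLines(RandomAccessFile raf, long offset, List<String> out) throws IOException {
        raf.seek(offset);
        byte[] chunk = new byte[8192];
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of the line in progress
        long consumed = offset;
        int n;
        while ((n = raf.read(chunk)) > 0) {
            for (int i = 0; i < n; i++) {
                if (chunk[i] == '\n') {
                    byte[] line = pending.toByteArray();
                    int len = line.length;
                    if (len > 0 && line[len - 1] == '\r') {
                        len--; // tolerate CRLF line endings
                    }
                    out.add(new String(line, 0, len, StandardCharsets.UTF_8));
                    consumed += line.length + 1; // count the terminator bytes too
                    pending.reset();
                } else {
                    pending.write(chunk[i]);
                }
            }
        }
        return consumed;
    }
}
```

Because the returned offset counts the raw bytes, including terminators, it can be passed straight back in as `offset` on the next periodic read.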

Answered by srikanth yaradla

Neil Coffey's solution is good if you are reading fixed-length files. However, for files whose length changes (data keeps coming in), there are some problems with using a BufferedReader directly on a FileInputStream or FileChannel input stream via an InputStreamReader. For example, consider these cases:

  • 1) You want to read data from some offset up to the current file length, so you wrap a BufferedReader around a FileInputStream/FileChannel (via an InputStreamReader) and call its readLine method. But while you are busy reading, more data is appended, which causes the BufferedReader's readLine to read more data than you expected (past the previous file length).

  • 2) You finish the readLine calls, but by the time you read the current file length/channel position, more data has been appended. The file length/channel position has increased, but you have actually read less data than that.

In both of the above cases it is difficult to know how much data you have actually read. (You cannot simply add up the lengths of the strings returned by readLine, because it drops some characters, such as carriage returns.)
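
The point about readLine dropping characters can be checked with a small experiment. The class name `TerminatorLoss` is made up for this illustration, and the sample text is ASCII so that characters and bytes coincide.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo: readLine() strips line terminators, so summing line
// lengths undercounts the data that was actually consumed.
final class TerminatorLoss {

    // Sums the lengths of the strings returned by readLine().
    static int charsSeenByReadLine(String text) throws IOException {
        int total = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                total += line.length();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String text = "one\r\ntwo\nthree\r";
        System.out.println("actual length:    " + text.length());            // prints 15
        System.out.println("readLine() total: " + charsSeenByReadLine(text)); // prints 11
    }
}
```

The four missing characters are the "\r\n", "\n", and "\r" terminators, and readLine gives no indication of which terminator ended each line.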

So it is better to read the data as buffered bytes and use a BufferedReader wrapper around them. I wrote some methods like this:

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;

public class BufferedLineReader {

    private static final int BYTE_BUFFER_SIZE = 4096;

    /**
     * Reads the bytes from offset up to length in a RandomAccessFile, in chunks,
     * and parses the complete lines of each chunk with a BufferedReader.
     *
     * @param offset     byte position to start reading from
     * @param length     byte position to stop at (for example, the file length when the read began)
     * @param accessFile file opened for reading
     * @throws IOException if seeking or reading fails
     */
    public static void readBufferedLines(long offset, long length, RandomAccessFile accessFile) throws IOException {
        if (accessFile == null) return;
        int bufferSize = BYTE_BUFFER_SIZE;

        if (offset < length && offset >= 0) {
            int index = 1;
            long curPosition = offset;
            // Iterate about (length - offset) / BYTE_BUFFER_SIZE times, filling the
            // buffer no matter where the newlines fall.
            while ((curPosition + ((long) index * BYTE_BUFFER_SIZE)) < length) {
                accessFile.seek(offset); // seek to the last parsed data rather than the last data read into the buffer
                byte[] buf = new byte[bufferSize];
                int read = accessFile.read(buf, 0, bufferSize);
                index++; // increment whether or not the read was successful

                if (read > 0) {
                    int lastNewLine = getLastLine(read, buf);
                    if (lastNewLine <= 0) {
                        // No newline found in the buffer: grow the buffer and retry from the same offset.
                        bufferSize = bufferSize + read;
                        continue;
                    } else {
                        bufferSize = BYTE_BUFFER_SIZE;
                    }
                    readLine(buf, 0, lastNewLine); // parse the complete lines in the buffer
                    offset = offset + lastNewLine; // update the position of the last data parsed
                }
            }

            // Read the last chunk. In the worst case (no newline at all) it is the whole range.
            if (offset < length) {
                accessFile.seek(offset);
                byte[] buf = new byte[(int) (length - offset)];
                int read = accessFile.read(buf, 0, buf.length);
                if (read > 0) {
                    readLine(buf, 0, read);
                    offset = offset + read; // update the position of the last data parsed
                }
            }
        }
    }

    // Splits the first 'length' bytes of buf (starting at 'from') into lines and handles each one.
    private static void readLine(byte[] buf, int from, int length) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(buf, from, length)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // do something with the line
                System.out.println(line);
            }
        }
    }

    // Returns the index just past the last '\n' or '\r' in the first 'read' bytes, or 0 if there is none.
    // Note: if a chunk ends between '\r' and '\n', the next chunk starts with an empty line.
    private static int getLastLine(int read, byte[] buf) {
        if (buf == null) return -1;
        if (read > buf.length) read = buf.length;
        while (read > 0 && !(buf[read - 1] == '\n' || buf[read - 1] == '\r')) read--;
        return read;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile accessFile = new RandomAccessFile("C:/sri/test.log", "r")) {
            readBufferedLines(0, accessFile.length(), accessFile);
        }
    }
}

Answered by dividebyzero

I had a similar problem, so I created this class to read lines from a BufferedReader and track how many bytes have been read so far by calling getBytes() on each line. It assumes by default that the line separator is a single byte, and it creates a new BufferedReader on every seek() so that seek() works.

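import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;

public class FileCounterIterator {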

    public Long position() {
        return _position;
    }

    public Long fileSize() {
        return _fileSize;
    }

    public FileCounterIterator newlineLength(Long newNewlineLength) {
        this._newlineLength = newNewlineLength;
        return this;
    }

    private Long _fileSize = 0L;
    private Long _position = 0L;
    private Long _newlineLength = 1L;
    private RandomAccessFile fp;
    private BufferedReader itr;

    public FileCounterIterator(String filename) throws IOException {
        fp = new RandomAccessFile(filename, "r");
        _fileSize = fp.length();
        this.seek(0L);
    }

    public FileCounterIterator seek(Long newPosition) throws IOException {
        this.fp.seek(newPosition);
        this._position = newPosition;
        itr = new BufferedReader(new InputStreamReader(new FileInputStream(fp.getFD())));
        return this;
    }

    public Boolean hasNext() throws IOException {
        return this._position < this._fileSize;
    }

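    public String readLine() throws IOException {
        String nextLine = itr.readLine();
        if (nextLine == null) {
            return null; // end of file: leave the position unchanged
        }
        // Count the line's bytes plus the assumed newline length (1 byte unless changed).
        this._position += nextLine.getBytes().length + _newlineLength;
        return nextLine;
    }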
}
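
If the single-byte newline assumption is a concern (for example with CRLF files), one alternative is to count bytes as they are read instead of reconstructing the count from the decoded line. The sketch below is not the answer's class but a variation on the same idea; the name `CountingLineReader` and the UTF-8 decoding are assumptions.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical variation: reads lines byte by byte from a buffered stream and
// tracks the exact byte position, including the terminator bytes.
final class CountingLineReader {
    private final InputStream in;
    private long position;

    CountingLineReader(InputStream source, long startPosition) {
        this.in = new BufferedInputStream(source);
        this.position = startPosition;
    }

    long position() {
        return position;
    }

    // Returns the next line without its terminator, or null at end of stream.
    String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean terminated = false;
        int b;
        while ((b = in.read()) != -1) {
            position++; // every byte consumed counts, terminators included
            if (b == '\n') {
                terminated = true;
                break;
            }
            line.write(b);
        }
        if (!terminated && line.size() == 0) {
            return null; // nothing left to read
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (terminated && len > 0 && bytes[len - 1] == '\r') {
            len--; // drop the '\r' of a CRLF pair from the returned text
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```

The source stream could be a FileInputStream that has already been moved to `startPosition`, and the value of `position()` after the last complete line is what would be stored for the next read.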