Java Groovy: Reading a range of lines from a file

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/4089754/

Date: 2020-08-14 11:30:27  Source: igfitidea

Groovy: Reading a range of lines from file

Tags: java, file-io, groovy

Asked by Robert Strauch

I have a text file with a rather large amount of data of about 2,000,000 lines. Going through the file with the following code snippet is easy but that's not what I need ;-)

def f = new File("input.txt")
f.eachLine() {
    // Some code here
}

I need to read only a specific range of lines from the file. Is there a way to specify the start and end line like this (pseudo-code)? I'd like to avoid loading all lines into memory with readLines() before selecting the range.

// Read all lines from 4 to 48
def f = new File("input.txt")
def start = 4
def end = 48
f.eachLine(start, end) {
    // Some code here
}

If this is not possible with Groovy any Java solution is welcome as well :-)

Cheers, Robert

Accepted answer by Yevgeniy Brikman

I don't believe there is any "magic" way to skip to an arbitrary "line" in a file. Lines are merely defined by newline characters, so without actually reading the file, there is no way to know where those will be. I believe you have two options:

  1. Follow Mark Peters's answer and use a BufferedReader to read the file one line at a time until you reach your desired line. This will obviously be slow.
  2. Figure out how many bytes (rather than lines) your next read needs to start at and seek directly to that point in the file using something like RandomAccessFile. Whether or not it's possible to efficiently know the right number of bytes depends on your application. For example, if you are reading the file sequentially, one piece at a time, you simply record the position you left off at. If all the lines are of a fixed length of L bytes, then getting to line N (counting from 0) is just a matter of seeking to position N*L. If this is an operation you repeat often, some pre-processing might help: for example, read the entire file once and record the starting position of each line in an in-memory HashMap. Next time you need to go to line N, simply look up its position in the HashMap and seek directly to that point.
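
The pre-processing idea in option 2 could be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not part of the original answer: the class name LineOffsetIndex and its methods are hypothetical, lines are assumed to end in \n (or \r\n), the text is assumed to be UTF-8, and a List stands in for the HashMap because line numbers are dense.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers where every line starts so later reads can seek.
class LineOffsetIndex {
    private final String path;
    // offsets.get(n) is the byte position at which 0-based line n begins.
    private final List<Long> offsets = new ArrayList<>();

    LineOffsetIndex(String path) throws IOException {
        this.path = path;
        // One full sequential pass over the file; this is the pre-processing cost.
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') {
                    atLineStart = true; // assumption: every line ends with \n
                }
            }
        }
    }

    int lineCount() {
        return offsets.size();
    }

    // Returns lines first..last (0-based, inclusive), clamped to the end of the file.
    List<String> readRange(int first, int last) throws IOException {
        List<String> result = new ArrayList<>();
        if (first < 0 || first >= offsets.size()) {
            return result;
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offsets.get(first)); // jump straight to the first wanted line
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    Channels.newInputStream(raf.getChannel()), StandardCharsets.UTF_8));
            String line;
            for (int n = first; n <= last && (line = reader.readLine()) != null; n++) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Building the index still reads the whole file once, so this only pays off when many ranges are read from the same unchanged file; for example, new LineOffsetIndex("input.txt").readRange(3, 47) would return the 4th through 48th lines.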

Answer by Mark Peters

The Java solution:

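// Needs java.io.BufferedReader and java.io.FileReader; line numbers are 0-based here.
try (BufferedReader r = new BufferedReader(new FileReader(f))) {
    String line;
    // Test ln before reading so that no line past 'end' is read.
    for (int ln = 0; ln <= end && (line = r.readLine()) != null; ln++) {
        if (ln >= start) {
            // Some code here
        }
    }
}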

Gross, eh?

Unfortunately, unless your lines are fixed length, you're not going to be able to skip to line number start efficiently, since each line could be arbitrarily long and therefore all the data before it needs to be read. That doesn't preclude a nicer solution though.

Java 8

Thought it was worth an update to show how to do this efficiently with Streams:

int start = 5;
int end = 12;
Path file = Paths.get("/tmp/bigfile.txt");

try (Stream<String> lines = Files.lines(file)) {
    lines.skip(start).limit(end-start).forEach(System.out::println);
}

Because Streams are lazily evaluated, it will only read lines up to and including end (plus whatever internal buffering it chooses to do). Counting lines from 1, skip(start).limit(end - start) yields lines start + 1 through end.
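
For a range given as 1-based, inclusive line numbers (as in the question), the same Streams approach might be wrapped like this. It is a minimal sketch: the class and method names are hypothetical, and Files.lines is used with its default UTF-8 decoding.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper around Files.lines for a 1-based, inclusive line range.
class StreamLineRange {
    static List<String> readRange(Path file, int first, int last) throws IOException {
        if (first < 1 || last < first) {
            throw new IllegalArgumentException("need 1 <= first <= last");
        }
        // try-with-resources closes the underlying file when the stream is done.
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .skip(first - 1L)          // drop the lines before 'first'
                    .limit(last - first + 1L)  // keep 'first' through 'last'
                    .collect(Collectors.toList());
        }
    }
}
```

If the file has fewer than last lines, the result is simply shorter; the skipped lines are still read from disk, they are just not kept.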

Answer by Sean Patrick Floyd

Here's another Java solution using LineIterator and FileUtils from Commons IO:

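// startOffset is the number of leading lines to skip; lines is how many to return.
public static Collection<String> readFile(final File f,
    final int startOffset,
    final int lines) throws IOException {
    final LineIterator it = FileUtils.lineIterator(f);
    final Collection<String> coll = new ArrayList<String>(lines);
    try {
        int index = 0;
        while (index++ < startOffset + lines && it.hasNext()) {
            final String line = it.nextLine();
            // index is 1-based here; '>' (not '>=') skips exactly startOffset lines
            if (index > startOffset) {
                coll.add(line);
            }
        }
    } finally {
        it.close(); // release the file even if reading fails
    }
    return coll;
}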

Answer by dogbane

You have to iterate over the lines from the beginning to get to your starting position, but you can use LineNumberReader (instead of BufferedReader) because it will keep track of the line numbers for you.

    final int start = 4;
    final int end = 48;

    final LineNumberReader in = new LineNumberReader(new FileReader(filename));
    String line=null;
    while ((line = in.readLine()) != null && in.getLineNumber() <= end) {
        if (in.getLineNumber() >= start) {
            //process line
        }
    }
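
A self-contained variant of the same idea might look like the sketch below. The class name, the method name and the callback parameter are hypothetical; the reader is closed automatically, and the line count is checked before each read so that nothing after end is read.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

// Hypothetical helper: calls 'action' with (lineNumber, line) for lines start..end, 1-based inclusive.
class LineRangeReader {
    static void eachLineInRange(Path file, int start, int end,
                                BiConsumer<Integer, String> action) throws IOException {
        try (LineNumberReader in = new LineNumberReader(
                Files.newBufferedReader(file, StandardCharsets.UTF_8))) {
            String line;
            // getLineNumber() is the number of lines read so far, so testing it
            // first stops the loop right after line 'end' has been handled.
            while (in.getLineNumber() < end && (line = in.readLine()) != null) {
                if (in.getLineNumber() >= start) {
                    action.accept(in.getLineNumber(), line);
                }
            }
        }
    }
}
```

A call such as LineRangeReader.eachLineInRange(Paths.get("input.txt"), 4, 48, (n, l) -> System.out.println(n + ") " + l)) would print lines 4 through 48 without holding the others in memory.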

Answer by Robert Strauch

Thanks for all your hints. From what you've written I cobbled my own piece of code which seems to be working. Not elegant but it serves its purpose :-)

def f = new RandomAccessFile("D:/input.txt", "r")
def start = 3
def end = 6
def current = start-1
def BYTE_OFFSET = 11
def resultList = []

if ((end*BYTE_OFFSET) <= f.length()) {
    while ((current*BYTE_OFFSET) < (end*BYTE_OFFSET)) {
        f.seek(current*BYTE_OFFSET)
        resultList << f.readLine()
        current++
    }
}
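
The snippet above relies on every line occupying exactly BYTE_OFFSET bytes, terminator included. Under that same assumption, a Java version of the fixed-width seek might look like this sketch; the class name, method name and recordBytes parameter are hypothetical, and the text is assumed to be single-byte (ASCII).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for files where every line, terminator included, is recordBytes long.
class FixedWidthLines {
    // Returns lines first..last (1-based, inclusive), clamped to the complete records in the file.
    static List<String> readRange(String path, int recordBytes, int first, int last)
            throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long total = raf.length() / recordBytes; // number of complete records
            long lastLine = Math.min(last, total);
            if (first < 1 || lastLine < first) {
                return result;
            }
            int count = (int) (lastLine - first + 1);
            byte[] buf = new byte[count * recordBytes];
            raf.seek((long) (first - 1) * recordBytes); // line N starts at (N-1) * recordBytes
            raf.readFully(buf);                          // one read for the whole range
            for (int i = 0; i < count; i++) {
                int off = i * recordBytes;
                int len = recordBytes;
                // Trim the trailing \n or \r\n from each record.
                while (len > 0 && (buf[off + len - 1] == '\n' || buf[off + len - 1] == '\r')) {
                    len--;
                }
                result.add(new String(buf, off, len, StandardCharsets.US_ASCII));
            }
        }
        return result;
    }
}
```

With 10-character lines ending in \n, FixedWidthLines.readRange("D:/input.txt", 11, 3, 6) corresponds to the start, end and BYTE_OFFSET values used above. Unlike that snippet, this sketch clamps the range to the end of the file rather than returning nothing when end lies beyond it.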

Answer by Dónal

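Here's a Groovy solution. Unfortunately, this will read every line of the file, including those before start and after end; it only skips processing them.

def start = 4
def end = 48

// The optional int argument of eachLine only sets the number given to the first
// line (default 1); it does not skip lines, so both bounds are checked here.
new File("input.txt").eachLine { line, lineNo ->
    if (lineNo >= start && lineNo <= end) {
        // Process the line
    }
}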

Answer by Vinay

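This should do it. Note that return only leaves the closure for the current line, so eachLine still reads the lines after "end"; they are just not processed.

def readRange = { file ->
    def start = 10
    def end = 20
    def fileToRead = new File(file)
    // eachLine passes the line and its 1-based number to a two-argument closure
    fileToRead.eachLine { line, lineNo ->
        if (lineNo > end) {
            return // skips this line only; the iteration continues
        }
        if (lineNo >= start) {
            println line
        }
    }
}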

Answer by Jarek Przygódzki

In Groovy you can use a Category:

class FileHelper {
    static eachLineInRange(File file, IntRange lineRange, Closure closure) {
        file.withReader { reader ->
            // withReader supplies a BufferedReader; wrap it so lineNumber is available
            def r = new LineNumberReader(reader)
            def line
            while ((line = r.readLine()) != null) {
                def lineNo = r.lineNumber
                if (lineNo < lineRange.from) continue
                if (lineNo > lineRange.to) break
                closure.call(line, lineNo)
            }
        }
    }
}

def f = '/path/to/file' as File
use(FileHelper) {
    f.eachLineInRange(from..to){line, lineNo ->
        println "$lineNo) $line"
    }
}

or ExpandoMetaClass

File.metaClass.eachLineInRange = { IntRange lineRange, Closure closure ->
    delegate.withReader { reader ->
        // withReader supplies a BufferedReader; wrap it so lineNumber is available
        def r = new LineNumberReader(reader)
        def line
        while ((line = r.readLine()) != null) {
            def lineNo = r.lineNumber
            if (lineNo < lineRange.from) continue
            if (lineNo > lineRange.to) break
            closure.call(line, lineNo)
        }
    }
}

def f = '/path/to/file' as File
f.eachLineInRange(from..to){line, lineNo ->
    println "$lineNo) $line"
}

In this solution you read each line from file sequentially but don't keep them all in memory.

Answer by Gangnus

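Groovy's eachLine now accepts a firstLine argument. Here are two signatures from the docs on File. Note that, according to the Groovy documentation, firstLine is the number assigned to the first line of the file (1 by default, 0 for zero-based counting); it does not skip any lines.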

Object eachLine(int firstLine, Closure closure) 

Object eachLine(String charset, int firstLine, Closure closure)