Reading a file line by line in Go

Disclaimer: This page was originally published as a Chinese-English parallel translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/8757389/

Time: 2020-09-09 01:20:13  Source: igfitidea

Tags: string, file, parsing, go, line

Asked by g06lin

I'm unable to find a file.ReadLine function in Go. I can figure out how to quickly write one, but I am wondering whether I'm overlooking something here. How does one read a file line by line?

Accepted answer by g06lin

NOTE: The accepted answer was correct in early versions of Go. See the highest-voted answer, which contains the more recent idiomatic way to achieve this.

There is a function ReadLine in package bufio.

Please note that if the line does not fit into the read buffer, the function will return an incomplete line. If you want to always read a whole line in your program with a single function call, you will need to wrap ReadLine in your own function that calls it in a for-loop.

bufio.Reader.ReadString('\n') isn't fully equivalent to ReadLine: it keeps the trailing newline in the returned string, and when the last line of a file does not end with a newline character it returns that line together with io.EOF, so the caller has to handle that case explicitly.

Answer by Stefan Arentz

In Go 1.1 and newer, the simplest way to do this is with a bufio.Scanner. Here is a simple example that reads lines from a file:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
)

func main() {
    file, err := os.Open("/path/to/file.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }

    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}

This is the cleanest way to read from a Reader line by line.

There is one caveat: by default, Scanner does not deal well with lines longer than 65536 characters. If that is an issue for you, then you should probably roll your own on top of Reader.Read().

Answer by a-h

Use:

  • reader.ReadString('\n')
    • If you don't mind that the line could be very long (i.e. use a lot of RAM). It keeps the \n at the end of the string returned.
  • reader.ReadLine()
    • If you care about limiting RAM consumption and don't mind the extra work of handling the case where the line is greater than the reader's buffer size.

I tested the suggested solutions by writing a program that exercises the scenarios identified as problems in other answers:

  • A file with a 4MB line.
  • A file which doesn't end with a line break.

I found that:

  • The Scanner solution does not handle long lines.
  • The ReadLine solution is complex to implement.
  • The ReadString solution is the simplest and works for long lines.

Here is code that demonstrates each solution; it can be run via go run main.go:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "os"
)

func readFileWithReadString(fn string) (err error) {
    fmt.Println("readFileWithReadString")

    file, err := os.Open(fn)
    if err != nil {
        return err
    }
    // Defer Close only after Open has succeeded.
    defer file.Close()

    // Start reading from the file with a reader.
    reader := bufio.NewReader(file)

    var line string
    for {
        line, err = reader.ReadString('\n')

        fmt.Printf(" > Read %d characters\n", len(line))

        // Process the line here.
        fmt.Println(" > > " + limitLength(line, 50))

        if err != nil {
            break
        }
    }

    if err != io.EOF {
        fmt.Printf(" > Failed!: %v\n", err)
    }

    return
}

func readFileWithScanner(fn string) (err error) {
    fmt.Println("readFileWithScanner - this will fail!")

    // Don't use this, it doesn't work with long lines...

    file, err := os.Open(fn)
    if err != nil {
        return err
    }
    // Defer Close only after Open has succeeded.
    defer file.Close()

    // Start reading from the file using a scanner.
    scanner := bufio.NewScanner(file)

    for scanner.Scan() {
        line := scanner.Text()

        fmt.Printf(" > Read %d characters\n", len(line))

        // Process the line here.
        fmt.Println(" > > " + limitLength(line, 50))
    }

    if scanner.Err() != nil {
        fmt.Printf(" > Failed!: %v\n", scanner.Err())
    }

    return
}

func readFileWithReadLine(fn string) (err error) {
    fmt.Println("readFileWithReadLine")

    file, err := os.Open(fn)
    if err != nil {
        return err
    }
    // Defer Close only after Open has succeeded.
    defer file.Close()

    // Start reading from the file with a reader.
    reader := bufio.NewReader(file)

    for {
        var buffer bytes.Buffer

        var l []byte
        var isPrefix bool
        for {
            l, isPrefix, err = reader.ReadLine()
            buffer.Write(l)

            // If we've reached the end of the line, stop reading.
            if !isPrefix {
                break
            }

            // If we're just at the EOF, break
            if err != nil {
                break
            }
        }

        if err == io.EOF {
            break
        }

        line := buffer.String()

        fmt.Printf(" > Read %d characters\n", len(line))

        // Process the line here.
        fmt.Println(" > > " + limitLength(line, 50))
    }

    if err != io.EOF {
        fmt.Printf(" > Failed!: %v\n", err)
    }

    return
}

func main() {
    testLongLines()
    testLinesThatDoNotFinishWithALinebreak()
}

func testLongLines() {
    fmt.Println("Long lines")
    fmt.Println()

    createFileWithLongLine("longline.txt")
    readFileWithReadString("longline.txt")
    fmt.Println()
    readFileWithScanner("longline.txt")
    fmt.Println()
    readFileWithReadLine("longline.txt")
    fmt.Println()
}

func testLinesThatDoNotFinishWithALinebreak() {
    fmt.Println("No linebreak")
    fmt.Println()

    createFileThatDoesNotEndWithALineBreak("nolinebreak.txt")
    readFileWithReadString("nolinebreak.txt")
    fmt.Println()
    readFileWithScanner("nolinebreak.txt")
    fmt.Println()
    readFileWithReadLine("nolinebreak.txt")
    fmt.Println()
}

func createFileThatDoesNotEndWithALineBreak(fn string) (err error) {
    file, err := os.Create(fn)
    if err != nil {
        return err
    }
    // Defer Close only after Create has succeeded.
    defer file.Close()

    w := bufio.NewWriter(file)
    w.WriteString("Does not end with linebreak.")
    w.Flush()

    return
}

func createFileWithLongLine(fn string) (err error) {
    file, err := os.Create(fn)
    if err != nil {
        return err
    }
    // Defer Close only after Create has succeeded.
    defer file.Close()

    w := bufio.NewWriter(file)

    fs := 1024 * 1024 * 4 // 4MB

    // Create a 4MB long line consisting of the letter a.
    for i := 0; i < fs; i++ {
        w.WriteRune('a')
    }

    // Terminate the line with a break.
    w.WriteRune('\n')

    // Put in a second line, which doesn't have a linebreak.
    w.WriteString("Second line.")

    w.Flush()

    return
}

func limitLength(s string, length int) string {
    if len(s) < length {
        return s
    }

    return s[:length]
}

I tested on:

  • go version go1.7 windows/amd64
  • go version go1.6.3 linux/amd64
  • go version go1.7.4 darwin/amd64

The test program outputs:

Long lines

readFileWithReadString
 > Read 4194305 characters
 > > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 > Read 12 characters
 > > Second line.

readFileWithScanner - this will fail!
 > Failed!: bufio.Scanner: token too long

readFileWithReadLine
 > Read 4194304 characters
 > > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 > Read 12 characters
 > > Second line.

No linebreak

readFileWithReadString
 > Read 28 characters
 > > Does not end with linebreak.

readFileWithScanner - this will fail!
 > Read 28 characters
 > > Does not end with linebreak.

readFileWithReadLine
 > Read 28 characters
 > > Does not end with linebreak.

Answer by Malcolm

EDIT: As of go1.1, the idiomatic solution is to use bufio.Scanner.

I wrote up a way to easily read each line from a file. The Readln(*bufio.Reader) function returns a line (sans \n) from the underlying bufio.Reader struct.

// Readln returns a single line (without the ending \n)
// from the input buffered reader.
// An error is returned iff there is an error with the
// buffered reader.
func Readln(r *bufio.Reader) (string, error) {
    var (
        isPrefix = true
        err      error
        line, ln []byte
    )
    for isPrefix && err == nil {
        line, isPrefix, err = r.ReadLine()
        ln = append(ln, line...)
    }
    return string(ln), err
}

You can use Readln to read every line from a file. The following code reads every line in a file and outputs each line to stdout.

f, err := os.Open(fi)
if err != nil {
    fmt.Printf("error opening file: %v\n", err)
    os.Exit(1)
}
defer f.Close()
r := bufio.NewReader(f)
s, e := Readln(r)
for e == nil {
    fmt.Println(s)
    s, e = Readln(r)
}

Cheers!

Answer by zouying

There are two common ways to read a file line by line.

  1. Use bufio.Scanner
  2. Use ReadString/ReadBytes/... in bufio.Reader

In my test case (~250MB, ~2,500,000 lines), bufio.Scanner (time used: 0.395491384s) was faster than bufio.Reader.ReadString (time used: 0.446867622s).

Source code: https://github.com/xpzouying/go-practice/tree/master/read_file_line_by_line

Reading the file with bufio.Scanner:

func scanFile() {
    f, err := os.OpenFile(logfile, os.O_RDONLY, os.ModePerm)
    if err != nil {
        log.Fatalf("open file error: %v", err)
        return
    }
    defer f.Close()

    sc := bufio.NewScanner(f)
    for sc.Scan() {
        _ = sc.Text()  // GET the line string
    }
    if err := sc.Err(); err != nil {
        log.Fatalf("scan file error: %v", err)
        return
    }
}

Reading the file with bufio.Reader:

func readFileLines() {
    f, err := os.OpenFile(logfile, os.O_RDONLY, os.ModePerm)
    if err != nil {
        log.Fatalf("open file error: %v", err)
        return
    }
    defer f.Close()

    rd := bufio.NewReader(f)
    for {
        line, err := rd.ReadString('\n')
        if err != nil && err != io.EOF {
            log.Fatalf("read file line error: %v", err)
        }
        if len(line) > 0 {
            // GET the line string; a final line without '\n'
            // arrives together with io.EOF.
            _ = line
        }
        if err == io.EOF {
            break
        }
    }
}

Answer by Kokizzu

Example from this gist:

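func readLine(path string) {
  inFile, err := os.Open(path)
  if err != nil {
    fmt.Println(err.Error() + `: ` + path)
    return
  }
  defer inFile.Close()

  scanner := bufio.NewScanner(inFile)
  for scanner.Scan() {
    fmt.Println(scanner.Text()) // the line
  }
  // Scan stops silently on failure, so check Err to surface
  // problems such as a line that is too long.
  if err := scanner.Err(); err != nil {
    fmt.Println(err.Error() + `: ` + path)
  }
}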

However, this fails with an error when a line is larger than the Scanner's buffer.

When that happens, I use reader := bufio.NewReader(inFile) and build up my own buffer, using either ch, err := reader.ReadByte() or len, err := reader.Read(myBuffer).

Answer by lzap

You can also use ReadString with \n as a separator:

  f, err := os.Open(filename)
  if err != nil {
    fmt.Println("error opening file ", err)
    os.Exit(1)
  }
  defer f.Close()
  r := bufio.NewReader(f)
  for {
    path, err := r.ReadString(10) // 0x0A separator = newline
    if err == io.EOF {
      // do something here; path may still hold a final line
      // that has no trailing newline
      _ = path
      break
    } else if err != nil {
      return err // if you return error
    }
    _ = path // process the line here
  }

Answer by kroisse

bufio.Reader.ReadLine() works well. But if you want each line as a string, try ReadString('\n'); there is no need to reinvent the wheel.

Answer by cyber

// strip '\n' or read until EOF, return error if read error
func readline(reader io.Reader) (line []byte, err error) {
    line = make([]byte, 0, 100)
    for {
        b := make([]byte, 1)
        n, er := reader.Read(b)
        if n > 0 {
            c := b[0]
            if c == '\n' { // end of line
                break
            }
            line = append(line, c)
        }
        if er != nil {
            err = er
            return
        }
    }
    return
}

Answer by zuzuleinen

In the code below, I read interests from the CLI until the user enters an empty line, using ReadLine:

interests := make([]string, 0)
r := bufio.NewReader(os.Stdin)
for {
    fmt.Print("Give me an interest:")
    t, _, err := r.ReadLine()
    // Stop on an empty line or when stdin is closed.
    if err != nil || len(t) == 0 {
        break
    }
    interests = append(interests, string(t))
}
fmt.Println(interests)