How to count the total number of lines in a text file using Python

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/19001402/

Date: 2020-08-19 12:32:15  Source: igfitidea

how to count the total number of lines in a text file using python

python file file-io sum

Asked by

For example if my text file is:

blue
green
yellow
black

There are four lines here, and I want to get 4 as the result. How can I do that?

Accepted answer by alecxe

You can use sum() with a generator expression:

with open('data.txt') as f:
    # Count one per line; print() is a function in Python 3
    print(sum(1 for _ in f))

Note that you cannot use len(f), since f is an iterator and has no length. By convention, _ is the name used for throwaway variables; see "What is the purpose of the single underscore "_" variable in Python?".

You can use len(f.readlines()), but this creates an additional list in memory holding every line, so it will not work on huge files that don't fit in memory.
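
As a minimal sketch, the sum() approach above can be wrapped in a reusable function. The name count_lines and the temporary demo file are assumptions for illustration and are not part of the original answer:

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating the file lazily, one line at a time."""
    with open(path) as f:
        return sum(1 for _ in f)


# Demo: write the four lines from the question to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("blue\ngreen\nyellow\nblack\n")

print(count_lines(tmp.name))  # 4
os.remove(tmp.name)
```

Because the file object yields one line per iteration, a final line without a trailing newline is still counted.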

Answer by Koustav Ghosal

count = 0
with open('filename.txt', 'rb') as f:
    # Iterate lazily so only one line is held in memory at a time
    for line in f:
        count += 1

print(count)

Answer by TerryA

You can use sum() with a generator expression here. The generator expression yields a 1 for every line in the file (1, 1, ...). sum() then adds them all together to get the total count.

with open('text.txt') as myfile:
    count = sum(1 for line in myfile)

It seems by what you have tried that you don't want to include empty lines. You can then do:

with open('text.txt') as myfile:
    count = sum(1 for line in myfile if line.rstrip('\n'))
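
One detail worth illustrating: line.rstrip('\n') only removes the newline, so a line containing just spaces still counts as non-empty. The sketch below contrasts that with line.strip(). The function name count_nonempty and its whitespace_is_empty flag are hypothetical, and io.StringIO stands in for an open file:

```python
import io


def count_nonempty(lines, whitespace_is_empty=False):
    """Count lines that still have content after stripping.

    With whitespace_is_empty=False only the trailing newline is stripped,
    so a line of spaces counts. With True, such lines are skipped.
    """
    if whitespace_is_empty:
        return sum(1 for line in lines if line.strip())
    return sum(1 for line in lines if line.rstrip('\n'))


sample = "blue\n\n   \ngreen\n"
print(count_nonempty(io.StringIO(sample)))        # 3
print(count_nonempty(io.StringIO(sample), True))  # 2
```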

Answer by Michael Bacon

This link (How to get line count cheaply in Python?) has lots of potential solutions, but they all ignore one way to make this run considerably faster, namely by using the unbuffered (raw) interface, using bytearrays, and doing your own buffering.

Using a modified version of the timing tool, I believe the following code is faster (and marginally more pythonic) than any of the solutions offered:

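def _make_gen(reader):
    # Yield 1 MiB chunks until the reader returns an empty bytes object
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024 * 1024)

def rawpycount(filename):
    # Read through the raw (unbuffered) interface and count newline bytes per chunk
    with open(filename, 'rb') as f:
        f_gen = _make_gen(f.raw.read)
        return sum(buf.count(b'\n') for buf in f_gen)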

Here are my timings:

rawpycount        0.0048  0.0046   1.00
bufcount          0.0074  0.0066   1.43
wccount             0.01    0.01   2.17
itercount          0.014   0.014   3.04
opcount            0.021    0.02   4.43
kylecount          0.023   0.021   4.58
simplecount        0.022   0.022   4.81
mapcount           0.038   0.032   6.82

I would post it there, but I'm a relatively new user on Stack Exchange and don't have the requisite manna.

EDIT:

This can be done completely with generator expressions in-line using itertools, but it gets pretty weird looking:

from itertools import takewhile, repeat

def rawbigcount(filename):
    with open(filename, 'rb') as f:
        # Keep reading 1 MiB chunks until read() returns an empty bytes object
        bufgen = takewhile(lambda x: x, (f.raw.read(1024 * 1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)
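
Counting newline bytes reports one line too few when the last line has no trailing newline. The following is a sketch of a variant of the chunked approach that accounts for this. It is not the author's code: the name count_lines_chunked and the chunk_size parameter are assumptions, and it opens the file with buffering=0 (which returns the raw file object directly) instead of going through f.raw:

```python
def count_lines_chunked(path, chunk_size=1024 * 1024):
    """Count lines by counting newline bytes in fixed-size chunks."""
    count = 0
    last_byte = b""
    # buffering=0 is only allowed in binary mode and gives unbuffered reads
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A non-empty file that does not end in a newline has one more line
    if last_byte and last_byte != b"\n":
        count += 1
    return count
```

With this adjustment the result should match what iterating over the file line by line gives, at the cost of tracking one extra byte per chunk.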

Answer by Naveen

This one also gives the number of lines in a file.

with open('filename.txt', 'r') as a:
    l = a.read()          # reads the whole file into memory
count = l.splitlines()    # list of lines without their line endings
print(len(count))
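
To make the behaviour of splitlines() concrete, here is a small sketch comparing it with counting newline characters. The function name count_lines_splitlines is hypothetical. Note also that str.splitlines() treats some additional characters (such as form feed) as line boundaries, so on unusual input it may not agree with iterating over the file:

```python
def count_lines_splitlines(path):
    """Read the whole file into memory and count the lines splitlines() returns."""
    with open(path) as f:
        return len(f.read().splitlines())


with_newline = "blue\ngreen\nyellow\nblack\n"
without_newline = "blue\ngreen\nyellow\nblack"

print(len(with_newline.splitlines()))     # 4
print(len(without_newline.splitlines()))  # 4
print(without_newline.count("\n"))        # 3
```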

Answer by Surya

One-liner:

total_line_count = sum(1 for line in open("filename.txt"))

print(total_line_count)

Answer by Tiebe Groosman

Use:

num_lines = sum(1 for line in open('data.txt'))
print(num_lines)

That will work.

Answer by Michell

For the people saying to use with open("filename.txt", "r") as f: you can also do anyname = open("filename.txt", "r"), although the file is then not closed automatically.

def main():
    file = open("infile.txt", 'r')
    count = 0
    for line in file:
        count += 1
    file.close()  # without a with block, the file must be closed explicitly

    print(count)

main()

Answer by Amaan

Here is how you can do it with a list comprehension. This does a little extra work because line.strip() is called twice per line, and it keeps every non-empty line in memory.

with open('textfile.txt') as file:
    # Keep only the lines that are non-empty after stripping whitespace
    lines = [
        line.strip()
        for line in file
        if line.strip() != ''
    ]
print("number of lines = {}".format(len(lines)))
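
If calling strip() twice per line is a concern, one option is a small generator that strips each line once and reuses the result. This is only a sketch; the name nonblank_lines is hypothetical and a plain list stands in for an open file:

```python
def nonblank_lines(lines):
    """Yield each line stripped of surrounding whitespace, skipping blank ones."""
    for line in lines:
        stripped = line.strip()  # strip once and reuse the result
        if stripped:
            yield stripped


sample = ["blue\n", "\n", "   \n", " green \n"]
print(list(nonblank_lines(sample)))          # ['blue', 'green']
print(sum(1 for _ in nonblank_lines(sample)))  # 2
```

When only the count is needed, summing over the generator avoids building the list at all.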

Answer by Steven Johnson

I am not new to Stack Overflow; I just never had an account and usually came here for answers, so I can't comment or vote up an answer yet. But I wanted to say that the code from Michael Bacon above works really well. I am new to Python but not to programming. I have been reading Python Crash Course, and I wanted to try a few things to break up the cover-to-cover reading. One utility that is useful from an ETL or data quality perspective is capturing the row count of a file independently of any ETL tool. If the file has X rows and you import it into SQL or Hadoop, you should end up with X rows, so you can validate the row count of the raw data file at the lowest level.

I have been playing with his code and doing some testing, and it has been very efficient so far. I created several CSV files of various sizes and row counts. You can see my code below; the comments give the timings and details. In these tests, the code Michael Bacon provided above ran roughly 4.5 to 10 times faster than the normal Python method of just looping over the lines.

Hope this helps someone.

import time
from itertools import takewhile, repeat

def readfilesimple(myfile):
    # Count lines by iterating over the file in text mode
    linecounter = 0
    with open(myfile, 'r') as file_object:
        for lines in file_object:
            linecounter += 1

    return linecounter

def readfileadvanced(myfile):
    # Count newline bytes in 1 MiB chunks read through the raw interface
    with open(myfile, 'rb') as f:
        bufgen = takewhile(lambda x: x, (f.raw.read(1024 * 1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)


# ************************************
# Main
# ************************************

#start the clock

start_time = time.time()

# 6.7 seconds to read a 475MB file that has 24 million rows and 3 columns
#mycount = readfilesimple("c:/junk/book1.csv")

# 0.67 seconds to read a 475MB file that has 24 million rows and 3 columns
#mycount = readfileadvanced("c:/junk/book1.csv")

# 25.9 seconds to read a 3.9 GB file that has 3.25 million rows and 104 columns
#mycount = readfilesimple("c:/junk/WideCsvExample/ReallyWideReallyBig1.csv")

# 5.7 seconds to read a 3.9 GB file that has 3.25 million rows and 104 columns
#mycount = readfileadvanced("c:/junk/WideCsvExample/ReallyWideReallyBig1.csv")


# 292.92 seconds to read a 43 GB file that has 35.7 million rows and 104 columns
mycount = readfilesimple("c:/junk/WideCsvExample/ReallyWideReallyBig.csv")

# 57 seconds to read a 43 GB file that has 35.7 million rows and 104 columns
#mycount = readfileadvanced("c:/junk/WideCsvExample/ReallyWideReallyBig.csv")


#stop the clock
elapsed_time = time.time() - start_time


print("\nCode Execution: " + str(elapsed_time) + " seconds\n")
print("File contains: " + str(mycount) + " lines of text.")
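
Rather than commenting lines in and out for each run, the timing could be done with a small helper. The sketch below is an assumption about how one might structure it, not the author's method: the name time_call and the repeats parameter are hypothetical, and it uses time.perf_counter(), which is intended for measuring short durations:

```python
import time


def time_call(func, *args, repeats=3):
    """Return (best elapsed seconds, last result) over several calls to func(*args)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Demo with a built-in; a line-counting function and a file path could be passed instead.
elapsed, result = time_call(len, "abc")
print(result)  # 3
```

Taking the best of several runs reduces the effect of disk caching and other noise, although the first read of a very large file will usually be slower than later ones.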