Count how many lines are in a CSV file in Python

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16108526/

Time: 2020-08-18 21:47:09  Source: igfitidea

Tags: python, csv, count

Asked by GrantU

I'm using Python (Django framework) to read a CSV file. As you can see, I pull just 2 lines out of this CSV. What I have been trying to do is also store the total number of rows in the CSV in a variable.

How can I get the total number of rows?

file = object.myfilePath
fileObject = csv.reader(file)
for i in range(2):
    data.append(next(fileObject))

I have tried:

len(fileObject)
fileObject.length

Accepted answer by Martijn Pieters

You need to count the number of rows:

row_count = sum(1 for row in fileObject)  # fileObject is your csv.reader

Using sum() with a generator expression makes for an efficient counter, because it avoids storing the whole file in memory.

If you already read 2 rows to start with, then you need to add those 2 rows to your total; rows that have already been read are not being counted.
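
As a minimal sketch of that last point, assuming Python 3 and a small in-memory CSV (the sample data and the variable names are illustrative, not taken from the question), the rows already pulled out are added back to the count of the remaining ones:

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n", newline="")

reader = csv.reader(sample)

# Pull the first 2 rows, as in the question.
data = [next(reader) for _ in range(2)]

# Count what is left without building a list, then add the rows already read.
row_count = len(data) + sum(1 for _ in reader)

print(row_count)  # prints 5
```

The reader is an iterator, so once a row has been consumed it will not be seen again by sum(); that is why len(data) has to be added explicitly.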

Answer by Alex Troush

numline = len(file_read.readlines())  # file_read is assumed to be an already-open file object

Answer by sam collins

To do this, you need a bit of code like my example here:

with open("Task1.csv") as file:
    numline = len(file.readlines())
print(numline)

I hope this helps everyone.

Answer by kevin

You might want to try something as simple as one of the following on the command line:

sed -n '$=' filename
wc -l filename

Answer by akshaynagpal

import csv

count = 0
# newline='' lets the csv module handle line endings itself (Python 3)
with open('filename.csv', newline='') as count_file:
    csv_reader = csv.reader(count_file)
    for row in csv_reader:
        count += 1

print(count)

Answer by dixhom

2018-10-29 EDIT

Thank you for the comments.

I benchmarked several ways of counting the lines in a CSV file. The fastest method is below.

with open(filename) as f:
    row_count = sum(1 for line in f)

Here is the code tested.

import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number = 100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f,1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)

The results were as follows.

# sum1forline
Elapsed time :  0.6327946722068599
n =  2528244


# lenopenreadlines
Elapsed time :  0.655304473598555
n =  2528244


# lenpd
Elapsed time :  0.7561274056295324
n =  2528244


# csvreaderfor
Elapsed time :  1.5571560935772661
n =  2528244


# openenum
Elapsed time :  0.773000013928679
n =  2528244

In conclusion, sum(1 for line in f) is the fastest, but the difference from len(f.readlines()) may not be significant.

sample_submission.csv is 30.2 MB and has 31 million characters.

Answer by Old Bald Guy

Several of the above suggestions count the number of LINES in the csv file. But some CSV files will contain quoted strings which themselves contain newline characters. MS CSV files usually delimit records with \r\n, but use \n alone within quoted strings.

For a file like this, counting lines of text (as delimited by newline) in the file will give too large a result. So for an accurate count you need to use csv.reader to read the records.
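
A small sketch of the difference, assuming Python 3 and a made-up CSV string (the sample text and variable names are illustrative only):

```python
import csv
import io

# Hypothetical CSV text: a header plus 2 records, one of which has a quoted
# field containing a newline.
text = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Counting physical lines overcounts: the quoted newline splits one record in two.
line_count = sum(1 for _ in io.StringIO(text, newline=""))

# csv.reader keeps the quoted field together, so this counts records.
record_count = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))

print(line_count, record_count)  # prints 4 3
```

Passing newline="" keeps the line endings untranslated, which is what the csv module expects when it has to interpret quoted newlines itself.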

Answer by Amir

This works for CSV files and any other text file on Unix-based OSes:

import os

numOfLines = int(os.popen('wc -l < file.csv').read()[:-1])

If the CSV file contains a header row, you can subtract one from numOfLines above:

numOfLines = numOfLines - 1

Answer by Sean

Use list() to get a more workable object.

You can then count, skip, and mutate to your heart's content:

rows = list(fileObject)  # read all rows once; the reader is exhausted afterwards

len(rows)  # number of rows

rows[10:]  # skip the first 10 rows

Answer by protti

First, open the file with open():

input_file = open("nameOfFile.csv", "r")

Then pass the file object to csv.reader to read the CSV:

reader_file = csv.reader(input_file)

Finally, get the number of rows with len():

value = len(list(reader_file))

The complete code is:

import csv

input_file = open("nameOfFile.csv", "r")
reader_file = csv.reader(input_file)
value = len(list(reader_file))

Remember that if you want to reuse the file, you have to call input_file.seek(0) first: list(reader_file) reads the whole file, which leaves the file position at the end.
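
A minimal sketch of that rewind, assuming Python 3 and using an in-memory io.StringIO as a stand-in for the opened file (the sample data, and the choice to create a fresh reader after rewinding, are assumptions for illustration):

```python
import csv
import io

# Hypothetical in-memory file standing in for open("nameOfFile.csv").
input_file = io.StringIO("a,b\n1,2\n3,4\n", newline="")

reader_file = csv.reader(input_file)
value = len(list(reader_file))           # consumes the whole file

leftover = list(csv.reader(input_file))  # nothing left to read at this point

input_file.seek(0)                       # rewind to the start of the file
rows = list(csv.reader(input_file))      # a fresh reader sees every row again

print(value, len(leftover), len(rows))   # prints 3 0 3
```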
