Python: read file from line 2 or skip header row
Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/4796764/
Read file from line 2 or skip header row
Asked by super9
How can I skip the header row and start reading a file from line 2?
Accepted answer by SilentGhost
with open(fname) as f:
    next(f)
    for line in f:
        pass  # do something
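A possible variation, not part of the accepted answer: next(f) raises StopIteration when the file is completely empty, and passing a default value as the second argument avoids that. The sketch below assumes you want an empty result in that case; the helper name read_body_lines and the sample contents are made up for illustration.

```python
import os
import tempfile


def read_body_lines(path):
    """Return every line after the first; an empty file gives an empty list."""
    with open(path) as f:
        # The default (None) keeps next() from raising StopIteration
        # when the file has no lines at all.
        next(f, None)
        return [line.rstrip("\n") for line in f]


# Demonstration with a throwaway file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("name,value\na,1\nb,2\n")
    demo_path = tmp.name

body = read_body_lines(demo_path)
print(body)
os.remove(demo_path)
```

Whether an empty file should be skipped silently or reported as an error depends on the data, so treat the default as a design choice rather than a rule.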
Answer by chriscauley
f = open(fname,'r')
lines = f.readlines()[1:]
f.close()
Answer by Dror Hilman
f = open(fname).readlines()
firstLine = f.pop(0)  # removes the first line
for line in f:
    ...
Answer by Vajk Hermecz
If slicing could work on iterators...
from itertools import islice
with open(fname) as f:
    for line in islice(f, 1, None):
        pass
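The same islice call extends to any number of leading lines. Here is a minimal sketch, assuming the input is any iterable of lines; the helper name skip_lines and the sample text are hypothetical, and io.StringIO stands in for a real file.

```python
import io
from itertools import islice


def skip_lines(lines, n=1):
    """Return an iterator over `lines` that omits the first `n` items."""
    return islice(lines, n, None)


# io.StringIO behaves like an open text file for iteration purposes.
sample = io.StringIO("header 1\nheader 2\nrow 1\nrow 2\n")
rows = [line.strip() for line in skip_lines(sample, n=2)]
print(rows)
```

Because islice is lazy, the file is still read one line at a time, and an input shorter than n lines simply produces nothing.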
Answer by saimadhu.polamuri
If you want to keep the first line and then perform some operation on the rest of the file, this code will be helpful.
with open(filename, 'r') as f:
    first_line = f.readline()
    for line in f:
        pass  # Perform some operations
Answer by Mauro Rementeria
# Open a connection to the file
with open('world_dev_ind.csv') as file:
    # Skip the column names
    file.readline()
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}
    # Process only the first 1000 rows
    for j in range(0, 1000):
        # Split the current line into a list: line
        line = file.readline().split(',')
        # Get the value for the first column: first_col
        first_col = line[0]
        # If the column value is in the dict, increment its value
        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1
# Print the resulting dictionary
print(counts_dict)
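One thing to watch in the loop above: if the file has fewer than 1000 data rows, readline() starts returning empty strings and those get counted under the key ''. A rough alternative sketch, not the answerer's code, uses csv.reader, islice and collections.Counter so the loop stops at end of file. The function name count_first_column and the sample rows are invented for the demonstration.

```python
import csv
import io
from collections import Counter
from itertools import islice


def count_first_column(file_obj, max_rows=1000):
    """Count first-column values in at most `max_rows` rows after the header."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the column names; tolerate an empty file
    # islice stops early at end of file, so short files add no bogus keys.
    return Counter(row[0] for row in islice(reader, max_rows) if row)


# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "CountryName,Year\nArab World,1960\nArab World,1961\nEuro area,1960\n"
)
counts = count_first_column(sample)
print(dict(counts))
```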
Answer by Minh Tran
To generalize the task of reading multiple header lines and to improve readability, I'd use method extraction. Suppose you wanted to tokenize the first two lines of coordinates.txt to use as header information.
Example
coordinates.txt
---------------
Name,Longitude,Latitude,Elevation, Comments
String, Decimal Deg., Decimal Deg., Meters, String
Euler's Town,7.58857,47.559537,0, "Blah"
Faneuil Hall,-71.054773,42.360217,0
Yellowstone National Park,-110.588455,44.427963,0
Method extraction then lets you specify what you want to do with the header information (in this example we simply tokenize the header lines on commas and return each one as a list, but there's room to do much more).
def __readheader(filehandle, numberheaderlines=1):
    """Reads the specified number of lines and returns the comma-delimited
    strings on each line as a list"""
    for _ in range(numberheaderlines):
        # list() so each header line is a real list on Python 3
        yield list(map(str.strip, filehandle.readline().strip().split(',')))

with open('coordinates.txt', 'r') as rh:
    # Single header line
    #print(next(__readheader(rh)))
    # Multiple header lines
    for headerline in __readheader(rh, numberheaderlines=2):
        print(headerline)  # Or do other stuff with headerline tokens
Output
['Name', 'Longitude', 'Latitude', 'Elevation', 'Comments']
['String', 'Decimal Deg.', 'Decimal Deg.', 'Meters', 'String']
If coordinates.txt contains another header line, simply change numberheaderlines. Best of all, it's clear what __readheader(rh, numberheaderlines=2) is doing, and we avoid the ambiguity of having to figure out or comment on why the author of the accepted answer uses next() in his code.
Answer by u5675325
If you want to read multiple CSV files starting from line 2, this works like a charm:
import csv

for files in csv_file_list:
    with open(files, 'r') as r:
        next(r)  # skip headers
        rr = csv.reader(r)
        for row in rr:
            pass  # do something
(this is part of Parfait's answer to a different question)
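A side note on this pattern, offered as a sketch rather than a correction: calling next() on the file skips one physical line, while calling it on the csv.reader skips one parsed record. The two differ when a quoted header field contains a line break. The example assumes such a header; the function name read_rows_without_header and the sample data are invented for the demonstration.

```python
import csv
import io


def read_rows_without_header(file_obj):
    """Parse CSV from `file_obj` and return all records after the header."""
    reader = csv.reader(file_obj)
    # Skipping on the reader drops the whole header record, even if a
    # quoted field makes it span more than one physical line.
    next(reader, None)
    return list(reader)


# Hypothetical data: the second header field contains an embedded newline.
sample = io.StringIO('name,"comment\n(free text)"\nEuler,"Blah"\n', newline="")
rows = read_rows_without_header(sample)
print(rows)
```

With a plain next(r) on the file, the second half of that header would be parsed as if it were a data row. For headers that fit on one line the two approaches behave the same.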

