Python: skip rows during CSV import with pandas
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20637439/
Skip rows during csv import pandas
Asked by thosphor
I'm trying to import a .csv file using pandas.read_csv(), however I don't want to import the 2nd row of the data file (the row with index = 1 for 0-indexing).
I can't see how not to import it because the arguments used with the command seem ambiguous:
From the pandas website:
"skiprows : list-like or integer. Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file."
If I put skiprows=1 in the arguments, how does it know whether to skip the first row or to skip the row with index 1?
Accepted answer by alko
You can try it yourself:
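>>> import pandas as pd
>>> from io import StringIO  # Python 3 location of StringIO
>>> s = """1, 2
... 3, 4
... 5, 6"""
>>> # A list skips the lines with those 0-based indices (here, the second line)
>>> pd.read_csv(StringIO(s), skiprows=[1], header=None)
   0  1
0  1  2
1  5  6
>>> # An integer skips that many lines from the start of the file
>>> pd.read_csv(StringIO(s), skiprows=1, header=None)
   0  1
0  3  4
1  5  6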
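The same distinction matters when the file has a header line, which is the case the question describes. The following is a minimal sketch; the sample data and the names `raw`, `by_index`, and `by_count` are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: a header line, a units line to drop, then two data rows.
raw = "a,b\nunit_a,unit_b\n1,2\n3,4\n"

# List form: skip only the line at 0-based index 1; line 0 is still the header.
by_index = pd.read_csv(StringIO(raw), skiprows=[1])

# Integer form: skip the first line; the units line then becomes the header.
by_count = pd.read_csv(StringIO(raw), skiprows=1)

print(by_index)
print(by_count)
```

With the list form the column names stay `a` and `b`; with the integer form the units line is promoted to the header, which is usually not what is wanted here.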
Answer by Hugo
Answer by Justin R. Locke
Also be sure that your file is actually a CSV file. For example, if you had an .xls file and simply changed the file extension to .csv, the file won't import and will give the error above. To check whether this is your problem, open the file in Excel; it will likely say:
"The file format and extension of 'Filename.csv' don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it. Do you want to open it anyway?"
To fix the file: open the file in Excel, click "Save As", choose the file format to save as (use .csv), then replace the existing file.
This was my problem, and doing this fixed the error for me.
Answer by Viraj Wadate
I ran into the same issue when using skiprows while reading a csv file. I was passing skip_rows=1, which does not work; the parameter is named skiprows.
This simple example gives an idea of how to use skiprows while reading a csv file.
import pandas as pd
# skiprows=1 skips the first line and starts reading from the second line
df = pd.read_csv('my_csv_file.csv', skiprows=1)
# display the data frame
df
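To see why the misspelled keyword fails, here is a small sketch. The in-memory sample `raw` and the flag `misspelled_accepted` are hypothetical names used only for this illustration.

```python
from io import StringIO

import pandas as pd

raw = "x,y\n1,2\n3,4\n"  # hypothetical sample data

try:
    # skip_rows is not a read_csv parameter, so Python rejects the call.
    pd.read_csv(StringIO(raw), skip_rows=1)
    misspelled_accepted = True
except TypeError as exc:
    misspelled_accepted = False
    print("rejected:", exc)

# The correct spelling skips the first line; header=None keeps the rest as data.
df = pd.read_csv(StringIO(raw), skiprows=1, header=None)
print(df)
```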
Answer by shanky
skiprows=[1] will skip the second line, not the first one.
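Besides an integer or a list, skiprows also accepts a callable that is evaluated against each 0-based line index. A minimal sketch, with made-up sample data:

```python
from io import StringIO

import pandas as pd

raw = "1,2\n3,4\n5,6\n7,8\n"  # hypothetical sample data

# The callable receives each 0-based line index; returning True skips that line.
df = pd.read_csv(StringIO(raw), skiprows=lambda i: i in {1, 2}, header=None)
print(df)
```

This can be handy when the lines to skip follow a rule (for example, every other line) rather than a fixed short list.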
Answer by EBo
All of these answers miss one important point: the n'th line is the n'th line in the file, and not the n'th row in the dataset. I have a situation where I download some antiquated stream gauge data from the USGS. The head of the dataset is commented with '#', the first line after that holds the labels, next comes a line that describes the data types, and last comes the data itself. I never know how many comment lines there are, but I know what the first couple of rows are. Example:
# ----------------------------- WARNING ----------------------------------
# Some of the data that you have obtained from this U.S. Geological Survey database
# may not have received Director's approval. ...
agency_cd	site_no	datetime	tz_cd	139719_00065	139719_00065_cd
5s	15s	20d	6s	14n	10s
USGS	08041780	2018-05-06 00:00	CDT	1.98	A
It would be nice if there was a way to automatically skip the n'th row as well as the n'th line.
As a note, I was able to fix my issue with:
import pandas as pd
ds = pd.read_csv(fname, comment='#', sep='\t', header=0, parse_dates=True)
ds.drop(0, inplace=True)
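A self-contained variant of this approach, as a rough sketch: the in-memory sample `raw` is invented to mimic the excerpt above (shortened comment lines, one extra data row), and reading everything as strings with `dtype=str` is an assumption made here so that the format line does not disturb type inference and the leading zero in `site_no` survives.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated sample modelled on the excerpt above.
raw = (
    "# ---- WARNING ----\n"
    "# Some comment text\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\t139719_00065\t139719_00065_cd\n"
    "5s\t15s\t20d\t6s\t14n\t10s\n"
    "USGS\t08041780\t2018-05-06 00:00\tCDT\t1.98\tA\n"
    "USGS\t08041780\t2018-05-06 00:15\tCDT\t1.97\tA\n"
)

# comment='#' drops the commented head, however many lines it has;
# header=0 then refers to the first remaining line (the labels).
ds = pd.read_csv(StringIO(raw), comment="#", sep="\t", header=0, dtype=str)

# Row 0 of the dataset is the format line ("5s", "15s", ...); drop it.
ds = ds.drop(index=0).reset_index(drop=True)

# Convert the columns of interest explicitly after the format line is gone.
ds["datetime"] = pd.to_datetime(ds["datetime"])
ds["139719_00065"] = ds["139719_00065"].astype(float)
print(ds)
```

Dropping by dataset row after reading sidesteps the need to know the file line number of the format line in advance.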

