Python: skipping rows during CSV import with pandas

Question

Asked by thosphor

I'm trying to import a .csv file using pandas.read_csv(), however I don't want to import the 2nd row of the data file (the row with index = 1 for 0-indexing).

I can't see how not to import it because the arguments used with the command seem ambiguous:

From the pandas website:

skiprows: list-like or integer
Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.

If I put skiprows=1 in the arguments, how does it know whether to skip the first row or to skip the row with index 1?

Answer 1

Accepted answer by alko

You can try it yourself:

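>>> import pandas as pd
>>> from io import StringIO  # on Python 2 this was `from StringIO import StringIO`
>>> s = """1, 2
... 3, 4
... 5, 6"""
>>> # A list skips exactly the listed 0-indexed lines: here only the second line.
>>> pd.read_csv(StringIO(s), skiprows=[1], header=None)
   0  1
0  1  2
1  5  6
>>> # An integer skips that many lines from the start of the file: here the first line.
>>> pd.read_csv(StringIO(s), skiprows=1, header=None)
   0  1
0  3  4
1  5  6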

Answer 2

Answered by Hugo

I don't have enough reputation to comment yet, but I want to add to alko's answer for further reference.

From the docs:

skiprows: A collection of numbers for rows in the file to skip. Can also be an integer to skip the first n rows
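
Applied to the situation in the question (keep the header line, drop the line right after it), the two forms behave as in the following minimal sketch. The in-memory CSV text and the column names `name`, `value`, `units` and `m` are made up for illustration and are not from the original posts.

```python
import io

import pandas as pd

# Hypothetical file content: a header line, an unwanted second line, then data.
csv_text = "name,value\nunits,m\na,1\nb,2\n"

# List form: skip only the line whose 0-based index is 1.
# Line 0 is still read as the header.
df_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# Integer form: skip the first 1 line of the file.
# The next line ("units,m") is then read as the header.
df_int = pd.read_csv(io.StringIO(csv_text), skiprows=1)

print(list(df_list.columns))
print(list(df_int.columns))
```

So for the question as asked, the list form `skiprows=[1]` is the one that removes only the second line of the file.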

Answer 3

Answered by Justin R. Locke

Also be sure that your file is actually a CSV file. For example, if you had an .xls file and simply changed the file extension to .csv, the file won't import and will give the error above. To check whether this is your problem, open the file in Excel; it will likely say:

"The file format and extension of 'Filename.csv' don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it. Do you want to open it anyway?"

To fix the file: open it in Excel, click "Save As", choose CSV (.csv) as the file format, then replace the existing file.

This was my problem, and re-saving the file fixed the error for me.

Answer 4

Answered by Viraj Wadate

I had the same issue when using skiprows while reading a CSV file. I was passing skip_rows=1, which does not work.

A simple example shows how to use skiprows when reading a CSV file.

import pandas as pd

# skiprows=1 skips the first line and starts reading from the second line
df = pd.read_csv('my_csv_file.csv', skiprows=1)

# print the data frame
df
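
To illustrate why the misspelled keyword fails, here is a small sketch. It assumes a made-up in-memory CSV in place of `my_csv_file.csv`. `read_csv` has no `skip_rows` parameter, so the call is rejected with a `TypeError` before any parsing happens.

```python
import io

import pandas as pd

# Hypothetical CSV content used only for this illustration.
csv_text = "a,b\n1,2\n3,4\n"

error_message = None
try:
    # Misspelled keyword: the parameter is `skiprows`, not `skip_rows`.
    pd.read_csv(io.StringIO(csv_text), skip_rows=1)
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The correctly spelled keyword is accepted.
df_ok = pd.read_csv(io.StringIO(csv_text), skiprows=1)
```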

Answer 5

Answered by shanky

skiprows=[1] will skip the second line, not the first one.

Answer 6

Answered by EBo

All of these answers miss one important point: the n'th line is the n'th line in the file, not the n'th row in the dataset. I have a situation where I download some antiquated stream gauge data from the USGS. The head of the dataset is commented with '#', the first line after that holds the labels, next comes a line that describes the data types, and last comes the data itself. I never know how many comment lines there are, but I know what the first couple of rows are. Example:

# ----------------------------- WARNING ----------------------------------
# Some of the data that you have obtained from this U.S. Geological Survey database
# may not have received Director's approval. ...
agency_cd site_no datetime tz_cd 139719_00065 139719_00065_cd
5s 15s 20d 6s 14n 10s
USGS 08041780 2018-05-06 00:00 CDT 1.98 A

It would be nice if there was a way to automatically skip the n'th row as well as the n'th line.

As a note, I was able to fix my issue with:

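import pandas as pd
# comment='#' ignores the commented head; header=0 then picks the label line
ds = pd.read_csv(fname, comment='#', sep='\t', header=0, parse_dates=True)
# the first parsed row is the line describing the types, so drop it
ds.drop(0, inplace=True)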
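
One possible alternative to dropping the row after parsing is to count the leading comment lines first and then pass the file line number of the types row to skiprows. The sketch below assumes that the comment lines appear only at the top of the file and that the fields are tab-separated. The sample text is rebuilt from the example above, and its second data row is made up for illustration.

```python
import io

import pandas as pd

# Sample rebuilt from the example above; the last data row is hypothetical.
sample = (
    "# ----- WARNING -----\n"
    "# Some of the data ... may not have received Director's approval.\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\t139719_00065\t139719_00065_cd\n"
    "5s\t15s\t20d\t6s\t14n\t10s\n"
    "USGS\t08041780\t2018-05-06 00:00\tCDT\t1.98\tA\n"
    "USGS\t08041780\t2018-05-06 00:15\tCDT\t1.97\tA\n"
)

# Count the comment lines at the top of the file.
n_comments = 0
for line in io.StringIO(sample):
    if not line.startswith("#"):
        break
    n_comments += 1

# skiprows counts lines in the file, comment lines included, so the label
# line sits at index n_comments and the types line at n_comments + 1.
ds = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    comment="#",                # ignore the commented head
    header=0,                   # first non-comment line holds the labels
    skiprows=[n_comments + 1],  # drop the line describing the types
    dtype={"site_no": str},     # keep the leading zero of the site number
)

print(ds)
```

Because the types row never reaches the parser, the numeric column is read as floats directly instead of as strings that need converting afterwards.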
