pandas: list index out of range using pandas read_csv

Question

Asked by Nero Ouali

I'm trying to read large data (thousands of rows) through a python script from csv files which look like this:

.....
2015-11-03 20:16:28,000;63,62;
2015-11-03 20:16:29,000;63,75;
2015-11-03 20:16:30,000;63,86;
2015-11-03 20:16:31,000;64,25;

But it appears that one of the files has extra empty rows containing 196541465 blank spaces, and the code then crashes when reading it with read_csv from the pandas library:

     File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 4221, in append
        elif isinstance(other, list) and not isinstance(other[0], DataFrame):
IndexError: list index out of range

I'm using the following command:

data = pd.read_csv(input_file, skiprows=[0], usecols=[0, 1, 2],
                   delimiter=';', decimal=',', names=['date', 'angle', 'Unnamed'],
                   na_filter=False, parse_dates=[0], date_parser=reformat_date,
                   error_bad_lines=False, skip_blank_lines=True)  # , nrows=8191)

The culprit is row 8192; when I limit the number of rows (with nrows = 8191) it works just fine. I've tried many options from the docs, but none of them seems to work. Any ideas?

Answer 1

Answered by rogueleaderr

I got this error because I was trying to read a CSV file that had fewer headers than columns (e.g. 10 columns, but only 8 headers). If you set index_col=False, pandas doesn't know what to do with the extra columns.
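
One way to check whether a file has this kind of mismatch before handing it to read_csv is to compare the number of header fields with the widest data row. The following is only a minimal sketch using Python's standard csv module; the function name `field_counts` and the sample text are made up for illustration and do not come from the original answer.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return (number of header fields, widest data row) for CSV text."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)
    # default=0 covers the case of a file that has a header but no data rows
    widest = max((len(row) for row in rows), default=0)
    return len(header), widest


# Hypothetical sample: 2 header names, data rows with a trailing ';'
sample = "a;b\n1;2;3;\n4;5;6;\n"
n_header, n_widest = field_counts(sample, delimiter=";")
if n_widest > n_header:
    print("header has", n_header, "fields, widest row has", n_widest)
```

Note that a trailing delimiter, as in the question's data, counts as one extra (empty) field per row.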

Answer 2

Answered by Marcus H?gen? Bohman

Edited according to Mitja's comment below.

I just had the same issue, and index_col = False didn't work. I had 19 columns and only 17 headers. I solved it by reading the columns and the headers separately and then adding the header names.

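import pandas as pd

# Read the header row (plus one data row) just to get the column names.
dfcolumns = pd.read_csv('file.csv',
                        nrows = 1)
# Re-read the data without a header, keeping only as many columns
# as there are header names, and assign those names.
df = pd.read_csv('file.csv',
                 header = None,
                 skiprows = 1,
                 usecols = list(range(len(dfcolumns.columns))),
                 names = dfcolumns.columns)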
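
To try the same idea without a file on disk, here is a small self-contained sketch that applies it to an in-memory string. The sample data (3 header names, 5 fields per row) and the variable names are assumptions made for illustration; the only change from the snippet above is that the names are passed as a plain list.

```python
import io

import pandas as pd

# Hypothetical sample: 3 header names, but 5 fields in every data row.
raw = "a,b,c\n1,2,3,4,5\n6,7,8,9,10\n"

# Step 1: read only the header (plus one row) to learn the column names.
header = pd.read_csv(io.StringIO(raw), nrows=1)
names = list(header.columns)

# Step 2: re-read the data, skipping the header line, keeping only the
# first len(names) columns and labelling them with the header names.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    skiprows=1,
    usecols=list(range(len(names))),
    names=names,
)
print(df)
```

The extra trailing fields are simply dropped here, so this only fits cases where the unnamed columns carry no data you need.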
