Python Pandas read_csv import results in error

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/24293745/

Time: 2020-08-19 04:23:21  Source: igfitidea

Pandas read_csv import results in error

python csv pandas

Asked by Sid Kwakkel

My csv is as follows (MQM Q.csv):

Date-Time,Value,Grade,Approval,Interpolation Code 
31/08/2012 12:15:00,,41,1,1 
31/08/2012 12:30:00,,41,1,1 
31/08/2012 12:45:00,,41,1,1 
31/08/2012 13:00:00,,41,1,1 
31/08/2012 13:15:00,,41,1,1 
31/08/2012 13:30:00,,41,1,1 
31/08/2012 13:45:00,,41,1,1 
31/08/2012 14:00:00,,41,1,1 
31/08/2012 14:15:00,,41,1,1

The first few lines have no "Value" entries but they start later on.

Here is my code:

import pandas as pd 
from StringIO import StringIO
Q = pd.read_csv(StringIO("""/cygdrive/c/temp/MQM Q.csv"""), header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)

I get the following error:

Traceback (most recent call last):
  File "daily.py", line 4, in <module>
    Q = pd.read_csv(StringIO("""/cygdrive/c/temp/MQM Q.csv"""), header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 443, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 228, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 533, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 670, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 1067, in __init__
    col_indices.append(self.names.index(u))
ValueError: 'Value' is not in list

Accepted answer by EdChum

This appears to be a bug in the CSV parser. First, this works:

df = pd.read_csv('MQM Q.csv')

This also works:

df = pd.read_csv('MQM Q.csv', usecols=['Value'])

But if I ask for Date-Time as well, it fails with the same error message as yours.

I then noticed the file was UTF-8 encoded, so I converted it to ANSI using Notepad++ and it worked. I then tried UTF-8 without BOM, and that also worked.

I then converted it back to UTF-8 (presumably now with a BOM) and it failed with the same error as before, so I don't think you are imagining this; it looks like a bug.

I am using Python 3.3, pandas 0.14 and NumPy 1.8.1.

To work around this, select the columns by position instead of by name:

df = pd.read_csv('MQM Q.csv', usecols=[0,1], parse_dates=True, dayfirst=True, index_col=0)

This sets your index to the Date-Time column, which is correctly converted to a DatetimeIndex.

In [40]: df.index
Out[40]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-08-31 12:15:00, ..., 2013-11-28 10:45:00]
Length: 43577, Freq: None, Timezone: None
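
The BOM effect described in this answer can be illustrated without pandas. The following is a minimal sketch, not the parser's actual code path: it assumes the file starts with a UTF-8 BOM, uses only the header and first row from the question, and header_names is a hypothetical helper introduced for this illustration.

```python
import codecs
import csv
import io

# Header and first row of the CSV from the question, encoded as UTF-8 with a BOM
# (assumption: this is what the "UTF-8" option in the editor produced).
raw = codecs.BOM_UTF8 + (
    "Date-Time,Value,Grade,Approval,Interpolation Code\n"
    "31/08/2012 12:15:00,,41,1,1\n"
).encode("utf-8")


def header_names(data, encoding):
    """Decode the bytes with the given codec and return the CSV header row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


with_bom = header_names(raw, "utf-8")         # BOM stays glued to the first name
without_bom = header_names(raw, "utf-8-sig")  # this codec drops a leading BOM

print(repr(with_bom[0]))
print(repr(without_bom[0]))
# A lookup by name only succeeds once the BOM is gone.
print("Date-Time" in with_bom, "Date-Time" in without_bom)
```

If a BOM is the cause, passing encoding="utf-8-sig" to pd.read_csv is one possible way to have it stripped before the header is parsed; that is a suggestion under the assumption above, not something tested in this answer.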

Answer by Andy Hayden

Your code should read as follows (there is no need to wrap the filename in StringIO!):

import pandas as pd 
Q = pd.read_csv("/cygdrive/c/temp/MQM Q.csv", header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)

Otherwise, as the code currently stands, pandas tries to read the string (the path itself) in as a DataFrame:

In [11]: pd.read_csv(StringIO("""/cygdrive/c/temp/MQM Q.csv"""))
Out[11]:
Empty DataFrame
Columns: [/cygdrive/c/temp/MQM Q.csv]
Index: []

which obviously isn't what you want (hence the exception saying that 'Value' is not in the list of columns).

Answer by Hai Vu

The following works for me (I have the CSV file in the same directory as the script, but that should not matter). I am running the following script on my Mac, not Cygwin, but it should work the same way:

import pandas as pd 
Q = pd.read_csv("MQM Q.csv",
        header=0,
        parse_dates=True, 
        dayfirst=True,
        index_col=0,
        usecols=["Date-Time", "Value"])
print(Q)

Discussion

  • StringIO will not work unless you create the StringIO object from the contents of the file, not from the name of the file.
  • I don't have any problem with the "Date-Time" column. In fact, the code above runs without any error at all.
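
A minimal sketch of the StringIO point from the discussion, assuming Python 3 (where StringIO lives in the io module rather than the StringIO module used in the question) and a reasonably recent pandas. The CSV text is the header and first two rows from the question; csv_text, wrong and right are names chosen only for this illustration.

```python
from io import StringIO  # Python 3 location; the question used Python 2's StringIO module

import pandas as pd

# Header and first two rows copied from the question's CSV.
csv_text = (
    "Date-Time,Value,Grade,Approval,Interpolation Code\n"
    "31/08/2012 12:15:00,,41,1,1\n"
    "31/08/2012 12:30:00,,41,1,1\n"
)

# Wrapping the *path* in StringIO makes pandas parse the path text itself:
# the single line becomes the header and there are no data rows.
wrong = pd.read_csv(StringIO("/cygdrive/c/temp/MQM Q.csv"))
print(list(wrong.columns))

# Wrapping the *contents* in StringIO behaves like reading the file.
right = pd.read_csv(
    StringIO(csv_text),
    header=0,
    usecols=["Date-Time", "Value"],
    parse_dates=True,
    dayfirst=True,
    index_col=0,
)
print(right.index)
```

When the data already sits in a file on disk, passing the path string straight to pd.read_csv, as in the answers above, is simpler; StringIO is mainly useful when the CSV text is already in memory.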