Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/28101851/

Time: 2020-09-13 22:52:11  Source: igfitidea

ipython pandas TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'

python python-2.7 pandas ipython

Asked by importError

While trying the ipython.org notebook, "INTRODUCTION TO PYTHON FOR DATA MINING"

The following code:

data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original",
               delim_whitespace = True, header=None,
               names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
                        'model', 'origin', 'car_name'])

yields the following error:

 TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'

Unfortunately the dataset file itself is not really csv, and I don't know why they used read_csv() to get its data.

The data looks like this line:

 14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"

The environment is python/2.7 on Debian stable w/ ipython 0.13. After searching here, I realize it's most likely a version problem, as the argument 'delim-whitespace' may be in a later version of the pandas library than the one available through the APT package manager.

I tried several workarounds, without success.

  • First, I tried to upgrade pandas by building from the latest source, but I found I would end up with a cascade of builds of other dependencies whose versions need upgrading, which could end up breaking the environment. E.g., I had to install Cython, and the build then reported that the Cython version in the APT package manager was also too old, so I would have to rebuild Cython plus other libs/modules, and so on.

  • Then after looking at the API a bit, I tried using other arguments: using delimiter = ' ' in the call to read_csv() caused it to break up the strings inside quotes into several columns,

    ValueError: Expecting 9 columns, got 13 in row 0
    
  • I tried using the read_csv() argument quotechar='"', as documented in the API, but again it was not recognized (unexpected keyword argument).

  • Finally I tried using a different way to load the file,

    data = DataFrame()
    
    data.from_csv(url)
    

    I got,

    Out[18]: 
    <class 'pandas.core.frame.DataFrame'>
    Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
    Empty DataFrame
    
    In [19]: print(data.shape)
    (0, 9)
    
  • alternatively, w/ sep argument to from_csv(),

    In [20]: data.from_csv(url,sep=' ')
    

    yields the error,

    ValueError: Expecting 31 columns, got 35 in row 1
    In [21]: print(data.shape)
    (0, 9)
    
  • Also alternatively, with the same negative result:

    In [32]: data = DataFrame( columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration','model', 'origin', 'car_name'])
    
    In [33]: data.from_csv(url,sep=', \t')
    Out[33]: 
    <class 'pandas.core.frame.DataFrame'>
    Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
    Empty DataFrame
    
    In [34]: data.head()
    Out[34]: 
    Empty DataFrame
    
  • I tried using ipython3 instead, but it cannot find/load matplotlib, as there is no matplotlib for python3 on my system.
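
The column-count errors in the attempts above come from how the row is split. As a rough illustration only (pandas does not use shlex internally; this is a standard-library sketch, and the variable names are made up for the example), splitting the sample row on a single space cuts the quoted car name in two and produces an empty field for every extra padding space, while a quote-aware whitespace split recovers the nine expected fields:

```python
import shlex

# Sample row copied from the question; the file is padded with runs of
# spaces rather than separated by commas.
line = ' 14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"'

# Splitting on one space: every extra padding space yields an empty
# string, and the quoted name is cut at the space inside the quotes.
naive_fields = line.split(' ')

# shlex.split treats any run of whitespace as one separator and keeps
# double-quoted text together as a single field.
fields = shlex.split(line)

print(len(naive_fields), len(fields))
print(fields[-1])
```

This is why a whitespace-run delimiter, rather than delimiter = ' ', is the appropriate setting for this file.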

Any help with this problem would be greatly appreciated.

Answered by Steve Howard

Oddly, the delim_whitespace parameter appears in the Pandas documentation in the method summary but not in the parameters list. Try replacing it with delimiter = r'\s+', which is equivalent to what I assume the authors meant.

CSV does refer to comma-separated values, but it's often used to refer to general delimited-text formats. TSV (tab-separated values) is another variant; in this case it's basically whitespace-separated values.
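
A minimal sketch of that suggestion, assuming a pandas version whose read_csv() accepts a regular-expression delimiter. To keep it self-contained, it reads two rows copied from the question out of an in-memory string instead of the UCI URL; the names sample and columns are only for this example:

```python
import io

import pandas as pd

# Two rows copied from the question (leading padding trimmed).
sample = (
    '14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"\n'
    '31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"\n'
)

columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model', 'origin', 'car_name']

# delimiter=r'\s+' treats any run of whitespace as one separator;
# the double-quoted car names stay in a single column.
data = pd.read_csv(io.StringIO(sample), delimiter=r'\s+',
                   header=None, names=columns)

print(data.shape)
print(data['car_name'].tolist())
```

The same call should work with the original URL in place of io.StringIO(sample), although that has not been checked here.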

Answered by unutbu

Your code uses delim_whitespace but the error message says delim-whitespace. The former exists, the latter does not.
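
To see how Python reports a keyword name the function does not define, here is a small sketch with a hypothetical stand-in function (read_table_stub is not part of pandas; it only mimics a reader that accepts delim_whitespace):

```python
def read_table_stub(path, delim_whitespace=False):
    """Hypothetical stand-in for a reader function; not pandas code."""
    return (path, delim_whitespace)

# A hyphenated name cannot be written as a literal keyword argument,
# so it is passed through a dict here to trigger the same kind of error.
try:
    read_table_stub('data', **{'delim-whitespace': True})
except TypeError as exc:
    message = str(exc)

# The underscore spelling matches the parameter and is accepted.
ok = read_table_stub('data', delim_whitespace=True)

print(message)
print(ok)
```

The error text names the hyphenated keyword exactly as it was passed, which is the clue that the spelling, not the pandas version, is the problem.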

If the data file contains

 14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"

and you define data with

data = pd.read_csv('data', delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name'])

then the DataFrame does get parsed successfully:

   mpg  cylinders  displacement  horsepower  weight  acceleration  model  \
0   14          8           454         220    4354             9     70   

   origin          car_name  
0       1  chevrolet impala  

So you just have to change the hyphen to an underscore.

Note that when you specify delim_whitespace=True, the pure Python parser is used. In this case I don't think that is necessary. Using delimiter=r'\s+' as Steve Howard suggests would probably perform better. (The source code says, "The C engine is faster while the python engine is currently more feature-complete", but I think the only feature that the python engine has that the C engine does not is skipfooter.)
