Pandas returns "Passed header names mismatches usecols" error

Notice: this page reproduces a popular StackOverFlow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original URL: http://stackoverflow.com/questions/31017823/

Date: 2020-09-13 23:31:18  Source: igfitidea

python pandas

Asked by Jason Sanchez

The following works as expected. There are 190 columns that are all read in perfectly.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             # usecols=columns[:10], 
             nrows=10
             )

I have used the usecols argument before, so I am perplexed as to why this is no longer working for me. I would guess that simply slicing the first 10 column names would trivially work, but I continue to get the "Passed header names mismatches usecols" error.

I am using pandas 0.16.2.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             usecols=columns[:10], 
             nrows=10
             )

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44> in <module>()
      3                     nrows=10,
      4                     header=None,
----> 5                     names=columns,
      6                     )

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    472                     skip_blank_lines=skip_blank_lines)
    473 
--> 474         return _read(filepath_or_buffer, kwds)
    475 
    476     parser_f.__name__ = name

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    248 
    249     # Create the parser.
--> 250     parser = TextFileReader(filepath_or_buffer, **kwds)
    251 
    252     if (nrows is not None) and (chunksize is not None):

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    564             self.options['has_index_names'] = kwds['has_index_names']
    565 
--> 566         self._make_engine(self.engine)
    567 
    568     def _get_options_with_defaults(self, engine):

/.../m9tn/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    703     def _make_engine(self, engine='c'):
    704         if engine == 'c':
--> 705             self._engine = CParserWrapper(self.f, **self.options)
    706         else:
    707             if engine == 'python':

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1070         kwds['allow_leading_cols'] = self.index_col is not False
   1071 
-> 1072         self._reader = _parser.TextReader(src, **kwds)
   1073 
   1074         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)()

pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:7330)()

ValueError: Passed header names mismatches usecols

Answered by Jason Sanchez

It turns out there were 191 columns in the dataset (not 190). Pandas automatically set my first column of data as the index. I don't quite know why this caused an error, since all of the columns in usecols were in fact present in the parsed dataset.

So, the solution is to confirm that the number of columns in names exactly corresponds to the number of columns in your dataset.
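
As a rough sketch of that check, one could count the fields in the first row and compare the count against the list of names before calling pd.read_csv. The inline sample text and the three-name list below are made-up stand-ins for data.csv and columns, not the asker's actual data.

```python
import csv
import io

# Hypothetical stand-in for data.csv: every row has 4 fields.
sample = "a,1,2,3\nb,4,5,6\n"

# Hypothetical names list that is one entry short.
columns = ["x", "y", "z"]

# Read only the first row and count its fields.
first_row = next(csv.reader(io.StringIO(sample)))
n_fields = len(first_row)

if n_fields != len(columns):
    print("file has %d fields per row, but names has %d entries"
          % (n_fields, len(columns)))
```

For a real file, the same count can be taken by passing open("data.csv", newline="") to csv.reader in place of io.StringIO(sample).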

Also, I found this discussion on GitHub.

Answered by Kirkman14

For anyone out there debugging this error, it can also be caused by a missing comma between two entries in your list of column names, e.g.:

    columns = [
        'industry',
        'amount'
        'date',
        ...
    ]

Python will concatenate the adjacent string literals 'amount' and 'date' into a single 'amountdate', and of course the number of column names will be one lower than you expect.
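
The effect is easy to reproduce in plain Python, because adjacent string literals are joined before pandas ever sees the list. A minimal sketch, using a shortened version of the list above:

```python
columns = [
    'industry',
    'amount'   # missing comma: this literal is joined with the next one
    'date',
]

print(columns)       # ['industry', 'amountdate']
print(len(columns))  # 2
```

Printing len(columns) next to the expected column count is a quick way to catch this kind of slip.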
