Pandas returns "Passed header names mismatches usecols" error
Source: http://stackoverflow.com/questions/31017823/ (Stack Overflow, licensed under CC BY-SA 4.0; attribute it to the original authors)
Asked by Jason Sanchez
The following works as expected. There are 190 columns that are all read in perfectly.
import pandas as pd

pd.read_csv("data.csv",
            header=None,
            names=columns,
            # usecols=columns[:10],
            nrows=10
            )
I have used the usecols argument before, so I am perplexed as to why this is no longer working for me. I would guess that simply slicing the first 10 column names would trivially work, but I continue to get the "Passed header names mismatches usecols" error.
I am using pandas 0.16.2.
pd.read_csv("data.csv",
            header=None,
            names=columns,
            usecols=columns[:10],
            nrows=10
            )
Traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-44> in <module>()
3 nrows=10,
4 header=None,
----> 5 names=columns,
6 )
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
472 skip_blank_lines=skip_blank_lines)
473
--> 474 return _read(filepath_or_buffer, kwds)
475
476 parser_f.__name__ = name
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
248
249 # Create the parser.
--> 250 parser = TextFileReader(filepath_or_buffer, **kwds)
251
252 if (nrows is not None) and (chunksize is not None):
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
564 self.options['has_index_names'] = kwds['has_index_names']
565
--> 566 self._make_engine(self.engine)
567
568 def _get_options_with_defaults(self, engine):
/.../m9tn/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
703 def _make_engine(self, engine='c'):
704 if engine == 'c':
--> 705 self._engine = CParserWrapper(self.f, **self.options)
706 else:
707 if engine == 'python':
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
1070 kwds['allow_leading_cols'] = self.index_col is not False
1071
-> 1072 self._reader = _parser.TextReader(src, **kwds)
1073
1074 # XXX
pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)()
pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:7330)()
ValueError: Passed header names mismatches usecols
Answered by Jason Sanchez
It turns out there were 191 columns in the dataset (not 190). Because names held one entry fewer than the data had columns, Pandas automatically set my first column of data as the index. I don't quite know why that caused an error, since all of the columns in usecols were in fact present in the parsed dataset.
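The implicit index can be reproduced on a small in-memory sample. This is only a minimal sketch: the sample data and the names label and value are made up for illustration, and the result described in the comments is what pandas is expected to do when names has one entry fewer than the file has fields. Other versions may behave differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: three fields per row, no header line.
sample = "0,a,1.5\n1,b,2.5\n"

# Only two names for three fields, i.e. one name short.
names = ["label", "value"]

df = pd.read_csv(io.StringIO(sample), header=None, names=names)

print(df.columns.tolist())  # the two names that were passed
print(df.index.tolist())    # values taken from the unnamed leading field
```

With the extra leading field absorbed into the index, the names line up with the remaining fields rather than with the ones they were written for.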
So, the solution is to confirm that the number of columns in names exactly corresponds to the number of columns in your dataset.
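One way to run that check before calling read_csv is to count the fields in the first row and compare the result with len(names). The sketch below uses the standard csv module rather than pandas. count_fields and check_names are hypothetical helper names, and it assumes a comma-delimited, non-empty file whose first row is representative of the rest.

```python
import csv
import io


def count_fields(handle, delimiter=","):
    """Count the fields in the first row of an open CSV file or buffer."""
    first_row = next(csv.reader(handle, delimiter=delimiter))
    return len(first_row)


def check_names(handle, names, delimiter=","):
    """Raise a clear error if len(names) differs from the file's field count."""
    n_fields = count_fields(handle, delimiter)
    if n_fields != len(names):
        raise ValueError(
            "file has {} columns but {} names were given".format(
                n_fields, len(names)
            )
        )
    return n_fields


# Hypothetical sample standing in for data.csv: three fields per row.
sample = "0,a,1.5\n1,b,2.5\n"
columns = ["idx", "label", "value"]
check_names(io.StringIO(sample), columns)  # passes: 3 names, 3 fields
```

For a real file, the same call would look like `with open("data.csv", newline="") as f: check_names(f, columns)`, run just before pd.read_csv.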
Also, I found this discussion on GitHub.
Answered by Kirkman14
For anyone out there debugging this error, it can also be caused by a missing comma between two entries in your list of column names, e.g.:
columns = [
    'industry',
    'amount'
    'date',
    ...
]
Python will concatenate the adjacent string literals amount and date into a single amountdate, and of course the number of column names will be one lower than you expect.
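The concatenation is easy to confirm in a plain Python session, before pandas is involved at all. A small sketch using the same list, with the elided entries left out:

```python
columns = [
    'industry',
    'amount'   # missing comma: this literal is joined with the next one
    'date',
]

print(columns)       # ['industry', 'amountdate']
print(len(columns))  # 2
```

Printing len(columns) right after defining the list is a cheap way to catch this kind of slip.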

