Original question: http://stackoverflow.com/questions/55239010/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
pandas.read_excel error when using usecols
Asked by Giacomo Sachs
I am having a problem reading data from an Excel file. The file contains column names with Unicode characters.
For automation reasons, I need to pass the usecols argument to the pandas.read_excel function.
When I don't use the usecols argument, the data loads without errors.
Here's the code:
import pandas as pd
df = pd.read_excel(file)
df.columns
Index([u'col1', u'col2', u'col3', u'col with unicode à', u'col4'], dtype='object')
If I use usecols:
COLUMNS = ['col1', 'col2', 'col with unicode à']
df = pd.read_excel(file, usecols = COLUMNS)
I receive the following error:
ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']
Passing encoding = 'utf-8' as an argument to read_excel does not solve the problem, and neither does encoding the COLUMNS elements.
EDIT: Here is the complete traceback.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-22-541ccb88da6a> in <module>()
2 df = pd.read_excel(file)
3 cols = df.columns
----> 4 df = pd.read_excel(file, usecols = ['col1', 'col2', 'col with unicode à'])
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
186 else:
187 kwargs[new_arg_name] = new_arg_value
--> 188 return func(*args, **kwargs)
189 return wrapper
190 return _deprecate_kwarg
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
186 else:
187 kwargs[new_arg_name] = new_arg_value
--> 188 return func(*args, **kwargs)
189 return wrapper
190 return _deprecate_kwarg
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
373 convert_float=convert_float,
374 mangle_dupe_cols=mangle_dupe_cols,
--> 375 **kwds)
376
377
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
716 convert_float=convert_float,
717 mangle_dupe_cols=mangle_dupe_cols,
--> 718 **kwds)
719
720 @property
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
599 usecols=usecols,
600 mangle_dupe_cols=mangle_dupe_cols,
--> 601 **kwds)
602
603 output[asheetname] = parser.read(nrows=nrows)
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in TextParser(*args, **kwds)
2154 """
2155 kwds['engine'] = 'python'
-> 2156 return TextFileReader(*args, **kwds)
2157
2158
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
893 self.options['has_index_names'] = kwds['has_index_names']
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
1130 ' "c", "python", or' ' "python-fwf")'.format(
1131 engine=engine))
-> 1132 self._engine = klass(self.f, **self.options)
1133
1134 def _failover_to_python(self):
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, **kwds)
2236 self._col_indices = None
2237 (self.columns, self.num_original_columns,
-> 2238 self.unnamed_cols) = self._infer_columns()
2239
2240 # Now self.columns has the set of columns that we will process.
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _infer_columns(self)
2609 columns = [names]
2610 else:
-> 2611 columns = self._handle_usecols(columns, columns[0])
2612 else:
2613 try:
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _handle_usecols(self, columns, usecols_key)
2669 col_indices.append(usecols_key.index(col))
2670 except ValueError:
-> 2671 _validate_usecols_names(self.usecols, usecols_key)
2672 else:
2673 col_indices.append(col)
C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _validate_usecols_names(usecols, names)
1235 raise ValueError(
1236 "Usecols do not match columns, "
-> 1237 "columns expected but not found: {missing}".format(missing=missing)
1238 )
1239
ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']
Answered by user69659
First, read the columns by their Excel letters:
df = pd.read_excel(file, usecols="A:D")
where A:D is the range of Excel columns you want to read. Then rename your columns like this:
df.columns = ['col1', 'col2', 'col3', 'col4']
Then access the columns by those names.
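Applied to the question's case, the positional approach could be sketched as follows. The header list is copied from the df.columns output shown in the question; the helper name read_by_position and its parameters are hypothetical, and pandas is imported only inside the helper so the position lookup runs on its own.

```python
# Header as shown in the question's output of df.columns.
header = ["col1", "col2", "col3", "col with unicode à", "col4"]
wanted = ["col1", "col2", "col with unicode à"]

# Zero-based positions of the wanted columns within the header.
positions = [header.index(name) for name in wanted]
print(positions)


def read_by_position(path, positions, names):
    """Read columns by zero-based position, then assign names (hypothetical helper)."""
    import pandas as pd  # lazy import: only needed when a file is actually read

    if len(positions) != len(names):
        raise ValueError("positions and names must have the same length")

    # pandas returns the selected columns in file order, not in the order
    # given to usecols, so sort the (position, name) pairs to keep them aligned.
    pairs = sorted(zip(positions, names))
    df = pd.read_excel(path, usecols=[pos for pos, _ in pairs])
    df.columns = [name for _, name in pairs]
    return df
```

Note that usecols="A:D" from the answer reads four adjacent columns, whereas the positions computed here select only the three columns the question asks for.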
Answered by Pablo Vilas
These methods are convenient for selecting Excel columns:
First case, using numbers: column "A" = 0, column "B" = 1, etc.
df = pd.read_excel("filename.xlsx",usecols= range(0,5))
Second case, using letters:
df = pd.read_excel("filename.xlsx",usecols= "A, C, E:J")
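To make the relationship between the two forms concrete, here is a small sketch that expands a letter specification into zero-based indices. The helpers column_letter_to_index and expand_usecols_spec are hypothetical illustrations, not pandas functions; they only mirror the mapping described above (letter ranges include both ends).

```python
def column_letter_to_index(letters):
    """Convert an Excel column label such as 'A' or 'AB' to a zero-based index."""
    index = 0
    for ch in letters.strip().upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def expand_usecols_spec(spec):
    """Expand a spec like 'A, C, E:J' into a sorted list of zero-based indices."""
    indices = set()
    for part in spec.split(","):
        if ":" in part:
            start, end = part.split(":")
            first = column_letter_to_index(start)
            last = column_letter_to_index(end)
            indices.update(range(first, last + 1))  # both ends are included
        else:
            indices.add(column_letter_to_index(part))
    return sorted(indices)


print(expand_usecols_spec("A:E"))        # same columns as range(0, 5)
print(expand_usecols_spec("A, C, E:J"))
```

Under this mapping, range(0, 5) and "A:E" select the same five columns.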
Answered by ashok
If you want to read your Excel file by specific column names, follow this sample code using "usecols":
df = pd.read_excel("filename.xlsx", usecols=["col_name1", "col_name2", "col_name3"])
print(df)
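Selecting by name is what the question attempted. The \xc3\xa0 in its error message is the UTF-8 byte encoding of à, and the traceback paths point to an Anaconda2 (Python 2) install. One plausible reading, which the thread does not confirm, is that the names in COLUMNS were byte strings while the header read from the file was Unicode text. A minimal sketch of normalizing names to text before passing them to usecols follows; to_text and read_named_columns are hypothetical helpers.

```python
def to_text(name, encoding="utf-8"):
    """Return name as text, decoding byte strings with the given encoding."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


# The name from the error message as UTF-8 bytes, next to plain text names.
raw_names = ["col1", "col2", b"col with unicode \xc3\xa0"]
COLUMNS = [to_text(name) for name in raw_names]
print(COLUMNS)


def read_named_columns(path, names):
    """Read only the named columns, normalizing the names to text first."""
    import pandas as pd  # lazy import: only needed when a file is actually read

    return pd.read_excel(path, usecols=[to_text(name) for name in names])
```

In Python 2 the equivalent would be writing the literal as u'col with unicode à'; in Python 3 string literals are already text, so the normalization only matters when names arrive as bytes.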