pandas: KeyError after renaming a column
Original question: http://stackoverflow.com/questions/43291781/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
After renaming a column, I get a KeyError
Asked by jezrael
I have a DataFrame df:
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9],
                   'b': [1, 3, 5],
                   'c': [5, 3, 6]})
print(df)
   a  b  c
0  7  1  5
1  8  3  3
2  9  5  6
Then I rename the first column label like this:
df.columns.values[0] = 'f'
Everything seems fine:
print(df)
   f  b  c
0  7  1  5
1  8  3  3
2  9  5  6

print(df.columns)
Index(['f', 'b', 'c'], dtype='object')

print(df.columns.values)
['f' 'b' 'c']
Selecting b works fine:
print(df['b'])
0    1
1    3
2    5
Name: b, dtype: int64
But selecting a returns column f:
print(df['a'])
0    7
1    8
2    9
Name: f, dtype: int64
And selecting f raises a KeyError:
print (df['f'])
#KeyError: 'f'
print (df.info())
#KeyError: 'f'
What is the problem? Can somebody explain it, or is this a bug?
Answered by piRSquared
You aren't expected to alter the values attribute.
Try df.columns.values = ['a', 'b', 'c'] and you get:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-61-e7e440adc404> in <module>()
----> 1 df.columns.values = ['a', 'b', 'c']

AttributeError: can't set attribute
That's because pandas detects that you are trying to set the attribute and stops you.
However, it can't stop you from changing the underlying values object itself.
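The difference between rebinding the attribute and mutating the object it hands back can be sketched with a toy class. Labels below is a hypothetical stand-in, not how pandas implements its Index; it only illustrates a read-only property that returns a mutable object.

```python
class Labels:
    """Hypothetical stand-in for an index; not pandas' actual implementation."""

    def __init__(self, names):
        self._names = list(names)

    @property
    def values(self):
        # Read-only property: there is no setter, so assignment is rejected.
        return self._names


labels = Labels(['a', 'b', 'c'])

try:
    labels.values = ['x', 'y', 'z']
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

# Mutating the returned list in place never goes through the property setter,
# so the class has no chance to notice or react.
labels.values[0] = 'f'
mutated = labels.values
```

Assignment is blocked, yet the in-place write succeeds silently, which mirrors the situation in the question.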
When you use rename, pandas follows up with a bunch of cleanup. I've pasted the source below.
Ultimately, what you've done is alter the values without triggering the cleanup. You can trigger it yourself with a follow-up call to _data.rename_axis (an example can be seen in the source below). This forces the cleanup to run, after which you can access ['f']:
df._data = df._data.rename_axis(lambda x: x, 0, True)
df['f']
0    7
1    8
2    9
Name: f, dtype: int64
Moral of the story: it's probably not a great idea to rename a column this way.
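For contrast, here is a minimal sketch of two supported ways to get the same rename, using the df from the question. Which one fits better depends on whether you want a new DataFrame or want to relabel an existing one; the variable names renamed and df2 are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9],
                   'b': [1, 3, 5],
                   'c': [5, 3, 6]})

# Option 1: rename returns a new DataFrame with the mapped labels;
# the original df keeps its labels.
renamed = df.rename(columns={'a': 'f'})

# Option 2: assign a complete new list of labels to the columns attribute.
df2 = df.copy()
df2.columns = ['f', 'b', 'c']
```

With either option, selecting 'f' works and 'a' is no longer a valid label on the renamed frame.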
But this story gets weirder.
This is fine:
df = pd.DataFrame({'a': [7, 8, 9],
                   'b': [1, 3, 5],
                   'c': [5, 3, 6]})
df.columns.values[0] = 'f'
df['f']
0    7
1    8
2    9
Name: f, dtype: int64
This is not fine:
df = pd.DataFrame({'a': [7, 8, 9],
                   'b': [1, 3, 5],
                   'c': [5, 3, 6]})
print(df)
df.columns.values[0] = 'f'
df['f']
KeyError:
It turns out that we can modify the values attribute before displaying df, and pandas will apparently run all the initialization on the first display. If you display df before changing the values attribute, the lookup errors out.
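One way to picture this ordering effect is a lookup table that is built lazily on first use and then reused. The sketch below is a toy model under that assumption; LazyLabels and get_loc here are hypothetical and are not pandas' code.

```python
class LazyLabels:
    """Hypothetical index with a lazily built lookup table (not pandas' code)."""

    def __init__(self, names):
        self.values = list(names)
        self._lookup = None  # built on first use, then reused

    def get_loc(self, name):
        if self._lookup is None:
            self._lookup = {n: i for i, n in enumerate(self.values)}
        return self._lookup[name]


# Case 1: mutate before the lookup table exists, so it is built from the new labels.
early = LazyLabels(['a', 'b', 'c'])
early.values[0] = 'f'
early_pos = early.get_loc('f')

# Case 2: a lookup happens first, so the table is built and then goes stale.
late = LazyLabels(['a', 'b', 'c'])
late.get_loc('b')
late.values[0] = 'f'
stale_pos = late.get_loc('a')  # the old label still resolves
try:
    late.get_loc('f')
    new_label_found = True
except KeyError:
    new_label_found = False
```

In the second case the old label still resolves and the new one raises KeyError, which matches what the question observed for 'a' and 'f'.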
Weirder still:
df = pd.DataFrame({'a': [7, 8, 9],
                   'b': [1, 3, 5],
                   'c': [5, 3, 6]})
print(df)
df.columns.values[0] = 'f'
df['f'] = 1
df['f']
   f  f
0  7  1
1  8  1
2  9  1
As if we didn't already know that this was a bad idea...
Source for rename:
def rename(self, *args, **kwargs):

    axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
    copy = kwargs.pop('copy', True)
    inplace = kwargs.pop('inplace', False)

    if kwargs:
        raise TypeError('rename() got an unexpected keyword '
                        'argument "{0}"'.format(list(kwargs.keys())[0]))

    if com._count_not_none(*axes.values()) == 0:
        raise TypeError('must pass an index to rename')

    # renamer function if passed a dict
    def _get_rename_function(mapper):
        if isinstance(mapper, (dict, ABCSeries)):

            def f(x):
                if x in mapper:
                    return mapper[x]
                else:
                    return x
        else:
            f = mapper

        return f

    self._consolidate_inplace()
    result = self if inplace else self.copy(deep=copy)

    # start in the axis order to eliminate too many copies
    for axis in lrange(self._AXIS_LEN):
        v = axes.get(self._AXIS_NAMES[axis])
        if v is None:
            continue
        f = _get_rename_function(v)

        baxis = self._get_block_manager_axis(axis)
        result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
        result._clear_item_cache()

    if inplace:
        self._update_inplace(result._data)
    else:
        return result.__finalize__(self)