pandas: KeyError after renaming a column

Question

Asked by jezrael

I have a DataFrame df:

import pandas as pd

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print (df)
   a  b  c
0  7  1  5
1  8  3  3
2  9  5  6

Then I rename the first column label like this:

df.columns.values[0] = 'f'

Everything seems fine:

print (df)
   f  b  c
0  7  1  5
1  8  3  3
2  9  5  6

print (df.columns)
Index(['f', 'b', 'c'], dtype='object')

print (df.columns.values)
['f' 'b' 'c']

Selecting b works fine:

print (df['b'])
0    1
1    3
2    5
Name: b, dtype: int64

But selecting a returns the column named f:

print (df['a'])
0    7
1    8
2    9
Name: f, dtype: int64

And selecting f raises a KeyError:

print (df['f'])
#KeyError: 'f'

print (df.info())
#KeyError: 'f'

What is the problem? Can somebody explain it? Or is it a bug?

Answer 1

Answered by piRSquared

You aren't expected to alter the values attribute.

Try df.columns.values = ['a', 'b', 'c'] and you get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-61-e7e440adc404> in <module>()
----> 1 df.columns.values = ['a', 'b', 'c']

AttributeError: can't set attribute

That's because pandas detects that you are trying to set the attribute and stops you.

However, it can't stop you from changing the underlying values object itself.
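
The difference between rebinding an attribute and mutating the object it returns can be sketched in plain Python. The Labels class below is a hypothetical toy, not pandas code; it only assumes that values is exposed as a property without a setter.

```python
class Labels:
    """Toy stand-in for an index-like object; this is not pandas code."""

    def __init__(self, items):
        self._items = list(items)

    @property
    def values(self):
        # No setter is defined, so `labels.values = ...` raises AttributeError.
        return self._items


labels = Labels(['a', 'b', 'c'])

try:
    labels.values = ['x', 'y', 'z']   # rebinding the attribute is blocked
except AttributeError as exc:
    print('blocked:', type(exc).__name__)

labels.values[0] = 'f'                # mutating the returned list is not blocked
print(labels.values)                  # ['f', 'b', 'c']
```

The property protects the name values, but it hands out a reference to a mutable object, so the owner never learns about in-place changes.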

When you use rename, pandas follows up with a bunch of cleanup work. I've pasted the source below.

Ultimately, what you've done is alter the values without initiating the cleanup. You can initiate it yourself with a follow-up call to _data.rename_axis (an example can be seen in the source below). This forces the cleanup to run, after which you can access ['f']:

df._data = df._data.rename_axis(lambda x: x, 0, True)
df['f']

0    7
1    8
2    9
Name: f, dtype: int64

Moral of the story: it is probably not a great idea to rename a column this way.
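
For contrast, here is a minimal sketch of two documented ways to rename the column, using the same df as in the question. The variable name renamed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9],
                   'b': [1, 3, 5],
                   'c': [5, 3, 6]})

# Option 1: rename returns a new DataFrame with the mapped labels.
renamed = df.rename(columns={'a': 'f'})
print(renamed['f'].tolist())          # [7, 8, 9]

# Option 2: assign a complete new list of labels to df.columns.
df.columns = ['f', 'b', 'c']
print(df['f'].tolist())               # [7, 8, 9]
```

Both routes go through pandas itself rather than through the raw array, so lookups by the new label work afterwards.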

But this story gets weirder

This is fine:

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

df.columns.values[0] = 'f'

df['f']

0    7
1    8
2    9
Name: f, dtype: int64

This is not fine:

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print(df)

df.columns.values[0] = 'f'

df['f']

KeyError:

It turns out that we can modify the values attribute before displaying df, and the first display will apparently run all the initialization. If you display df before changing the values attribute, the lookup errors out.
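
One plausible reading of this order dependence is a lookup table that is built lazily on first use and then cached. The toy class below is a hypothetical analogy under that assumption, not the actual pandas implementation; LazyLabels and get_loc here are illustrative names only.

```python
class LazyLabels:
    """Toy index: builds its label -> position table on first lookup only."""

    def __init__(self, items):
        self.values = list(items)
        self._table = None            # built lazily, then reused

    def get_loc(self, key):
        if self._table is None:
            self._table = {label: pos for pos, label in enumerate(self.values)}
        return self._table[key]       # raises KeyError for unknown labels


# Mutate before the first lookup: the table is built from the new labels.
early = LazyLabels(['a', 'b', 'c'])
early.values[0] = 'f'
print(early.get_loc('f'))             # 0

# Mutate after the first lookup: the cached table still only knows 'a'.
late = LazyLabels(['a', 'b', 'c'])
late.get_loc('b')                     # first lookup builds the table
late.values[0] = 'f'
print(late.get_loc('a'))              # 0, even though values[0] is now 'f'
try:
    late.get_loc('f')
except KeyError as exc:
    print('KeyError:', exc)           # KeyError: 'f'
```

In this analogy, the cleanup that rename performs corresponds to discarding the stale table so that it is rebuilt from the current labels.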

Weirder still

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print(df)

df.columns.values[0] = 'f'

df['f'] = 1

df['f']

   f  f
0  7  1
1  8  1
2  9  1

As if we didn't already know that this was a bad idea...
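
A frame that ends up with two columns named f can at least be detected. The sketch below builds such a frame directly (the name dup is illustrative) rather than through the values trick, and checks the column labels for duplicates.

```python
import pandas as pd

# Construct a frame that already has two columns labelled 'f'.
dup = pd.DataFrame([[7, 1], [8, 1], [9, 1]], columns=['f', 'f'])

print(dup.columns.is_unique)              # False
print(dup.columns.duplicated().tolist())  # [False, True]

# Selecting a duplicated label returns a DataFrame, not a Series.
print(type(dup['f']).__name__)            # DataFrame
```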

Source for rename

def rename(self, *args, **kwargs):

    axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
    copy = kwargs.pop('copy', True)
    inplace = kwargs.pop('inplace', False)

    if kwargs:
        raise TypeError('rename() got an unexpected keyword '
                        'argument "{0}"'.format(list(kwargs.keys())[0]))

    if com._count_not_none(*axes.values()) == 0:
        raise TypeError('must pass an index to rename')

    # renamer function if passed a dict
    def _get_rename_function(mapper):
        if isinstance(mapper, (dict, ABCSeries)):

            def f(x):
                if x in mapper:
                    return mapper[x]
                else:
                    return x
        else:
            f = mapper

        return f

    self._consolidate_inplace()
    result = self if inplace else self.copy(deep=copy)

    # start in the axis order to eliminate too many copies
    for axis in lrange(self._AXIS_LEN):
        v = axes.get(self._AXIS_NAMES[axis])
        if v is None:
            continue
        f = _get_rename_function(v)

        baxis = self._get_block_manager_axis(axis)
        result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
        result._clear_item_cache()

    if inplace:
        self._update_inplace(result._data)
    else:
        return result.__finalize__(self)
