Python: attach a calculated column to an existing dataframe

Attribution: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original Stack Overflow URL: http://stackoverflow.com/questions/20737811/

Time: 2020-08-18 21:07:49  Source: igfitidea

Attach a calculated column to an existing dataframe

python pandas

Question

I am starting to learn pandas. I was following the linked question, but I could not get the proposed solution to work for me: I get an indexing error. This is what I have:

from pandas import *
import pandas as pd
d = {'L1' : Series(['X','X','Z','X','Z','Y','Z','Y','Y',]),
     'L2' : Series([1,2,1,3,2,1,3,2,3]),
     'L3' : Series([50,100,15,200,10,1,20,10,100])}
df = DataFrame(d)  
df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())

This outputs the following (I am using IPython):

L1   
X   3    0.571429
    1    0.857143
    0    1.000000
Y   8    0.900901
    7    0.990991
    5    1.000000
Z   6    0.444444
    2    0.777778
    4    1.000000
dtype: float64

Then I try to append the cumulative calculation under the label "new", as suggested in the post:

df["new"] = df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())

I get this:

   2196                         value = value.reindex(self.index).values
   2197                     except:
-> 2198                         raise TypeError('incompatible index of inserted column '
   2199                                         'with frame index')
   2200 
TypeError: incompatible index of inserted column with frame index

Does anybody know what the problem is? How can I reinsert the calculated values into the dataframe so that it shows them in order (descending by "new" for each label X, Y, Z)?

Accepted answer by joris

The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of df.

The index of df is a simple index:

In [8]: df.index
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')

The index of the calculated column, on the other hand, is a MultiIndex (as you can already see in the output). Supposing we call it new_column:

In [15]: new_column.index
Out[15]: 
MultiIndex
[(u'X', 3), (u'X', 1), (u'X', 0), (u'Y', 8), (u'Y', 7), (u'Y', 5), (u'Z', 6), (u'Z', 2), (u'Z', 4)]
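
The mismatch can be reproduced on a current pandas release with a small sketch. This is not the code from the question: pd.expanding_sum and DataFrame.sort have since been removed from pandas, so the sketch assumes sort_values and cumsum as replacements, groups only the L3 column, and passes group_keys=True explicitly. The name cumulative_share is a hypothetical helper introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

def cumulative_share(s):
    # Hypothetical helper: running total of the group's values, largest
    # first, divided by the group total.
    ordered = s.sort_values(ascending=False)
    return ordered.cumsum() / s.sum()

# group_keys=True (an assumption of this sketch) puts the group label
# in front of the original row label, giving a two-level index.
new_column = df.groupby("L1", group_keys=True)["L3"].apply(cumulative_share)

print(df.index.nlevels)          # number of levels in the frame's index
print(new_column.index.nlevels)  # number of levels in the result's index
print(new_column.index.tolist())
```

The frame has a single-level index while the grouped result carries two levels, which is the incompatibility the error message refers to.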

For this reason, you cannot insert it into the frame. However, this is a bug in pandas 0.12: the same code does work in 0.13 (against which the answer in the linked question was tested), where the keyword as_index=False should ensure that the column L1 is not added to the index.

SOLUTION for 0.12:
Remove the first level of the MultiIndex, so you get back the original index:

In [13]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [14]: df["new"] = new_column.reset_index(level=0, drop=True)
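
The same idea can be sketched for a current pandas release. As an assumption of this sketch, sort_values and cumsum stand in for the removed DataFrame.sort and pd.expanding_sum, only the L3 column is grouped, and cumulative_share is a hypothetical helper name. The reset_index(level=0, drop=True) call is the one from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

def cumulative_share(s):
    # Hypothetical helper: cumulative sum in descending order of value,
    # expressed as a fraction of the group total.
    ordered = s.sort_values(ascending=False)
    return ordered.cumsum() / s.sum()

new_column = df.groupby("L1", group_keys=True)["L3"].apply(cumulative_share)

# Drop the group-label level so only the original row labels remain.
flat = new_column.reset_index(level=0, drop=True)

# Column assignment aligns on the index, so the row order of `flat`
# does not need to match the row order of `df`.
df["new"] = flat

# Show the rows grouped by label and ordered by the new column.
print(df.sort_values(["L1", "new"]))
```

Because the assignment aligns on index labels, each value lands on the row it was computed from even though the grouped result is ordered differently from the frame.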


In pandas 0.13 (in development) this is fixed (https://github.com/pydata/pandas/pull/4670). This is the reason as_index=False is used in the groupby call: the column L1 (by which you group) is then not added to the index (which would create a MultiIndex), so the original index is retained and the result can be appended to the original frame. In 0.12, however, the as_index keyword seems to be ignored when using apply.
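
On a current pandas release, one way to keep the original index directly is sketched below. Note the substitution: it uses the group_keys=False argument of groupby rather than the as_index=False keyword discussed in this answer, since how as_index interacts with apply may differ between versions. It also assumes sort_values and cumsum in place of the removed DataFrame.sort and pd.expanding_sum, and cumulative_share is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

def cumulative_share(s):
    # Hypothetical helper: cumulative sum in descending order of value,
    # expressed as a fraction of the group total.
    ordered = s.sort_values(ascending=False)
    return ordered.cumsum() / s.sum()

# group_keys=False (an assumption of this sketch, not the as_index=False
# from the answer) keeps the group label out of the result's index.
result = df.groupby("L1", group_keys=False)["L3"].apply(cumulative_share)

# The result is indexed by the original row labels, so it can be
# assigned to the frame without any further index manipulation.
df["new"] = result
print(df)
```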
