Python: appending a calculated column to an existing dataframe

Question

I am starting to learn Pandas. I was following the question here, but I could not get the proposed solution to work for me: I get an indexing error. This is what I have:

import pandas as pd

d = {'L1': pd.Series(['X', 'X', 'Z', 'X', 'Z', 'Y', 'Z', 'Y', 'Y']),
     'L2': pd.Series([1, 2, 1, 3, 2, 1, 3, 2, 3]),
     'L3': pd.Series([50, 100, 15, 200, 10, 1, 20, 10, 100])}
df = pd.DataFrame(d)
df.groupby('L1', as_index=False).apply(
    lambda x: pd.expanding_sum(x.sort('L3', ascending=False)['L3']) / x['L3'].sum())

This outputs the following (I am using IPython):

L1   
X   3    0.571429
    1    0.857143
    0    1.000000
Y   8    0.900901
    7    0.990991
    5    1.000000
Z   6    0.444444
    2    0.777778
    4    1.000000
dtype: float64

Then I try to append the cumulative calculation under the label "new", as suggested in the post:

df["new"] = df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())

I get this:

   2196                         value = value.reindex(self.index).values
   2197                     except:
-> 2198                         raise TypeError('incompatible index of inserted column '
   2199                                         'with frame index')
   2200 
TypeError: incompatible index of inserted column with frame index

Does anybody know what the problem is? How can I reinsert the calculated values into the dataframe so that it shows them in order (descending by "new" for each label X, Y, Z)?

Answer 1

Accepted answer by joris

The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of df.

The index of df is a simple index:

In [8]: df.index
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')

while the index of the calculated column (call it new_column) is a MultiIndex, as you can already see in the output:

In [15]: new_column.index
Out[15]: 
MultiIndex
[(u'X', 3), (u'X', 1), (u'X', 0), (u'Y', 8), (u'Y', 7), (u'Y', 5), (u'Z', 6), (u'Z', 2), (u'Z', 4)]

For this reason, you cannot insert it into the frame. However, this is a bug in 0.12: it does work in 0.13 (against which the answer in the linked question was tested), where the keyword as_index=False ensures that the column L1 is not added to the index.

SOLUTION for 0.12:
Remove the first level of the MultiIndex, so you get back the original index:

In [13]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [14]: df["new"] = new_column.reset_index(level=0, drop=True)
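
To see the level drop in isolation, here is a minimal sketch that does not depend on how a particular pandas version handles groupby-apply. The three-row frame and the hand-built MultiIndex Series are assumptions standing in for df and new_column above; only the reset_index(level=0, drop=True) call comes from the answer.

```python
import pandas as pd

# Small stand-in for df (illustrative data, not the frame from the question)
df = pd.DataFrame({"L1": ["X", "X", "Z"], "L3": [50, 100, 15]})

# Hand-built stand-in for new_column: level 0 is the group label,
# level 1 is the original row label of df
idx = pd.MultiIndex.from_tuples([("X", 1), ("X", 0), ("Z", 2)])
new_column = pd.Series([100 / 150, 1.0, 1.0], index=idx)

# Drop the group level so only the original row labels remain
flat = new_column.reset_index(level=0, drop=True)

# Assignment aligns on row labels, not on position
df["new"] = flat
print(df)
```

Because the assignment aligns on labels, the order of the rows in flat does not matter: row 1 receives 100/150 and rows 0 and 2 receive 1.0.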

In pandas 0.13 (in development) this is fixed (https://github.com/pydata/pandas/pull/4670). This is why as_index=False is used in the groupby call: the column L1 (on which you group) is then not added to the index (which would create a MultiIndex), so the original index is retained and the result can be appended to the original frame. In 0.12, however, the as_index keyword seems to be ignored when using apply.
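
On recent pandas releases, pd.expanding_sum and DataFrame.sort no longer exist, so the code above will not run there. As a sketch only, and not part of the original answer, the same running share can be computed with sort_values, a grouped cumsum and transform. Each of these keeps the original row labels, so the result aligns with df on assignment and no MultiIndex appears. The names ordered, running, total and view are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

# Sort by L3 descending; the original row labels travel with the rows
ordered = df.sort_values("L3", ascending=False)

# Running sum of L3 within each L1 group, indexed by the original row labels
running = ordered.groupby("L1")["L3"].cumsum()

# Group total of L3 broadcast back to every row of df
total = df.groupby("L1")["L3"].transform("sum")

# Division and assignment both align on row labels
df["new"] = running / total

# Display grouped by label, largest L3 first within each label
view = df.sort_values(["L1", "L3"], ascending=[True, False])
print(view)
# rows 3, 1, 0 (label X) get 0.571429, 0.857143, 1.000000,
# matching the output shown in the question
```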
