pandas apply function on multiindex
Original question: http://stackoverflow.com/questions/46097992/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by LostBoardOnTaurangaBeach
I would like to apply a function to a MultiIndex DataFrame (essentially the output of groupby followed by describe) without using a for loop to traverse the level 0 index.
The function I'd like to apply:
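import math

def CI(x):
    # x is one row of describe() output, holding "std" and "count"
    sigma = x["std"]
    n = x["count"]
    # Half-width of a 95% confidence interval under a normal approximation
    return 1.96 * sigma / math.sqrt(n)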
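To make the expected input concrete, here is a minimal sketch that calls CI on a single row. The Series `row` and its values are hypothetical, chosen only to look like one row of describe() output; they are not taken from the question's data.

```python
import math
import pandas as pd

def CI(x):
    sigma = x["std"]
    n = x["count"]
    return 1.96 * sigma / math.sqrt(n)

# Hypothetical summary row, shaped like one row of describe() output
row = pd.Series({"count": 4.0, "mean": 0.5, "std": 0.2})

# Computes 1.96 * 0.2 / sqrt(4)
half_width = CI(row)
print(half_width)
```

The function only reads the "std" and "count" entries, so any row-like object with those two labels should work.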
A sample of my DataFrame:
df = df.iloc[47:52, [3,4,-1]]
           a         b                                id
47  0.218182  0.000000  0d1974107c6731989c762e96def73568
48  0.000000  0.000000  0d1974107c6731989c762e96def73568
49  0.218182  0.130909  0d1974107c6731989c762e96def73568
50  0.000000  0.000000  0fd4f3b4adf43682f08e693a905b7432
51  0.000000  0.000000  0fd4f3b4adf43682f08e693a905b7432
Then I replace zeros with NaN:
df = df.replace(float(0), np.nan)
I group by id and call describe, which gives me MultiIndex columns:
df_group = df.groupby("id").describe()
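For readers who want to see the shape of that result, the following sketch rebuilds the five sample rows and inspects the column levels. The short ids "id1" and "id2" are stand-ins for the long hashes in the question, and the names `top_level` and `stats` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question; "id1"/"id2" are stand-ins for the hashes
df = pd.DataFrame({
    "a": [0.218182, 0.0, 0.218182, 0.0, 0.0],
    "b": [0.0, 0.0, 0.130909, 0.0, 0.0],
    "id": ["id1", "id1", "id1", "id2", "id2"],
})
df = df.replace(float(0), np.nan)
df_group = df.groupby("id").describe()

# Level 0 holds the original column names, level 1 the summary statistics
top_level = list(df_group.columns.get_level_values(0).unique())
stats = list(df_group["a"].columns)
print(top_level)
print(stats)

# Selecting a top-level column returns a plain DataFrame of its statistics
print(df_group["a"][["count", "std"]])
```

Each top-level column carries its own "count" and "std" sub-columns, which is why the function has to be applied once per top-level column.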
My current solution, which I don't like and think could be improved:
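l_df = []
for column in df_group.columns.levels[0]:
    # CI for one top-level column, computed row by row
    ci_col = pd.DataFrame({"CI": df_group[column].apply(CI, axis=1)})
    l_df.append(ci_col)
# Use new names so that the CI function and df are not overwritten
ci_df = pd.concat(l_df, axis=1)
ci_df.columns = df_group.columns.levels[0]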
So I get something like:
                                      a      b
id
06f32e6e45da385834dac983256d59f3    nan    nan
0d1974107c6731989c762e96def73568  0.005  0.225
0fd4f3b4adf43682f08e693a905b7432  0.008    nan
11e0057cdc8b8e1b1cdabfa8a092ea5f  0.018  0.582
120549af6977623bd01d77135a91a523  0.008  0.204
So again, if I have top-level columns from a to z, and each contains a std and a count column, how can I apply my function to all of them at once?
Answered by Zero
Using groupby on level=0 with axis=1 lets you iterate over the first-level columns and apply a function to each group.
In [104]: (df.groupby("id").describe()
             .groupby(level=0, axis=1)
             .apply(lambda x: x[x.name].apply(CI, axis=1)))
Out[104]:
                                    a    b
id
0d1974107c6731989c762e96def73568  0.0  NaN
0fd4f3b4adf43682f08e693a905b7432  NaN  NaN
In fact, you don't need CI at all if you inline the computation:
In [105]: (df.groupby("id").describe()
             .groupby(level=0, axis=1)
             .apply(lambda x: x[x.name]
                    .apply(lambda x: 1.96*x['std']/np.sqrt(x['count']), axis=1)))
Out[105]:
                                    a    b
id
0d1974107c6731989c762e96def73568  0.0  NaN
0fd4f3b4adf43682f08e693a905b7432  NaN  NaN
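As a possible alternative (not part of the original answer), the same numbers can be computed without any apply by taking cross-sections of the second column level with DataFrame.xs. This may be worth considering because, as far as I know, the axis argument of groupby is deprecated in recent pandas versions. The sketch below rebuilds the sample data; the ids "id1" and "id2" are stand-ins for the long hashes, and `std`, `count` and `ci` are names introduced here.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question; "id1"/"id2" are stand-ins for the hashes
df = pd.DataFrame({
    "a": [0.218182, 0.0, 0.218182, 0.0, 0.0],
    "b": [0.0, 0.0, 0.130909, 0.0, 0.0],
    "id": ["id1", "id1", "id1", "id2", "id2"],
})
df = df.replace(float(0), np.nan)
df_group = df.groupby("id").describe()

# Pull the "std" and "count" sub-columns of every top-level column at once;
# each result is a DataFrame indexed by id with columns a, b
std = df_group.xs("std", axis=1, level=1)
count = df_group.xs("count", axis=1, level=1)

# Element-wise arithmetic aligns on both the id index and the column names
ci = 1.96 * std / np.sqrt(count)
print(ci)
```

Because the arithmetic is element-wise, this approach handles any number of top-level columns in one expression, and groups with no valid observations simply come out as NaN.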
Sample df:
In [106]: df
Out[106]:
           a         b                                id
47  0.218182       NaN  0d1974107c6731989c762e96def73568
48       NaN       NaN  0d1974107c6731989c762e96def73568
49  0.218182  0.130909  0d1974107c6731989c762e96def73568
50       NaN       NaN  0fd4f3b4adf43682f08e693a905b7432
51       NaN       NaN  0fd4f3b4adf43682f08e693a905b7432