pandas: applying a function on a MultiIndex

Question

Asked by LostBoardOnTaurangaBeach

I would like to apply a function to a MultiIndex dataframe (basically the result of groupby followed by describe) without using a for loop to traverse the level 0 column index.

The function I'd like to apply:

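import math

def CI(x):
    # x is one row of describe() output, with "std" and "count" entries.
    sigma = x["std"]
    n = x["count"]
    # Half-width of the interval: 1.96 * std / sqrt(count).
    return 1.96 * sigma / math.sqrt(n)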
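To make the formula concrete, here is a minimal sketch that calls the function on a single row. The row values (std of 2.0, count of 16) are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the question's data.

```python
import math

import pandas as pd


def CI(x):
    # Same formula as in the question: 1.96 * std / sqrt(count).
    return 1.96 * x["std"] / math.sqrt(x["count"])


# Hypothetical row, shaped like one row of describe() output.
row = pd.Series({"std": 2.0, "count": 16.0})

half_width = CI(row)
print(half_width)  # computed as 1.96 * 2.0 / 4.0
```

Because the function only reads "std" and "count", it can be applied to any row that carries those two labels.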

A sample of my dataframe:

df = df.iloc[47:52, [3,4,-1]]

               a          b                    id
47          0.218182   0.000000  0d1974107c6731989c762e96def73568
48          0.000000   0.000000  0d1974107c6731989c762e96def73568
49          0.218182   0.130909  0d1974107c6731989c762e96def73568
50          0.000000   0.000000  0fd4f3b4adf43682f08e693a905b7432
51          0.000000   0.000000  0fd4f3b4adf43682f08e693a905b7432

I then replace zeros with NaN:

import numpy as np

df = df.replace(float(0), np.nan)

I group by id and call describe, which gives me MultiIndex columns:

df_group = df.groupby("id").describe()
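For readers who want to see the shape of that result, the following sketch rebuilds the five sample rows shown above (with zeros already replaced by NaN) and inspects the column levels. The frame is reconstructed by hand from the printed sample, so it is only an approximation of the asker's full data.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; zeros are already replaced by NaN.
df = pd.DataFrame(
    {
        "a": [0.218182, np.nan, 0.218182, np.nan, np.nan],
        "b": [np.nan, np.nan, 0.130909, np.nan, np.nan],
        "id": ["0d1974107c6731989c762e96def73568"] * 3
        + ["0fd4f3b4adf43682f08e693a905b7432"] * 2,
    },
    index=[47, 48, 49, 50, 51],
)

df_group = df.groupby("id").describe()

# Level 0 holds the original column names, level 1 the describe() statistics.
level0 = df_group.columns.get_level_values(0).unique().tolist()
level1 = df_group.columns.get_level_values(1).unique().tolist()
print(level0)
print(level1)

# Selecting a level-0 label returns an ordinary frame of statistics.
print(df_group["a"][["count", "std"]])
```

Each top-level column therefore carries its own "count" and "std" sub-columns, which is what the function needs.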

My current solution, which I don't like and think could be improved:

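l_df = []
for column in df_group.columns.levels[0]:
    # Apply CI row-wise to the describe() statistics of one top-level column.
    ci_col = pd.DataFrame({"CI": df_group[column].apply(CI, axis=1)})
    l_df.append(ci_col)
# New names are used so the CI function and the original df are not overwritten.
df_ci = pd.concat(l_df, axis=1)
df_ci.columns = df_group.columns.levels[0]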

So I get something like:

                                    a       b
id
06f32e6e45da385834dac983256d59f3    nan     nan
0d1974107c6731989c762e96def73568    0.005   0.225
0fd4f3b4adf43682f08e693a905b7432    0.008   nan
11e0057cdc8b8e1b1cdabfa8a092ea5f    0.018   0.582
120549af6977623bd01d77135a91a523    0.008   0.204

So again: if I have top-level columns from a to z, and each contains a std and a count column, how can I apply my function to all of these columns at once?

Answer 1

Answered by Zero

Using groupby on level with axis=1 lets you iterate and apply over the first-level columns.

In [104]: (df.groupby("id").describe()
             .groupby(level=0, axis=1)
             .apply(lambda x: x[x.name].apply(CI, axis=1)))
Out[104]:
                                    a   b
id
0d1974107c6731989c762e96def73568  0.0 NaN
0fd4f3b4adf43682f08e693a905b7432  NaN NaN

In fact, you don't need CI if you inline the calculation:

In [105]: (df.groupby("id").describe()
             .groupby(level=0, axis=1)
             .apply(lambda x: x[x.name].apply(
                 lambda row: 1.96 * row['std'] / np.sqrt(row['count']), axis=1)))
Out[105]:
                                    a   b
id
0d1974107c6731989c762e96def73568  0.0 NaN
0fd4f3b4adf43682f08e693a905b7432  NaN NaN

Sample df:

In [106]: df
Out[106]:
           a         b                                id
47  0.218182       NaN  0d1974107c6731989c762e96def73568
48       NaN       NaN  0d1974107c6731989c762e96def73568
49  0.218182  0.130909  0d1974107c6731989c762e96def73568
50       NaN       NaN  0fd4f3b4adf43682f08e693a905b7432
51       NaN       NaN  0fd4f3b4adf43682f08e693a905b7432
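As an alternative to the column-wise groupby above, the same numbers can be computed without any row-wise apply by taking cross-sections of the second column level. This is a sketch rather than part of the original answer: it assumes the describe() layout shown earlier and rebuilds the sample df by hand. It may also be useful on recent pandas releases, which (as far as I know, from 2.1 onward) deprecate groupby(..., axis=1).

```python
import numpy as np
import pandas as pd

# Sample df from the answer, rebuilt by hand.
df = pd.DataFrame(
    {
        "a": [0.218182, np.nan, 0.218182, np.nan, np.nan],
        "b": [np.nan, np.nan, 0.130909, np.nan, np.nan],
        "id": ["0d1974107c6731989c762e96def73568"] * 3
        + ["0fd4f3b4adf43682f08e693a905b7432"] * 2,
    },
    index=[47, 48, 49, 50, 51],
)

desc = df.groupby("id").describe()

# xs on the second column level returns one plain frame per statistic,
# with the top-level names (a, b) as columns.
std = desc.xs("std", axis=1, level=1)
count = desc.xs("count", axis=1, level=1)

# Element-wise arithmetic handles every top-level column at once.
ci = 1.96 * std / np.sqrt(count)
print(ci)
```

Groups with a count of 0 or 1 come out as NaN here, just as in the outputs above, because the standard deviation is undefined for them.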
