Python Pandas, slice rows from group in .groupby().apply()
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/36070288/
Asked by jfive
I have the following code, which calls groupby and apply on a pandas DataFrame.
The bizarre thing is that I am unable to slice the grouped data by row (like df.loc[2:5]) without completely breaking the output (as shown in the debug output below). How can I drop rows and still get the desired output?
Any help would be massively appreciated. I'm running this on a bigger example with more complicated functions, but I have narrowed the problem down to the row slicing.
Code:
import pandas as pd

df = pd.DataFrame({'one': ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two': [1, 2, 3, 4]})

def net_func(df):
    df_res = daily_func(df, True)
    df_res_valid = daily_func(df, False)
    df_merge = pd.merge(df_res, df_res_valid)
    return df_merge

def daily_func(df, bool_param):
    # The row-slicing attempts, tried one at a time:
    # df.drop(df.head(1).index, inplace=True)
    # df = df[1:1]
    # df.iloc[1:1,:]
    # df.loc[1:1,:]
    if bool_param:
        df['daily'+str(bool_param)] = 1
    else:
        df['daily'+str(bool_param)] = 0
    return df

print(df.groupby('one').apply(net_func))
Current output:
       one  two  dailyTrue  dailyFalse
one
AAL  0  AAL    1          1           0
     1  AAL    2          1           0
AAPL 0 AAPL    1          1           0
     1 AAPL    2          1           0
Desired output:
       one  two  dailyTrue  dailyFalse
one
AAL  1  AAL    2          1           0
AAPL 1 AAPL    2          1           0
Ideally, I would like to be able to slice by row within each group, for example df.loc[3:5]. This would be perfect!
I've tried each of the commented-out lines in turn, with the following results:
Output with df.drop(df.head(1).index, inplace=True):
Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []
Update: I also tried df = df[1:1], which gives:
Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []
Update: I have also tried df.iloc[1:1,:]:
       one  two  dailyTrue  dailyFalse
one
AAL  0  AAL    1          1           0
     1  AAL    2          1           0
AAPL 0 AAPL    1          1           0
     1 AAPL    2          1           0
and df.loc[1:1,:]:
       one  two  dailyTrue  dailyFalse
one
AAL  0  AAL    1          1           0
     1  AAL    2          1           0
AAPL 0 AAPL    1          1           0
     1 AAPL    2          1           0
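A possible reading of the results above, sketched on a single two-row frame rather than inside apply: the slice 1:1 has equal start and stop, so as a positional slice it selects nothing; a bare df.iloc[...] or df.loc[...] expression builds a new object that is never assigned, so the frame is unchanged; and because net_func calls daily_func twice on the same frame, an in-place drop of the first row runs twice. The frame `group` is a made-up stand-in for one group, not the asker's actual data.

```python
import pandas as pd

# Hypothetical stand-in for a single two-row group.
group = pd.DataFrame({'one': ['AAL', 'AAL'], 'two': [1, 2]})

empty = group[1:1]          # positional slice with start == stop: no rows
last_row = group[1:2]       # positional slice, stop exclusive: second row only
by_label = group.loc[1:1]   # label-based slice, both ends inclusive: row labelled 1

group.iloc[1:1, :]          # result is discarded, so `group` is unchanged

# Dropping the first row in place twice, as happens when a function that
# drops one row is called twice on the same two-row frame.
twice = group.copy()
twice.drop(twice.head(1).index, inplace=True)
twice.drop(twice.head(1).index, inplace=True)

print(len(empty), len(last_row), len(by_label), len(group), len(twice))
```

To keep only the rows from the second onward, the result of the slice has to be assigned or returned, for example `df = df.iloc[1:]`.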
Accepted answer by Parfait
Consider taking a cross-section with xs after the groupby().apply(), specifying the key for the inner index level:
print(df.groupby('one').apply(net_func).xs(0, level=1))
#       one  two  dailyTrue  dailyFalse
# one
# AAL   AAL    1          1           0
# AAPL AAPL    1          1           0

print(df.groupby('one').apply(net_func).xs(1, level=1))
#       one  two  dailyTrue  dailyFalse
# one
# AAL   AAL    2          1           0
# AAPL AAPL    2          1           0
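To try xs in isolation, here is a minimal sketch on a hand-built two-level frame. The frame `result` only imitates the shape of the groupby().apply() output; its values are assumptions for illustration.

```python
import pandas as pd

# Hand-built frame with the same two-level index shape as the apply() result.
index = pd.MultiIndex.from_tuples(
    [('AAL', 0), ('AAL', 1), ('AAPL', 0), ('AAPL', 1)],
    names=['one', None],
)
result = pd.DataFrame(
    {'two': [1, 2, 1, 2], 'dailyTrue': 1, 'dailyFalse': 0},
    index=index,
)

# Select the rows whose inner level equals 1; xs drops that level by default.
second_rows = result.xs(1, level=1)
print(second_rows)
```

Note that xs selects a single key per level, so it picks one row position per group rather than a range.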
Alternatively, index the MultiIndex with a list of tuples:
# .ix was removed in pandas 1.0; .loc accepts the same list of tuples
print(df.groupby('one').apply(net_func).loc[[('AAL', 1), ('AAPL', 1)]])
#         one  two  dailyTrue  dailyFalse
# one
# AAL  1  AAL    2          1           0
# AAPL 1 AAPL    2          1           0
A further option is MultiIndex slicing with slice objects (introduced in pandas 0.14):
print(df.groupby('one').apply(net_func).loc[(slice('AAL', 'AAPL'), slice(1, 1)), :])
#         one  two  dailyTrue  dailyFalse
# one
# AAL  1  AAL    2          1           0
# AAPL 1 AAPL    2          1           0
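The slice-based form can also be written with pd.IndexSlice, which some find easier to read. The sketch below uses a hand-built frame (`result`, with assumed values) in place of the apply() output and checks that both spellings select the same rows. Slicing a MultiIndex this way generally needs a sorted index, which this one is.

```python
import pandas as pd

# Hand-built stand-in for the two-level apply() result; values are assumptions.
index = pd.MultiIndex.from_tuples(
    [('AAL', 0), ('AAL', 1), ('AAPL', 0), ('AAPL', 1)],
    names=['one', None],
)
result = pd.DataFrame(
    {'two': [1, 2, 1, 2], 'dailyTrue': 1, 'dailyFalse': 0},
    index=index,
)

# All outer keys, inner key 1; both index levels are kept.
sliced = result.loc[pd.IndexSlice[:, 1], :]

# The same selection spelled with explicit slice objects, as in the answer.
same = result.loc[(slice('AAL', 'AAPL'), slice(1, 1)), :]

print(sliced)
```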
Answer by Learning is a mess
I felt the need to slice inside a GroupBy object, and I have been doing so by applying this monkey patch:
def __groupby_slice(_grp, start=0, stop=None, step=1):
    '''
    Applies a slice to a GroupBy object
    '''
    return _grp.apply(lambda _df: _df.iloc[start:stop:step]).reset_index(drop=True)

pd.core.groupby.GroupBy.slice = __groupby_slice
Use as:
df.groupby('feature0').slice(-10, -3, 2)
Works with pandas==0.25.3
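For readers who would rather not patch pandas internals, a similar per-group positional slice can be sketched as a plain function. This is a swapped-in variant, not the answerer's code: the name `slice_groups` is hypothetical, it iterates over the groups and concatenates the pieces instead of using apply, and it keeps the original row labels rather than resetting the index.

```python
import pandas as pd

def slice_groups(frame, key, start=0, stop=None, step=1):
    """Hypothetical helper: apply the same positional slice to every group.

    Groups come back in sorted key order (the groupby default), and each
    row keeps its original index label. Assumes `frame` has at least one row.
    """
    pieces = [group.iloc[start:stop:step] for _, group in frame.groupby(key)]
    return pd.concat(pieces)

df = pd.DataFrame({'one': ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two': [1, 2, 3, 4]})

# Drop the first row of every group, keeping the rest.
tail = slice_groups(df, 'one', start=1)
print(tail)
```

Call `.reset_index(drop=True)` on the result if a fresh 0..n-1 index is preferred, as in the monkey patch above.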