Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36070288/

Time: 2020-09-14 00:53:26  Source: igfitidea

Python Pandas, slice rows from group in .groupby().apply()

Tags: python, pandas, group-by, dataframe, slice

Asked by jfive

I have the following code, which calls groupby and apply on a Python Pandas DataFrame.

The bizarre thing is that I am unable to slice the grouped data by row (like df.loc[2:5]) without completely breaking the output (as shown in the debug output below). How can I drop rows inside the applied function and still get the desired output?

Any help would be massively appreciated. I'm running this on a bigger example with more complicated functions, but have narrowed the issue down to the row slicing!

Code:

import pandas as pd
df = pd.DataFrame({'one' : ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two' : [1, 2, 3, 4]})

def net_func(df):
    df_res = daily_func(df, True)
    df_res_valid = daily_func(df, False)
    df_merge = pd.merge(df_res, df_res_valid)
    return df_merge

def daily_func(df, bool_param):

#     df.drop(df.head(1).index, inplace=True)
#     df = df[1:1]
#     df.iloc[1:1,:]
#     df.loc[1:1,:]


    if bool_param:
        df['daily'+str(bool_param)] = 1
    else:
        df['daily'+str(bool_param)] = 0    
    return df

print(df.groupby('one').apply(net_func))

Current output:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  0   AAL    1          1           0
     1   AAL    2          1           0
AAPL 0  AAPL    1          1           0
     1  AAPL    2          1           0

Desired output:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  1   AAL    2          1           0
AAPL 1  AAPL    2          1           0

Ideally, I would like to be able to slice by row within each group, for example with df.loc[3:5]. That would be perfect!

I've tried each of the commented-out lines in daily_func, with the following results:

Output with df.drop(df.head(1).index, inplace=True):

Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []

Update: I also tried df = df[1:1], which outputs:

Empty DataFrame
Columns: [one, two, dailyTrue, dailyFalse]
Index: []

Update: I have also tried df.iloc[1:1,:]:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  0   AAL    1          1           0
     1   AAL    2          1           0
AAPL 0  AAPL    1          1           0
     1  AAPL    2          1           0

and df.loc[1:1,:]:

         one  two  dailyTrue  dailyFalse
one                                     
AAL  0   AAL    1          1           0
     1   AAL    2          1           0
AAPL 0  AAPL    1          1           0
     1  AAPL    2          1           0
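
A note on the slices tried above, as a minimal sketch on the same toy frame (outside of groupby, so it only illustrates the slicing rules, not the whole pipeline). The variable names are illustrative. A slice whose start equals its stop selects nothing, a bare indexing expression does not modify the frame, and .loc works on index labels, which inside a group are still the labels of the original frame.

```python
import pandas as pd

df = pd.DataFrame({'one': ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two': [1, 2, 3, 4]})

# Positional slice with start == stop: selects no rows at all
empty = df[1:1]

# Positional slice that drops the first row
tail = df.iloc[1:]

# Label-based slice, inclusive of both ends: only the row labelled 1
label = df.loc[1:1]

# A bare expression returns a new object; df itself is unchanged
df.iloc[1:1, :]
unchanged_len = len(df)

# The AAPL rows keep their original labels (2 and 3), so a label
# slice for 1 finds nothing in that group
aapl = df[df['one'] == 'AAPL']
aapl_label = aapl.loc[1:1]
```

This would explain why df = df[1:1] yields an empty frame, and why the bare df.iloc[1:1,:] and df.loc[1:1,:] lines leave the output unchanged.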

Accepted answer by Parfait

Consider using the cross-section method, xs, after the groupby().apply(), specifying the key of the row you want from each group:

print(df.groupby('one').apply(net_func).xs(0, level=1))
#       one  two  dailyTrue  dailyFalse
#one                                   
#AAL    AAL    1          1           0
#AAPL  AAPL    1          1           0

print(df.groupby('one').apply(net_func).xs(1, level=1))
#       one  two  dailyTrue  dailyFalse
#one                                   
#AAL    AAL    2          1           0
#AAPL  AAPL    2          1           0
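
To see xs in isolation, here is a minimal sketch that builds a two-level index by hand instead of going through groupby().apply(). The frame res and its values are made up for illustration and only mimic the shape of the result above.

```python
import pandas as pd

# Hand-built stand-in for the grouped result: outer level is the
# group key, inner level is the row position within the group
index = pd.MultiIndex.from_tuples(
    [('AAL', 0), ('AAL', 1), ('AAPL', 0), ('AAPL', 1)],
    names=['one', None],
)
res = pd.DataFrame({'two': [1, 2, 3, 4]}, index=index)

# Keep the rows whose inner label is 1; xs drops that level by default
second_rows = res.xs(1, level=1)
print(second_rows)
```

Note that xs removes the selected level from the result, so only the group key remains in the index.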

Alternatively, index the MultiIndex with a list of tuples:

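# The original answer used .ix, which was removed in pandas 1.0; .loc accepts the same list of tuples
print(df.groupby('one').apply(net_func).loc[[('AAL', 1), ('AAPL', 1)]])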
#         one  two  dailyTrue  dailyFalse
#one                                     
#AAL  1   AAL    2          1           0
#AAPL 1  AAPL    2          1           0

Another option is a tuple of slice objects (MultiIndex slicers, introduced in pandas 0.14):

print(df.groupby('one').apply(net_func).loc[(slice('AAL','AAPL'),slice(1,1)),:])
#         one  two  dailyTrue  dailyFalse
#one                                     
#AAL  1   AAL    2          1           0
#AAPL 1  AAPL    2          1           0
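
The same selection can be written more compactly with pd.IndexSlice. This is a minimal sketch on a hand-built frame (res and its values are illustrative, not the output of net_func). Label slicing on a MultiIndex expects the index to be sorted, which it is here.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('AAL', 0), ('AAL', 1), ('AAPL', 0), ('AAPL', 1)],
    names=['one', None],
)
res = pd.DataFrame({'two': [1, 2, 3, 4]}, index=index)

idx = pd.IndexSlice

# All outer keys, inner labels from 1 to 1 inclusive
sel = res.loc[idx[:, 1:1], :]
print(sel)
```

Unlike xs, this keeps both index levels in the result, so a wider range such as idx[:, 3:5] would slice several rows per group.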

Answer by Learning is a mess

I felt the need for slicing inside a GroupBy object, and I have been doing so by applying this monkey patch:

def __groupby_slice( _grp, start=0, stop=None, step=1):
    '''
    Applies a slice to a GroupBy object
    '''
    return _grp.apply( lambda _df : _df.iloc[start:stop:step]).reset_index(drop=True)

pd.core.groupby.GroupBy.slice = __groupby_slice

Use as:

df.groupby('feature0').slice(-10, -3, 2)

Works with pandas==0.25.3
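
If patching GroupBy is undesirable, a per-group positional slice can also be sketched with cumcount, which numbers the rows inside each group. This is an alternative illustration rather than part of the answer above. The names start and stop are hypothetical parameters, and unlike iloc this form does not handle negative positions or a step.

```python
import pandas as pd

df = pd.DataFrame({'one': ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two': [1, 2, 3, 4]})

# Position of each row inside its group: 0, 1, 0, 1
pos = df.groupby('one').cumcount()

# Keep positions start <= pos < stop in every group
start, stop = 1, 2
sliced = df[(pos >= start) & (pos < stop)]
print(sliced)
```

Because this is a plain boolean mask, the original index and columns are preserved and no function is applied to each group.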
