get first and last values in a groupby (pandas)
Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/38797271/
Question by Brian
I have a dataframe df
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                   ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
                  ['X', 'Y'])
```
How do I get the first and last rows, grouped by the first level of the index?
I tried
```python
df.groupby(level=0).agg(['first', 'last']).stack()
```
and got
```
          X   Y
a first   0   1
  last    6   7
b first   8   9
  last   12  13
c first  14  15
  last   16  17
d first  18  19
  last   18  19
```
This is so close to what I want. How can I preserve the level 1 index and get this instead:
```
      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19
  j  18  19
```
Accepted answer by piRSquared
Option 1
```python
def first_last(df):
    # .ix was removed in pandas 1.0; .iloc selects the first and last row by position
    return df.iloc[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)
```
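A self-contained run of Option 1 on the question's frame, as a sketch assuming a recent pandas (where .ix no longer exists, so .iloc is used). It also shows what group_keys=False buys: each row keeps its original two-level label instead of getting the group key prepended as an extra index level. Note that the one-row group d comes back twice, which matches the output the question asks for.

```python
import numpy as np
import pandas as pd

# The question's frame: outer level 'a'..'d', inner level 'a'..'j'.
df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'), list('abcdefghij')],
                  list('XY'))

def first_last(df):
    # Positional selection of the first and last row of one group.
    # A group with a single row yields that row twice.
    return df.iloc[[0, -1]]

# group_keys=False: do not prepend the group key as an extra index level.
result = df.groupby(level=0, group_keys=False).apply(first_last)
print(result)
```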
Option 2 - only works if index is unique
```python
idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
df.loc[idx]
```
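Option 2 goes through labels, and df.loc returns every row carrying a matching label, so duplicated index entries pull in extra rows. A position-based alternative, swapped in here and not taken from the original answers, marks the first and last row of each group with GroupBy.cumcount. It is a sketch: the helper name first_last_mask is hypothetical, and unlike Option 1 it returns a one-row group only once.

```python
import numpy as np
import pandas as pd

def first_last_mask(df, level=0):
    """Hypothetical helper: keep the first and last row of each group by position."""
    g = df.groupby(level=level)
    pos = g.cumcount()                  # 0, 1, 2, ... counted from the top of each group
    rev = g.cumcount(ascending=False)   # ..., 2, 1, 0 counted from the bottom
    keep = (pos == 0) | (rev == 0)
    # A plain boolean array avoids any label alignment on the non-unique index.
    return df[keep.to_numpy()]

# Same values as the question, but the inner index level is deliberately non-unique.
df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'), list('aaaaeeehhj')],
                  list('XY'))
out = first_last_mask(df)
print(out)
```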
Option 3 - per notes below, this only makes sense when there are no NAs
I also abused the agg function. The code below works, but is far uglier.
```python
df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])
```
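A sketch checking that Option 3 agrees with Option 1 when there are no NAs. The trick is that reset_index(1) turns the unnamed inner level into an ordinary column called level_1, so agg(['first', 'last']) carries those labels along like any other column before they are moved back into the index. Option 1 is written with a lambda here only to keep the sketch short; stack() may emit a FutureWarning on pandas 2.1+, which does not change the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'), list('abcdefghij')],
                  list('XY'))

# Option 1: first and last row of each outer-level group, by position.
opt1 = df.groupby(level=0, group_keys=False).apply(lambda g: g.iloc[[0, -1]])

# Option 3: the inner level becomes column 'level_1', gets aggregated, then returns to the index.
opt3 = (df.reset_index(1)
          .groupby(level=0).agg(['first', 'last']).stack()
          .set_index('level_1', append=True).reset_index(1, drop=True)
          .rename_axis([None, None]))
print(opt3)
```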
Note
Per @unutbu: agg(['first', 'last']) takes the first (and last) non-NA values.
I interpret this to mean that the aggregation runs column by column, so each column's first non-NA value may come from a different row. Further, forcing index level 1 to align with those values may not even make sense.
Let's include another test
```python
df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'),
                   list('abcdefghij')],
                  list('XY'))
df.loc[tuple('aa'), 'X'] = np.nan

def first_last(df):
    return df.iloc[[0, -1]]   # .ix was removed in pandas 1.0

df.groupby(level=0, group_keys=False).apply(first_last)

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])
```
Sure enough! The agg-based solution (the second one run above) takes the first valid value in column X, which is 2.0 from row ('a', 'b'). It is now nonsensical to have forced that value to align with index label a.
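The same test as a self-contained sketch with the divergence checked explicitly. One assumption differs from the code above: X is cast to float before inserting the NaN, so the example does not rely on pandas silently upcasting an integer column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'), list('abcdefghij')],
                  list('XY'))
df['X'] = df['X'].astype(float)      # make room for NaN without relying on upcasting
df.loc[('a', 'a'), 'X'] = np.nan     # group 'a' now starts with a missing X

# Option 1 (positional): keeps the NaN that really sits in row ('a', 'a').
opt1 = df.groupby(level=0, group_keys=False).apply(lambda g: g.iloc[[0, -1]])

# Option 3 (agg-based): 'first' skips the NaN and reports X from row ('a', 'b'),
# while still labelling the row ('a', 'a').
opt3 = (df.reset_index(1)
          .groupby(level=0).agg(['first', 'last']).stack()
          .set_index('level_1', append=True).reset_index(1, drop=True)
          .rename_axis([None, None]))

print(opt1.head(2))
print(opt3.head(2))
```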
Answer by Akarsh Jain
This could be one of the easy solutions.
```python
df.groupby(level=0, as_index=False).nth([0, -1])
```
```
      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19
```
Hope this helps. (Y)
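A runnable sketch of this answer. As its own output shows, nth([0, -1]) keeps the rows whose position in their group is 0 or -1, so it filters rows rather than building new ones: the one-row group d appears once, not twice as in the question's desired output. If the duplicate row is required, Option 1 from the accepted answer produces it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'), list('abcdefghij')],
                  list('XY'))

# Keep rows at position 0 or -1 within each outer-level group;
# the original two-level labels are preserved.
out = df.groupby(level=0, as_index=False).nth([0, -1])
print(out)
```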
Answer by nat23dip
Please try this:
For last value: df.groupby('Column_name').nth(-1)
For first value: df.groupby('Column_name').nth(0)
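A small sketch of this answer, grouping by a column. The frame and the column name key (standing in for 'Column_name') are made up for illustration. It also contrasts nth with GroupBy.first/last: nth(0) and nth(-1) return the actual first and last rows, NaN included, while first() and last() return the first and last non-NA value per group, which is the same caveat @unutbu raised about agg(['first', 'last']).

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'key' plays the role of 'Column_name'.
df = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'b'],
                   'val': [np.nan, 1.0, 2.0, 3.0, np.nan]})
g = df.groupby('key')

first_rows = g.nth(0)    # actual first row of each group (NaN kept)
last_rows = g.nth(-1)    # actual last row of each group (NaN kept)

first_valid = g['val'].first()   # first non-NA value per group
last_valid = g['val'].last()     # last non-NA value per group

print(first_rows, last_rows, first_valid, last_valid, sep='\n\n')
```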