Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/20087713/
pandas dataframe groupby and get nth row
Asked by Nilani Algiriyage
I have a pandas DataFrame like the following.
import pandas as pd

# Each inner list is one column; .T transposes so that they become columns.
df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    ['200', '400', '404', '200', '200', '404', '200', '404', '500', '200', '500', '200', '200', '400'],
]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'ID', 'col5']
I want to group this by "ID" and get the 2nd row of each group. Later I will also need the 3rd and 4th rows. Please just explain how to get only the 2nd row of each group.
I tried the following, which gives both the first and the second rows:
df.groupby('ID').head(2)
Instead, I need only the second row. Since IDs 4 and 6 have no second row, they should be ignored.
       col1 col2 col3     col4 ID col5
ID
1  0    1.1    A  1.1    x/y/z  1  200
   11   1.1    D  4.7    x/y/z  1  200
2  3    2.6    B  2.6      x/u  2  200
   5    3.4    B  3.8    x/u/v  2  404
3  1    1.1    A  1.7      x/y  3  400
   2    1.1    A  2.5  x/y/z/n  3  404
4  4    2.5    B  3.3        x  4  200
5  6    2.6    B    4    x/y/z  5  200
   10   2.6    B  4.6      x/y  5  500
6  8    3.4    B  4.3  x/u/v/b  6  500
Answered by Andy Hayden
I think the nth method is supposed to do just that:
In [10]: g = df.groupby('ID')

In [11]: g.nth(1).dropna()
Out[11]:
   col1 col2 col3     col4 col5
ID
1   1.1    D  4.7    x/y/z  200
2   3.4    B  3.8    x/u/v  404
3   1.1    A  2.5  x/y/z/n  404
5   2.6    B  4.6      x/y  500
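For readers on a newer pandas release, here is a minimal sketch of the same call on a small made-up frame (the `toy` data and its column names are illustrative, not taken from the question). As far as I can tell, newer versions of nth simply leave out groups that are too short, so the sketch assumes the trailing dropna() is not needed there. The index of the result differs between pandas versions, so the values are sorted before being inspected.

```python
import pandas as pd

# Hypothetical toy data: group 'c' has only one row.
toy = pd.DataFrame({
    'ID':  ['a', 'a', 'b', 'b', 'b', 'c'],
    'val': [10, 11, 20, 21, 22, 30],
})

# Second row (position 1, zero-based) of each group.
second = toy.groupby('ID').nth(1)

# Sort the values because row order and index depend on the pandas version.
second_vals = sorted(second['val'].tolist())
print(second_vals)
```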
In pandas 0.13, another way to do this is to use cumcount:
df[g.cumcount() == n - 1]
...which is significantly faster.
In [21]: %timeit g.nth(1).dropna()
100 loops, best of 3: 11.3 ms per loop
In [22]: %timeit df[g.cumcount() == 1]
1000 loops, best of 3: 286 μs per loop
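Since the question also asks for the 3rd and 4th rows later on, the cumcount idea can be wrapped in a small helper. This is only a sketch on made-up data: `toy`, `position`, and `nth_row` are hypothetical names, not part of pandas or of the original answer.

```python
import pandas as pd

# Hypothetical toy data: group 'a' has 2 rows, 'b' has 3, 'c' has 1.
toy = pd.DataFrame({
    'ID':  ['a', 'a', 'b', 'b', 'b', 'c'],
    'val': [10, 11, 20, 21, 22, 30],
})

g = toy.groupby('ID')
# cumcount numbers the rows within each group, starting at 0.
position = g.cumcount()

def nth_row(n):
    """Return the n-th row (1-based) of every group that has one."""
    return toy[position == n - 1]

second = nth_row(2)  # rows from groups 'a' and 'b'
third = nth_row(3)   # only group 'b' has a third row
fourth = nth_row(4)  # no group has a fourth row, so this is empty
```

Groups that are too short never produce a matching position, so they drop out without any dropna step, and the original row index is preserved.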
Answered by BrenBarn
If you use apply on the groupby, the function you pass is called on each group, passed as a DataFrame. So you can do:
df.groupby('ID').apply(lambda t: t.iloc[1])
However, this will raise an error if a group doesn't have at least two rows. If you want to exclude groups with fewer than two rows, that could be trickier. I'm not aware of a way to exclude the result of apply only for certain groups. You could try filtering the group list first by removing small groups, or return a one-row NaN-filled DataFrame and call dropna on the result.
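The first suggestion (removing small groups before applying) might look roughly like this. It is a sketch on made-up data: `toy` and `big_enough` are hypothetical names, and the use of GroupBy.filter for the pre-filtering step is my assumption about how to do it, not something the answer spells out.

```python
import pandas as pd

# Hypothetical toy data: group 'c' has only one row.
toy = pd.DataFrame({
    'ID':  ['a', 'a', 'b', 'b', 'b', 'c'],
    'val': [10, 11, 20, 21, 22, 30],
})

# Keep only the rows that belong to groups with at least two rows.
big_enough = toy.groupby('ID').filter(lambda t: len(t) >= 2)

# Every remaining group has a second row, so iloc[1] cannot fail.
# Selecting [['val']] keeps the grouping column out of the applied frame.
second = big_enough.groupby('ID')[['val']].apply(lambda t: t.iloc[1])
print(second)
```

The result is indexed by ID, with one row per group that survived the filter.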

