pandas: group a DataFrame and get the nth row

Question

Asked by Nilani Algiriyage

I have a pandas DataFrame like the following:

import pandas as pd

# Each inner list becomes a row; .T turns the six lists into six columns
# (every column ends up with object dtype because each row mixes types).
df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    ['200', '400', '404', '200', '200', '404', '200', '404', '500', '200', '500', '200', '200', '400'],
]).T

df.columns = ['col1', 'col2', 'col3', 'col4', 'ID', 'col5']

I want to group this by "ID" and get the 2nd row of each group. Later I will also need the 3rd and 4th rows, but for now please just explain how to get only the 2nd row of each group.

I tried the following, which gives both the first and the second rows:

df.groupby('ID').head(2)

Instead, I need only the second row. Since IDs 4 and 6 have no second row, they should be ignored.

             col1 col2 col3     col4     ID    col5
ID                                           
1       0   1.1     A  1.1    x/y/z       1    200
        11  1.1     D  4.7    x/y/z       1    200
2       3   2.6     B  2.6      x/u       2    200
        5   3.4     B  3.8    x/u/v       2    404
3       1   1.1     A  1.7      x/y       3    400
        2   1.1     A  2.5  x/y/z/n       3    404
4       4   2.5     B  3.3        x       4    200
5       6   2.6     B    4    x/y/z       5    200
        10  2.6     B  4.6      x/y       5    500
6       8   3.4     B  4.3  x/u/v/b       6    500

Answer 1

Answered by Andy Hayden

I think the nth method is supposed to do just that (positions are 0-based, so nth(1) is the second row):

In [10]: g = df.groupby('ID')
In [11]: g.nth(1).dropna()
Out[11]: 
    col1 col2  col3     col4 col5
ID                               
1    1.1    D   4.7    x/y/z  200
2    3.4    B   3.8    x/u/v  404
3    1.1    A   2.5  x/y/z/n  404
5    2.6    B   4.6      x/y  500
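A minimal, self-contained sketch of the nth approach, assuming a cut-down copy of the question's data (only ID and col3 are kept). Groups that lack the requested position (IDs 4 and 6 here) should simply be left out, so the dropna() call is likely unnecessary for this data in current pandas. Passing a list of positions also covers the follow-up need for the 3rd and 4th rows. The shape of the result depends on the pandas version: since pandas 2.0, nth acts as a filter and keeps the original row labels, whereas older versions index the result by the group key, so the sketch only relies on the selected values.

```python
import pandas as pd

# Cut-down copy of the question's data: only the grouping key and one
# value column are needed to see which rows nth() selects.
df = pd.DataFrame({
    'ID':   ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    'col3': [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
})
g = df.groupby('ID')

# nth is 0-based, so nth(1) is the second row of each group; groups with a
# single row (IDs 4 and 6) produce no output row.
second = g.nth(1)

# A list of positions returns the 2nd, 3rd and 4th rows in one call.
second_to_fourth = g.nth([1, 2, 3])

print(second)
print(second_to_fourth)
```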

In pandas 0.13 and later, another way to do this is to use cumcount:

n = 2  # 1-based row number within each group; cumcount() itself is 0-based
df[g.cumcount() == n - 1]

...which is significantly faster:

In [21]: %timeit g.nth(1).dropna()
100 loops, best of 3: 11.3 ms per loop

In [22]: %timeit df[g.cumcount() == 1]
1000 loops, best of 3: 286 μs per loop
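A sketch of the cumcount approach on a cut-down copy of the question's data (only ID and col3 are kept, which is an assumption for brevity). cumcount() numbers the rows of each group 0, 1, 2, ... in their original order, so the 2nd, 3rd and 4th rows are just different comparisons against the same Series. Unlike nth, the boolean mask always keeps the original row labels. The timings above came from pandas 0.13; the relative speed in current versions may differ.

```python
import pandas as pd

# Cut-down copy of the question's data: only ID and col3 are kept.
df = pd.DataFrame({
    'ID':   ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    'col3': [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
})
g = df.groupby('ID')

# cumcount() returns a Series aligned with df's index, so comparing it with
# a number gives a boolean mask over the original rows.
pos = g.cumcount()

n = 2                               # 1-based row number: 2 = second row
second = df[pos == n - 1]           # IDs 4 and 6 have no position 1
third = df[pos == 2]
fourth = df[pos == 3]
second_to_fourth = df[pos.isin([1, 2, 3])]

# Cross-check: nth(1) should select the same values (its index may differ).
same_as_nth = sorted(second['col3']) == sorted(g.nth(1)['col3'])

print(second)
print(same_as_nth)
```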

Answer 2

Answered by BrenBarn

If you use apply on the groupby, the function you pass is called once per group, with that group's rows passed in as a DataFrame. So you can do:

df.groupby('ID').apply(lambda t: t.iloc[1])

However, this will raise an error if a group has fewer than two rows. If you want to exclude groups with fewer than two rows, that could be trickier: I'm not aware of a way to exclude the result of apply for only certain groups. You could try filtering the groups first to remove the small ones, or return a one-row NaN-filled DataFrame for them and call dropna on the result.
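A minimal sketch of both workarounds, assuming a cut-down copy of the question's data (ID, col2 and col3 only). The first option uses GroupBy.filter to drop small groups before applying. The second is a close variant of the NaN suggestion: the hypothetical helper second_or_nan returns an all-NaN Series (rather than a one-row DataFrame) for small groups, and dropna(how='all') removes those rows afterwards. Selecting the value columns before apply keeps the grouping column out of each group frame, which also avoids a deprecation warning that recent pandas versions emit.

```python
import pandas as pd

# Cut-down copy of the question's data (ID plus two value columns).
df = pd.DataFrame({
    'ID':   ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    'col2': list('AAABBBBABCBDDD'),
    'col3': [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
})
g = df.groupby('ID')[['col2', 'col3']]

# The plain version fails: IDs 4 and 6 have only one row each,
# so t.iloc[1] is out of bounds for them.
try:
    g.apply(lambda t: t.iloc[1])
    raised = False
except IndexError:
    raised = True

# Option 1: drop groups with fewer than two rows first, then apply as before.
big = df.groupby('ID').filter(lambda t: len(t) >= 2)
second_a = big.groupby('ID')[['col2', 'col3']].apply(lambda t: t.iloc[1])


# Option 2 (hypothetical helper): return an all-NaN row for small groups,
# then drop the rows that are entirely NaN.
def second_or_nan(t):
    if len(t) >= 2:
        return t.iloc[1]
    return pd.Series(float('nan'), index=t.columns)


second_b = g.apply(second_or_nan).dropna(how='all')

print(raised)
print(second_a)
```

Both results are indexed by the group key. Filtering first does the extra work up front, while the NaN variant depends on the real rows never being entirely NaN.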
