Python Pandas DataFrame: get the first row of each group

Question

Asked by Nilani Algiriyage

I have a pandas DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                'value'  : ["first","second","second","first",
                            "second","first","third","fourth",
                            "fifth","second","fifth","first",
                            "first","second","third","fourth","fifth"]})

I want to group this by ["id","value"] and get the first row of each group.

        id   value
0        1   first
1        1  second
2        1  second
3        2   first
4        2  second
5        3   first
6        3   third
7        3  fourth
8        3   fifth
9        4  second
10       4   fifth
11       5   first
12       6   first
13       6  second
14       6   third
15       7  fourth
16       7   fifth

Expected outcome

    id   value
     1   first
     2   first
     3   first
     4  second
     5   first
     6   first
     7  fourth

I tried the following, which only gives the first row of the DataFrame. Any help with this is appreciated.

In [25]: for index, row in df.iterrows():
   ....:     df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])

Answer 1

Accepted answer by Roman Pekar

>>> df.groupby('id').first()
     value
id        
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth

If you need id as a column:

>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth

To get the first n records of each group, you can use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
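
The calls above can be collected into one self-contained sketch. The variable names first_rows and first_two are illustrative only, and as_index=False is shown as an alternative to calling reset_index() afterwards; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'value': ["first", "second", "second", "first", "second", "first",
              "third", "fourth", "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# as_index=False keeps 'id' as a regular column instead of the index.
first_rows = df.groupby('id', as_index=False).first()

# head(2) keeps up to two rows per group, in their original row order.
first_two = df.groupby('id').head(2).reset_index(drop=True)

print(first_rows)
print(first_two)
```

Groups with fewer than n rows (id 5 here) contribute only the rows they have.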

Answer 2

Answered by wij

This will give you the second row of each group (zero indexed, nth(0) is the same as first()):

df.groupby('id').nth(1)

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
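
A minimal sketch of nth(1) on a small made-up frame (the data and the name second_rows are illustrative assumptions). The index of the result differs between pandas versions (the group key in older releases, the original row labels from pandas 2.0 on), so the sketch only looks at the values.

```python
import pandas as pd

# Toy data: id 3 has only one row, so it has no "second" row.
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'value': ['a', 'b', 'c', 'd', 'e', 'f']})

# nth(1) picks the row at position 1 (zero-based) within each group.
second_rows = df.groupby('id').nth(1)

# Groups that are too short are silently left out of the result.
print(second_rows['value'].tolist())
```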

Answer 3

Answered by Siraj S.

Maybe this is what you want:

import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'],   ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)

                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31

df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)

> Out[29]: 
                pop
state1 county3   65
       county4   42
       county2   15
state2 county1   78
       county2   67
       county3   55
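
The same top-3-per-group result can probably be reached without apply, by sorting once and then taking head(3) per group. This is a sketch under one assumption that is not in the original answer: the index levels are given the names 'state' and 'county' so that sort_values can refer to the level by name.

```python
import pandas as pd

# Level names 'state' and 'county' are added here for illustration.
idx = pd.MultiIndex.from_product(
    [['state1', 'state2'], ['county1', 'county2', 'county3', 'county4']],
    names=['state', 'county'],
)
df = pd.DataFrame({'pop': [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

# Sort by state, then by population descending within each state.
ordered = df.sort_values(['state', 'pop'], ascending=[True, False])

# head(3) keeps the first three rows of each state in that sorted order.
top3 = ordered.groupby(level='state').head(3)

print(top3)
```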

Answer 4

Answered by vital_dml

I'd suggest using .nth(0) rather than .first() if you need to get the first row.

The difference between them is how they handle NaNs: .nth(0) returns the first row of each group regardless of the values in that row, while .first() returns the first non-NaN value in each column.

E.g., if your dataset is:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4],
            'value'  : ["first","second","third", np.nan,
                        "second","first","second","third",
                        "fourth","first","second"]})

>>> df.groupby('id').nth(0)
    value
id        
1    first
2    NaN
3    first
4    first

And

>>> df.groupby('id').first()
    value
id        
1    first
2    second
3    first
4    first
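
The NaN behaviour can be checked with a smaller, made-up frame (the data and variable names below are illustrative). Since pandas 2.0, nth(0) keeps the original row labels and the id column rather than indexing by id, so the printed frame looks different from the output above on recent versions; the sketch therefore compares only the value column.

```python
import numpy as np
import pandas as pd

# In group 2 the first row holds NaN and the second row holds 'second'.
df = pd.DataFrame({'id': [1, 1, 2, 2],
                   'value': ['first', 'second', np.nan, 'second']})
grouped = df.groupby('id')

# nth(0) takes the first row as-is, NaN included.
via_nth = grouped.nth(0)['value'].tolist()

# first() skips NaN and takes the first non-null value per column.
via_first = grouped.first()['value'].tolist()

# head(1) also takes the first row as-is, like nth(0).
via_head = grouped.head(1)['value'].tolist()

print(via_nth, via_first, via_head)
```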

Answer 5

Answered by YOBEN_S

If you only need the first row from each group, you can use drop_duplicates. Note that the function's default is keep='first'.

df.drop_duplicates('id')
Out[1027]: 
    id   value
0    1   first
3    2   first
5    3   first
9    4  second
11   5   first
12   6   first
15   7  fourth
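
A short sketch of the keep parameter on made-up data (the frame and variable names are illustrative). drop_duplicates relies on the current row order, so sort the frame first if "first" should mean something other than the order of appearance.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3],
                   'value': ['first', 'second', 'first', 'second', 'third', 'fourth']})

# keep='first' is the default: the first row seen for each id survives.
first_per_id = df.drop_duplicates(subset='id')

# keep='last' keeps the last row seen for each id instead.
last_per_id = df.drop_duplicates(subset='id', keep='last')

print(first_per_id)
print(last_per_id)
```

The original row labels are preserved; chain reset_index(drop=True) if a fresh 0..n-1 index is wanted.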
