pandas: Select the first row of a sorted group from a pandas DataFrame

Question

Asked by user1330974

Suppose I have a pandas DataFrame like the one below:

campaignname    category_type    amount
A               cat_A_0            2.0
A               cat_A_1            1.0
A               cat_A_2            3.0
A               cat_A_2            3.0
A               cat_A_2            4.0
B               cat_B_0            3.0
C               cat_C_0            1.0
C               cat_C_1            2.0

I am using the following code to group the above dataframe (say it's assigned variable name df) by different columns as follows:

for name, gp in df.groupby('campaignname'):
    sorted_gp = gp.groupby(['campaignname', 'category_type']).sum().sort_values(['amount'], ascending=False)
    # I'd like to know how to select this in a cleaner/more concise way
    first_row = [sorted_gp.iloc[0].name[0], sorted_gp.iloc[0].name[1], sorted_gp.iloc[0].values.tolist()[0]]

The purpose of the above code is to first group the raw data by the campaignname column. Then, within each resulting group, I'd like to group again by both campaignname and category_type, and finally sort by the amount column and pick the first row that comes up (the one with the highest amount in each group). Specifically, for the above example, I'd like to get results like this:

first_row = ['A', 'cat_A_2', 4.0] # for the first group
first_row = ['B', 'cat_B_0', 3.0] # for the second group
first_row = ['C', 'cat_C_1', 2.0] # for the third group

etc.

As you can see, I'm using a rather (in my opinion) 'ugly' way to retrieve the first row of each sorted group, but since I'm new to pandas, I don't know a better/cleaner way to accomplish this. If anyone could let me know a way to select the first row in a sorted group from a pandas dataframe, I'd greatly appreciate it. Thank you in advance for your answers/suggestions!

Answer 1

Answered by MaxU

IIUC you can do it this way:

In [83]: df.groupby('campaignname', as_index=False) \
           .apply(lambda x: x.nlargest(1, columns=['amount'])) \
           .reset_index(level=1, drop=True)
Out[83]:
  campaignname category_type  amount
0            A       cat_A_2     4.0
1            B       cat_B_0     3.0
2            C       cat_C_1     2.0

or:

In [76]: df.sort_values('amount', ascending=False).groupby('campaignname').head(1)
Out[76]:
  campaignname category_type  amount
4            A       cat_A_2     4.0
5            B       cat_B_0     3.0
7            C       cat_C_1     2.0
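
For anyone who wants to try the second approach end to end, here is a minimal, self-contained sketch. It assumes the sample table from the question is rebuilt by hand; the variable names `top` and `first_rows` are illustrative only and not from the original post. The final conversion produces the list form the question asked for.

```python
import pandas as pd

# Rebuild the sample table from the question (values copied by hand).
df = pd.DataFrame({
    "campaignname": ["A", "A", "A", "A", "A", "B", "C", "C"],
    "category_type": ["cat_A_0", "cat_A_1", "cat_A_2", "cat_A_2", "cat_A_2",
                      "cat_B_0", "cat_C_0", "cat_C_1"],
    "amount": [2.0, 1.0, 3.0, 3.0, 4.0, 3.0, 1.0, 2.0],
})

# Sort so the largest amount comes first, then keep the first row per campaign.
top = (
    df.sort_values("amount", ascending=False)
      .groupby("campaignname")
      .head(1)
      .sort_values("campaignname")  # put the groups back in A, B, C order
)

# One [campaignname, category_type, amount] list per group.
first_rows = top.values.tolist()
print(first_rows)
```

Note that `head(1)` keeps the original row labels (4, 5, 7 here), so add `.reset_index(drop=True)` if a clean 0..n-1 index is preferred.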

Answer 2

Answered by piRSquared

My preferred way to do this is with idxmax. It returns the index label of the maximum value in each group. I then use those labels to select the matching rows from df:

df.loc[df.groupby('campaignname').amount.idxmax()]

  campaignname category_type  amount
4            A       cat_A_2     4.0
5            B       cat_B_0     3.0
7            C       cat_C_1     2.0
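
A self-contained sketch of the idxmax approach follows, again with the sample table rebuilt by hand. It also includes a variant that is an assumption on my part: the loop in the question sums amount per (campaignname, category_type) before sorting, while the expected output shows raw row values. If the per-category totals are what is actually wanted, summing first and then applying idxmax might look like the second half below. The names `totals` and `top_totals` are illustrative only.

```python
import pandas as pd

# Rebuild the sample table from the question (values copied by hand).
df = pd.DataFrame({
    "campaignname": ["A", "A", "A", "A", "A", "B", "C", "C"],
    "category_type": ["cat_A_0", "cat_A_1", "cat_A_2", "cat_A_2", "cat_A_2",
                      "cat_B_0", "cat_C_0", "cat_C_1"],
    "amount": [2.0, 1.0, 3.0, 3.0, 4.0, 3.0, 1.0, 2.0],
})

# Per campaign, idxmax gives the index label of the row with the largest amount.
# If several rows tie for the maximum, the first occurrence is returned.
idx = df.groupby("campaignname")["amount"].idxmax()
top_rows = df.loc[idx]

# Variant (assumption): total the amounts per (campaign, category) first,
# then pick the category with the largest total in each campaign.
totals = df.groupby(["campaignname", "category_type"], as_index=False)["amount"].sum()
top_totals = totals.loc[totals.groupby("campaignname")["amount"].idxmax()]

print(top_rows)
print(top_totals)
```

With the sample data the two results differ only for campaign A, where the three cat_A_2 rows add up to 10.0 instead of the single-row maximum of 4.0.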
