Selecting the first row of a sorted group from pandas data frame
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must comply with the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/42181022/
Asked by user1330974
Suppose I have a pandas DataFrame like the one below:
campaignname  category_type  amount
A             cat_A_0        2.0
A             cat_A_1        1.0
A             cat_A_2        3.0
A             cat_A_2        3.0
A             cat_A_2        4.0
B             cat_B_0        3.0
C             cat_C_0        1.0
C             cat_C_1        2.0
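For anyone who wants to reproduce the examples on this page, here is a minimal sketch that rebuilds the table above. The values are copied from the rows shown; the default integer index (0 to 7) is an assumption, though it is consistent with the index labels that appear in the answers' output.

```python
import pandas as pd

# Rebuild the sample table from the question; values copied from the rows above.
# The default RangeIndex (0..7) is assumed, since the question does not show an index.
df = pd.DataFrame({
    "campaignname": ["A", "A", "A", "A", "A", "B", "C", "C"],
    "category_type": ["cat_A_0", "cat_A_1", "cat_A_2", "cat_A_2", "cat_A_2",
                      "cat_B_0", "cat_C_0", "cat_C_1"],
    "amount": [2.0, 1.0, 3.0, 3.0, 4.0, 3.0, 1.0, 2.0],
})
print(df)
```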
I am using the following code to group the above DataFrame (say it is assigned to the variable df) by different columns:
for name, gp in df.groupby('campaignname'):
    sorted_gp = gp.groupby(['campaignname', 'category_type']).sum().sort_values(['amount'], ascending=False)
    # I'd like to know how to select this in a cleaner/more concise way
    first_row = [sorted_gp.iloc[0].name[0], sorted_gp.iloc[0].name[1], sorted_gp.iloc[0].values.tolist()[0]]
The purpose of the above code is to first group the raw data by the campaignname column; then, within each resulting group, group again by both campaignname and category_type; and finally sort by the amount column and pick the first row that comes up (the one with the highest amount in each group). Specifically, for the above example, I'd like to get results like this:
first_row = ['A', 'cat_A_2', 4.0] # for the first group
first_row = ['B', 'cat_B_0', 3.0] # for the second group
first_row = ['C', 'cat_C_1', 2.0] # for the third group
etc.
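One detail worth checking: the loop in the question calls sum() on each (campaignname, category_type) pair before sorting, so on the sample data it reports a summed amount rather than the largest single row. The sketch below is an illustration, not the asker's code. It rebuilds the sample data (default integer index assumed) and reruns the same grouping, unpacking the index tuple directly; the names first_rows, campaign, category, and total are made up for this example.

```python
import pandas as pd

# Sample data copied from the question (default integer index assumed).
df = pd.DataFrame({
    "campaignname": ["A", "A", "A", "A", "A", "B", "C", "C"],
    "category_type": ["cat_A_0", "cat_A_1", "cat_A_2", "cat_A_2", "cat_A_2",
                      "cat_B_0", "cat_C_0", "cat_C_1"],
    "amount": [2.0, 1.0, 3.0, 3.0, 4.0, 3.0, 1.0, 2.0],
})

first_rows = []
for name, gp in df.groupby("campaignname"):
    # Same grouping as in the question: sum amount per category, largest total first.
    sorted_gp = (gp.groupby(["campaignname", "category_type"])["amount"]
                   .sum()
                   .sort_values(ascending=False))
    # The index entry is a (campaignname, category_type) tuple; unpack it directly.
    campaign, category = sorted_gp.index[0]
    total = float(sorted_gp.iloc[0])
    first_rows.append([campaign, category, total])

print(first_rows)
# [['A', 'cat_A_2', 10.0], ['B', 'cat_B_0', 3.0], ['C', 'cat_C_1', 2.0]]
```

For campaign A the three cat_A_2 rows add up to 10.0, whereas the desired output lists 4.0, the largest single row.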
As you can see, I'm using a rather (in my opinion) 'ugly' way to retrieve the first row of each sorted group, but since I'm new to pandas, I don't know a better/cleaner way to accomplish this. If anyone could let me know a way to select the first row in a sorted group from a pandas dataframe, I'd greatly appreciate it. Thank you in advance for your answers/suggestions!
Answered by MaxU
IIUC, you can do it this way:
In [83]: df.groupby('campaignname', as_index=False) \
           .apply(lambda x: x.nlargest(1, columns=['amount'])) \
           .reset_index(level=1, drop=True)
Out[83]:
  campaignname category_type  amount
0            A       cat_A_2     4.0
1            B       cat_B_0     3.0
2            C       cat_C_1     2.0
or:
In [76]: df.sort_values('amount', ascending=False).groupby('campaignname').head(1)
Out[76]:
  campaignname category_type  amount
4            A       cat_A_2     4.0
5            B       cat_B_0     3.0
7            C       cat_C_1     2.0
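A self-contained sketch of the sort-then-head approach, run on the sample data from the question (default integer index assumed). The final sort by campaignname and the conversion to plain lists are additions for illustration, to match the first_row format the question asks for; the names top and first_rows are hypothetical.

```python
import pandas as pd

# Sample data copied from the question (default integer index assumed).
df = pd.DataFrame({
    "campaignname": ["A", "A", "A", "A", "A", "B", "C", "C"],
    "category_type": ["cat_A_0", "cat_A_1", "cat_A_2", "cat_A_2", "cat_A_2",
                      "cat_B_0", "cat_C_0", "cat_C_1"],
    "amount": [2.0, 1.0, 3.0, 3.0, 4.0, 3.0, 1.0, 2.0],
})

# Sort so the largest amount comes first, then keep the first row of each campaign.
top = (df.sort_values("amount", ascending=False)
         .groupby("campaignname")
         .head(1)
         .sort_values("campaignname"))  # extra step: order the result by campaign

# Convert each row to a [campaignname, category_type, amount] list.
first_rows = top.values.tolist()
print(first_rows)
# [['A', 'cat_A_2', 4.0], ['B', 'cat_B_0', 3.0], ['C', 'cat_C_1', 2.0]]
```

Because head(1) keeps the original index labels, top can still be traced back to rows 4, 5, and 7 of df.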
Answered by piRSquared
My preferred way to do this is with idxmax. It returns the index label of the maximum value in each group. I then use those labels to select the matching rows from df:
df.loc[df.groupby('campaignname').amount.idxmax()]
  campaignname category_type  amount
4            A       cat_A_2     4.0
5            B       cat_B_0     3.0
7            C       cat_C_1     2.0
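A self-contained sketch of the idxmax approach on the sample data (default integer index assumed), split into two steps so the intermediate labels are visible. The names idx, top, and first_rows are hypothetical, and the list conversion is an addition to match the first_row format from the question.

```python
import pandas as pd

# Sample data copied from the question (default integer index assumed).
df = pd.DataFrame({
    "campaignname": ["A", "A", "A", "A", "A", "B", "C", "C"],
    "category_type": ["cat_A_0", "cat_A_1", "cat_A_2", "cat_A_2", "cat_A_2",
                      "cat_B_0", "cat_C_0", "cat_C_1"],
    "amount": [2.0, 1.0, 3.0, 3.0, 4.0, 3.0, 1.0, 2.0],
})

# Step 1: for each campaign, find the index label of the row with the largest amount.
idx = df.groupby("campaignname")["amount"].idxmax()
print(idx.tolist())  # [4, 5, 7]

# Step 2: select those rows by label.
top = df.loc[idx]

# Convert each row to a [campaignname, category_type, amount] list.
first_rows = top.values.tolist()
print(first_rows)
# [['A', 'cat_A_2', 4.0], ['B', 'cat_B_0', 3.0], ['C', 'cat_C_1', 2.0]]
```

If several rows in a group tie for the maximum, idxmax returns the label of the first occurrence, so exactly one row per group is selected. This relies on the index labels of df being unique; with duplicate labels, loc would return every row sharing a label.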