Filter pandas DataFrame based on max values in a column
Original URL: http://stackoverflow.com/questions/25071937/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by wrcobb
I have a DataFrame with repeating values in the index. I would like to filter this dataset down to only show me one instance of each index by selecting the row within the index with the greatest value in a different column. For example, my DataFrame looks like this:
df:
Product ID  Store  Sales
1           A      50
1           B      200
1           C      20
2           A      400
2           B      10
3           A      200
4           A      50
4           B      100
4           C      500
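To reproduce the example, here is a minimal sketch (assuming pandas is installed) that builds the df shown above. The question says the repeating values live in the index, while the table lays Product ID out like an ordinary column, so the sketch keeps it as a column and also derives the indexed form; the name df_indexed is hypothetical, used only for illustration.

```python
import pandas as pd

# Rows copied from the question's df table; "Product ID" is kept as a column.
df = pd.DataFrame({
    "Product ID": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "Store": ["A", "B", "C", "A", "B", "A", "A", "B", "C"],
    "Sales": [50, 200, 20, 400, 10, 200, 50, 100, 500],
})

# Hypothetical alternative layout: Product ID as a repeating (non-unique) index,
# which is how the question describes the data.
df_indexed = df.set_index("Product ID")
print(df_indexed.index.is_unique)  # False: labels 1, 2 and 4 appear more than once
```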
I would like to filter this data down to this:
df2:
Product ID  Store  Sales
1           B      200
2           A      400
3           A      200
4           C      500
Any thoughts on how best to approach this issue in pandas?
Thanks very much for your time -
Answer by EdChum
You can perform a groupby on 'Product ID', then apply idxmax to the 'Sales' column.
This creates a Series that holds, for each product, the index label of the row with the highest sales.
We can then use those labels to select rows from the original DataFrame with loc. (iloc would give the same result here only because the DataFrame has the default 0..n-1 index, so labels and positions coincide.)
In [201]:
df.loc[df.groupby('Product ID')['Sales'].idxmax()]
Out[201]:
   Product ID Store  Sales
1           1     B    200
3           2     A    400
5           3     A    200
8           4     C    500
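A self-contained sketch of the same groupby/idxmax approach, assuming pandas is installed and reusing the question's sample data. It selects with .loc because idxmax returns index labels, not positions; when several rows tie for the maximum, idxmax keeps the first one. The last part is a hedged workaround, not from the original answer, for the case the question literally describes, where Product ID is the non-unique index; the names best_labels, flat and df2_from_index are hypothetical.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Product ID": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "Store": ["A", "B", "C", "A", "B", "A", "A", "B", "C"],
    "Sales": [50, 200, 20, 400, 10, 200, 50, 100, 500],
})

# For each Product ID, idxmax gives the index *label* of the row with the
# largest Sales (the first such row if there is a tie).
best_labels = df.groupby("Product ID")["Sales"].idxmax()

# The Series holds labels, so select with .loc rather than .iloc.
df2 = df.loc[best_labels]
print(df2)

# If Product ID is the (non-unique) index instead, idxmax would return the
# repeated labels 1..4 and .loc would bring back every row with those labels.
# Moving the key back into a column first gives each row a unique label again.
indexed = df.set_index("Product ID")
flat = indexed.reset_index()
df2_from_index = (
    flat.loc[flat.groupby("Product ID")["Sales"].idxmax()]
    .set_index("Product ID")
)
print(df2_from_index)
```

The reset_index step trades one extra copy of the data for unambiguous row labels; for very large frames, a sort by Sales followed by keeping the last row per group is another common option.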

