Filter a Pandas DataFrame based on the maximum value in a column

Question

Asked by wrcobb

I have a DataFrame with repeating values in the index. I would like to filter this dataset down to only show me one instance of each index by selecting the row within the index with the greatest value in a different column. For example, my DataFrame looks like this:

df:

Product ID     Store     Sales
    1            A         50
    1            B        200
    1            C         20
    2            A        400
    2            B         10
    3            A        200
    4            A         50
    4            B        100
    4            C        500

I would like to filter this data down to this:

df2:

Product ID     Store     Sales
    1            B        200
    2            A        400
    3            A        200
    4            C        500

Any thoughts on how best to approach this issue in pandas?

Thanks very much for your time -

Answer 1

Answered by EdChum

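You can perform a groupby on 'Product ID', then apply idxmax to the 'Sales' column. This creates a Series holding, for each product, the index label of the row with the highest 'Sales' value. We can then use those values to index into the original DataFrame using iloc. Note that idxmax returns index labels rather than positions, so iloc works here only because the DataFrame has the default integer index, where labels and positions coincide; with any other index, use loc instead.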

In [201]:

df.iloc[df.groupby('Product ID')['Sales'].agg(pd.Series.idxmax)]
Out[201]:
   Product ID Store  Sales
1           1     B    200
3           2     A    400
5           3     A    200
8           4     C    500
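
The approach above can be reproduced end to end with the sample data from the question. This is only a sketch under a few assumptions: the frame is rebuilt by hand with a default integer index, the names best_rows, df2, flat, and df3 are illustrative and not part of the original answer, and loc is used in place of iloc because idxmax returns labels. The second half covers the case described in the question, where 'Product ID' is itself the repeating index.

```python
import pandas as pd

# Sample data copied from the question, with a default integer index.
df = pd.DataFrame({
    "Product ID": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "Store": ["A", "B", "C", "A", "B", "A", "A", "B", "C"],
    "Sales": [50, 200, 20, 400, 10, 200, 50, 100, 500],
})

# idxmax returns the index *label* of the largest Sales value in each group,
# so the label-based .loc indexer is the matching choice.
best_rows = df.groupby("Product ID")["Sales"].idxmax()
df2 = df.loc[best_rows]

# Variant for when "Product ID" is the (repeating) index: move it into a
# column first so every row has a unique label, then restore it afterwards.
indexed = df.set_index("Product ID")
flat = indexed.reset_index()
df3 = flat.loc[flat.groupby("Product ID")["Sales"].idxmax()].set_index("Product ID")

print(df2)
print(df3)
```

If two rows tie for the maximum within a product, idxmax keeps only the first one it encounters, so each product still yields exactly one row.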
