Pandas: group by one column and take the max date of another column (Python)

Question

Asked by Anurag Rawat

I have a DataFrame with the following data:

invoice_no  dealer  billing_change_previous_month        date
       110       1                              0  2016-12-31
       100       1                         -41981  2017-01-30
      5505       2                              0  2017-01-30
      5635       2                          58730  2016-12-31

I want to keep only one row per dealer: the row with the maximum date. The desired output looks like this:

invoice_no  dealer  billing_change_previous_month        date
       100       1                         -41981  2017-01-30
      5505       2                              0  2017-01-30

Each dealer should appear exactly once, with its maximum date. Thanks in advance for your help.

Answer 1

Accepted answer by Vaishali

You can use boolean indexing with groupby and transform:

df_new = df[df.groupby('dealer').date.transform('max') == df['date']]

   invoice_no  dealer  billing_change_previous_month        date
1         100       1                         -41981  2017-01-30
2        5505       2                              0  2017-01-30

The same expression works when there are more than two dealers:

import pandas as pd

df = pd.DataFrame({
    'invoice_no': [110, 100, 5505, 5635, 10000, 10001],
    'dealer': [1, 1, 2, 2, 3, 3],
    'billing_change_previous_month': [0, -41981, 0, 58730, 9000, 100],
    'date': ['2016-12-31', '2017-01-30', '2017-01-30', '2016-12-31', '2019-12-31', '2020-01-31'],
})

# Convert the strings to datetimes so that 'max' compares real dates.
df['date'] = pd.to_datetime(df['date'])
df[df.groupby('dealer').date.transform('max') == df['date']]

   invoice_no  dealer  billing_change_previous_month        date
1         100       1                         -41981  2017-01-30
2        5505       2                              0  2017-01-30
5       10001       3                            100  2020-01-31
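
To make the mechanics of the one-liner visible, here is a minimal sketch that splits it into steps, using the data from the question. The names group_max and mask are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'invoice_no': [110, 100, 5505, 5635],
    'dealer': [1, 1, 2, 2],
    'billing_change_previous_month': [0, -41981, 0, 58730],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30',
                            '2017-01-30', '2016-12-31']),
})

# transform('max') returns a Series aligned with df's index:
# each row carries the maximum date of its own dealer group.
group_max = df.groupby('dealer')['date'].transform('max')

# A row is kept when its own date equals its group's maximum.
mask = df['date'] == group_max
df_new = df[mask]
print(df_new)
```

Because the comparison is row-wise, every row that ties for a dealer's maximum date passes the mask, so this approach can return more than one row per dealer.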

Answer 2

Answer by 3novak

Tack 1

Sort by dealer and by date before using drop_duplicates. This sidesteps the issue that surfaces in Tack 2 below, since this method cannot return multiple records for the same dealer. Whether that matters depends on your data and your use case.

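# Sort so that each dealer's latest date comes last within its group.
df.sort_values(['dealer', 'date'], inplace=True)
# Deduplicate on dealer alone, keeping the last (latest-date) row.
df.drop_duplicates('dealer', keep='last', inplace=True)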
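
As a minimal sketch of the sort-then-deduplicate idea, assuming exactly one row per dealer is wanted: deduplicate on dealer alone and keep the last row of each sorted group. The data below is hypothetical; invoice 101 was added so that dealer 1 has two invoices on its latest date.

```python
import pandas as pd

# Hypothetical data: dealer 1 has two invoices on its latest date.
df = pd.DataFrame({
    'invoice_no': [110, 100, 101, 5505, 5635],
    'dealer': [1, 1, 1, 2, 2],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30', '2017-01-30',
                            '2017-01-30', '2016-12-31']),
})

# After sorting, the last row of each dealer holds that dealer's max date;
# keep='last' retains exactly that row and drops the rest.
latest = (
    df.sort_values(['dealer', 'date'])
      .drop_duplicates('dealer', keep='last')
)
print(latest)
```

Only one of the tied invoices for dealer 1 survives. If the choice between tied rows matters, add a tie-breaking column to the sort keys.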

Tack 2

This is a worse way to do it, with a groupby and a merge. Use groupby to find the max date for each dealer, then merge that result back onto the original frame. The how='inner' parameter keeps only those dealer and date combinations that appear in the grouped result, which holds the maximum date for each dealer.

However, note that this returns multiple records per dealer if the max date is duplicated in the original table. You might need to use drop_duplicates, depending on your data and your use case.

df.merge(df.groupby('dealer')['date'].max().reset_index(),
         on=['dealer', 'date'], how='inner')

   invoice_no  dealer  billing_change_previous_month        date
0         100       1                         -41981  2017-01-30
1        5505       2                              0  2017-01-30
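
The duplicate-date caveat can be seen in a small sketch. The data is hypothetical (invoice 101 is added so that dealer 1 ties on its max date), and the names max_dates, merged, and one_per_dealer are illustrative only.

```python
import pandas as pd

# Hypothetical data: dealer 1 has two invoices on its latest date.
df = pd.DataFrame({
    'invoice_no': [110, 100, 101, 5505, 5635],
    'dealer': [1, 1, 1, 2, 2],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30', '2017-01-30',
                            '2017-01-30', '2016-12-31']),
})

# One row per dealer holding that dealer's maximum date.
max_dates = df.groupby('dealer')['date'].max().reset_index()

# The inner merge keeps every original row matching a (dealer, max date)
# pair, so both tied invoices of dealer 1 come through.
merged = df.merge(max_dates, on=['dealer', 'date'], how='inner')

# Collapse to a single row per dealer if that is what is needed.
one_per_dealer = merged.drop_duplicates('dealer')
print(merged)
print(one_per_dealer)
```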

Answer 3

Answer by Rufat

A more correct solution is given at https://stackoverflow.com/a/41531127/9913319:

df.sort_values('date').groupby('dealer').tail(1)
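
For comparison, here is a minimal sketch that runs the sort-and-tail expression on the question's data next to an idxmax-based variant. The idxmax variant is not taken from any of the answers; it is a different technique, shown under the assumption that the index is unique and the date column contains no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'invoice_no': [110, 100, 5505, 5635],
    'dealer': [1, 1, 2, 2],
    'billing_change_previous_month': [0, -41981, 0, 58730],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30',
                            '2017-01-30', '2016-12-31']),
})

# Sort by date, then take the last row of each dealer group.
by_tail = df.sort_values('date').groupby('dealer').tail(1)

# Alternative: look up the index label of each dealer's max date,
# then select those rows directly (no sorting needed).
by_idxmax = df.loc[df.groupby('dealer')['date'].idxmax()]

print(by_tail)
print(by_idxmax)
```

Both variants return exactly one row per dealer, even when several rows share the maximum date.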
