Original question: http://stackoverflow.com/questions/17604665/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
pandas group by year, rank by sales column, in a dataframe with duplicate data
Asked by Ben
I would like to create a rank within each year (so in 2012, Manager B is 1; in 2011, Manager B is 1 again). I struggled with the pandas rank function for a while and do not want to resort to a for loop.
s = pd.DataFrame([['2012','A',3],['2012','B',8],['2011','A',20],['2011','B',30]], columns=['Year','Manager','Return'])
Out[1]:
   Year Manager  Return
0  2012       A       3
1  2012       B       8
2  2011       A      20
3  2011       B      30
The issue I'm having is with the additional code below (I didn't think it would be relevant before):
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
s = s.append(b)
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
Any ideas? This is the real data structure I am using, and I have been having trouble re-indexing it.
Answered by unutbu
It sounds like you want to group by Year, then rank the Return values within each group in descending order.
import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
columns=['Year', 'Manager', 'Return'])
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
print(s)
yields
   Year Manager  Return  Rank
0  2012       A       3     2
1  2012       B       8     1
2  2011       A      20     2
3  2011       B      30     1
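As a side note, rank returns floating-point values by default, so depending on the pandas version the Rank column may print as 2.0 and 1.0 rather than 2 and 1. A minimal sketch of casting the result to integers, assuming there are no ties or missing values within a group (the variable name ranks is only illustrative):

```python
import pandas as pd

s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                 columns=['Year', 'Manager', 'Return'])

# rank() returns a float Series aligned with the original rows
ranks = s.groupby('Year')['Return'].rank(ascending=False)
print(ranks.dtype)

# Casting is safe here only because no group has ties or NaN values
s['Rank'] = ranks.astype(int)
print(s)
```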
To address the OP's revised question: the error message
ValueError: cannot reindex from a duplicate axis
occurs when trying to groupby/rank on a DataFrame with duplicate values in the index. You can avoid the problem by constructing s to have unique index values after appending:
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
s = s.append(b, ignore_index=True)
yields
   Year Manager  Return
0  2012       A       3
1  2012       B       8
2  2011       A      20
3  2011       B      30
4  2012       A       3
5  2012       B       8
6  2011       A      20
7  2011       B      30
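Note that DataFrame.append was deprecated and then removed in pandas 2.0, so the call above only works on older versions. A rough equivalent using pd.concat, which accepts the same ignore_index=True flag, might look like the sketch below; the names rows and cols are introduced here purely for brevity. Because the data is duplicated, each group now contains ties, and the default method='average' assigns fractional ranks.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
cols = ['Year', 'Manager', 'Return']
s = pd.DataFrame(rows, columns=cols)
b = pd.DataFrame(rows, columns=cols)

# ignore_index=True discards the original labels and renumbers rows 0..7
s = pd.concat([s, b], ignore_index=True)
print(s.index.is_unique)

# With a unique index, the group-wise rank can be assigned back directly.
# Tied values share the average of the positions they occupy.
s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False)
print(s)
```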
If you've already appended new rows using
s = s.append(b)
then use reset_index to create a unique index:
s = s.reset_index(drop=True)
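Putting the pieces together, here is a hedged end-to-end sketch: it builds a frame with a duplicated index (using pd.concat without ignore_index as a stand-in for the append call), resets the index, and then ranks within each year. The choice of method='dense' is an assumption about what the asker wants for duplicated rows, namely that identical Return values share a rank with no gaps; it is not part of the original answer.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
cols = ['Year', 'Manager', 'Return']
s = pd.DataFrame(rows, columns=cols)
b = pd.DataFrame(rows, columns=cols)

# Concatenating without ignore_index keeps the labels 0..3 twice
s = pd.concat([s, b])
print(s.index.is_unique)

# drop=True throws away the old labels instead of keeping them as a column
s = s.reset_index(drop=True)

# method='dense' (an assumption): tied values share a rank, with no gaps
s['Rank'] = (s.groupby('Year')['Return']
               .rank(method='dense', ascending=False)
               .astype(int))
print(s)
```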