Python pandas: group by year and rank by sales column in a DataFrame with duplicate data

Question

Asked by Ben

I would like to create a rank within each year (so in 2012, Manager B is 1, and in 2011, Manager B is 1 again). I struggled with the pandas rank function for a while and DO NOT want to resort to a for loop.

import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])

Out[1]:     
   Year Manager  Return    
0  2012       A       3    
1  2012       B       8    
2  2011       A      20    
3  2011       B      30

The issue I'm having is with the additional code (didn't think this would be relevant before):

import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])

s = pd.concat([s, b])  # originally s.append(b); DataFrame.append was removed in pandas 2.0
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)

raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

Any ideas?
This is the real data structure I am using. Been having trouble re-indexing.

Answer 1

Answered by unutbu

It sounds like you want to group by Year, then rank the Return values in descending order.

import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                 columns=['Year', 'Manager', 'Return'])
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
print(s)

yields

   Year Manager  Return  Rank
0  2012       A       3   2.0
1  2012       B       8   1.0
2  2011       A      20   2.0
3  2011       B      30   1.0
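
Note that rank returns floats by default, and tied values get the average of their positions (method='average'). A minimal sketch, assuming integer ranks are wanted and that tied values should share a rank without leaving a gap, passes method='dense' and casts the result. On this particular data there are no ties, so dense only starts to matter once duplicate rows appear:

```python
import pandas as pd

s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                 columns=['Year', 'Manager', 'Return'])

# Rank Return within each Year, highest Return = 1.
# method='dense' gives tied values the same rank and does not skip the next one.
# astype(int) is safe here because Return contains no missing values.
s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False, method='dense').astype(int)
print(s)
```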

To address the OP's revised question: The error message

ValueError: cannot reindex from a duplicate axis

occurs when trying to groupby/rank on a DataFrame with duplicate values in the index. You can avoid the problem by constructing s to have unique index values after appending:

import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
s = pd.concat([s, b], ignore_index=True)  # replaces s.append(b, ignore_index=True); append was removed in pandas 2.0

yields

   Year Manager  Return
0  2012       A       3
1  2012       B       8
2  2011       A      20
3  2011       B      30
4  2012       A       3
5  2012       B       8
6  2011       A      20
7  2011       B      30
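
With the index now unique, the original groupby/rank line can be run on the combined frame. A minimal sketch, assuming the same s and b as above: every Return value now appears twice per Year, so the default method='average' splits the tied positions, giving 1.5 (average of positions 1 and 2) and 3.5 (average of positions 3 and 4) instead of 1 and 2:

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
s = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])

# ignore_index=True builds a fresh 0..n-1 index, so no label is repeated.
s = pd.concat([s, b], ignore_index=True)
assert s.index.is_unique

# Default method='average': each pair of tied rows shares the mean of its two positions.
s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False)
print(s)
```

If fractional ranks are not wanted, the method argument (for example 'min' or 'dense') controls how ties are resolved.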

If you've already appended new rows using

s = pd.concat([s, b])  # the pandas 2.x equivalent of s.append(b)

then use reset_index to create a unique index:

s = s.reset_index(drop=True)
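
A minimal end-to-end sketch of the reset_index route, assuming the same s and b as in the question: it checks Index.is_unique before and after the reset, then ranks with method='min' so that tied rows share the lowest rank in their group ("competition" ranking: 1, 1, 3, 3). The choice of 'min' is only an illustration; pick whichever tie rule fits the report:

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
s = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])

s = pd.concat([s, b])            # index is now 0, 1, 2, 3, 0, 1, 2, 3
print(s.index.is_unique)         # False

s = s.reset_index(drop=True)     # discard the old labels and use 0..7 instead
print(s.index.is_unique)         # True

# method='min': tied rows get the lowest rank of their tie group.
s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False, method='min').astype(int)
print(s)
```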