How to drop extra copies of a duplicate index of a Pandas Series?

Disclaimer: this content is translated from StackOverflow and provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and link to the original question: http://stackoverflow.com/questions/14395678/

How to drop extra copy of duplicate index of Pandas Series?

python, pandas

Asked by bigbug

I have a Series s with duplicate index values:

>>> s
STK_ID  RPT_Date
600809  20061231    demo_str
        20070331    demo_str
        20070630    demo_str
        20070930    demo_str
        20071231    demo_str
        20060331    demo_str
        20060630    demo_str
        20060930    demo_str
        20061231    demo_str
        20070331    demo_str
        20070630    demo_str
Name: STK_Name, Length: 11

And I just want to keep the unique rows and only one copy of the duplicate rows by:

s[s.index.unique()]

Pandas 0.10.1.dev-f7f7e13 gives the error message below:

>>> s[s.index.unique()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 515, in __getitem__
    return self._get_with(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 558, in _get_with
    return self.reindex(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 2361, in reindex
    level=level, limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2063, in reindex
    limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2021, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
>>> 

So how can I drop the extra duplicate rows of the Series efficiently, keeping the unique rows and only one copy of each duplicated row? (preferably in one line)

Answered by Zelazny7

You can groupby the index and apply a function that returns one value per index group. Here, I take the first value:

In [1]: s = Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])

In [2]: s
Out[2]:
1    0
2    1
2    2
2    3
5    4
6    5
7    6
7    7
7    8
8    9

In [3]: s.groupby(s.index).first()
Out[3]:
1    0
2    1
5    4
6    5
7    6
8    9
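
For reference, here is the same idea as a self-contained snippet (a minimal sketch, assuming a reasonably recent pandas; groupby(level=0) groups by the index labels, so it produces the same result as Out[3] above):

import pandas as pd

s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# Group rows by their index label and keep the first row of each group.
print(s.groupby(level=0).first())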

UPDATE

Addressing BigBug's comment about crashing when passing a MultiIndex to Series.groupby():

In [1]: s
Out[1]:
STK_ID  RPT_Date
600809  20061231    demo
        20070331    demo
        20070630    demo
        20070331    demo

In [2]: s.reset_index().groupby(s.index.names).first()
Out[2]:
                    0
STK_ID RPT_Date
600809 20061231  demo
       20070331  demo
       20070630  demo
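
In newer pandas versions, a Series with a MultiIndex can also be grouped directly by its index levels, which avoids the reset_index step. A minimal sketch, assuming a recent pandas (the sample data below just mirrors the example above):

import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(600809, 20061231), (600809, 20070331),
     (600809, 20070630), (600809, 20070331)],
    names=['STK_ID', 'RPT_Date'])
s = pd.Series(['demo'] * 4, index=idx)

# Group by both index levels and keep the first row per (STK_ID, RPT_Date) pair.
print(s.groupby(level=[0, 1]).first())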

Answered by Anton Protopopov

You could subset your data with duplicated (which keeps the first value by default) on the index. Using @Zelazny7's example:

s = pd.Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])

In [130]: s[~s.index.duplicated()]
Out[130]: 
1    0
2    1
5    4
6    5
7    6
8    9
dtype: int64
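
index.duplicated also accepts a keep argument ('first', 'last', or False to mark every copy of a duplicated label), and the same boolean-mask pattern works unchanged on a MultiIndex. A short sketch, assuming a recent pandas:

import pandas as pd

s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# Keep the last occurrence of each duplicated index label instead of the first.
print(s[~s.index.duplicated(keep='last')])

# Keep only the labels that occur exactly once.
print(s[~s.index.duplicated(keep=False)])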

Answered by bmu

One way would be to use drop and index.get_duplicates:

In [43]: df
Out[43]: 
                      String
STK_ID RPT_Date             
600809 20061231  demo_string
       20070331  demo_string
       20070630  demo_string
       20070930  demo_string
       20071231  demo_string
       20060331  demo_string
       20060630  demo_string
       20060930  demo_string
       20061231  demo_string
       20070331  demo_string
       20070630  demo_string

In [44]: df.drop(df.index.get_duplicates())
Out[44]: 
                      String
STK_ID RPT_Date             
600809 20070930  demo_string
       20071231  demo_string
       20060331  demo_string
       20060630  demo_string
       20060930  demo_string
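
Note that this variant removes every copy of the duplicated rows, so only index labels that occur exactly once survive (as the output above shows). Also, Index.get_duplicates() was deprecated and later removed in newer pandas releases; an equivalent result can be obtained with index.duplicated(keep=False). A minimal sketch, assuming a recent pandas and made-up sample data:

import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(600809, 20061231), (600809, 20070331), (600809, 20070930),
     (600809, 20061231), (600809, 20070331)],
    names=['STK_ID', 'RPT_Date'])
df = pd.DataFrame({'String': ['demo_string'] * 5}, index=idx)

# keep=False marks every occurrence of a duplicated label, so negating the mask
# keeps only rows whose (STK_ID, RPT_Date) pair appears exactly once.
print(df[~df.index.duplicated(keep=False)])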