How to drop extra copies of a duplicate index in a Pandas Series?

Question

Asked by bigbug

I have a Series s with a duplicate index:

>>> s
STK_ID  RPT_Date
600809  20061231    demo_str
        20070331    demo_str
        20070630    demo_str
        20070930    demo_str
        20071231    demo_str
        20060331    demo_str
        20060630    demo_str
        20060930    demo_str
        20061231    demo_str
        20070331    demo_str
        20070630    demo_str
Name: STK_Name, Length: 11

I want to keep the unique rows and only one copy of each duplicated row, using:

s[s.index.unique()]

Pandas 0.10.1.dev-f7f7e13 gives the error message below:

>>> s[s.index.unique()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 515, in __getitem__
    return self._get_with(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 558, in _get_with
    return self.reindex(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 2361, in reindex
    level=level, limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2063, in reindex
    limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2021, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
>>>

So how can I efficiently drop the extra duplicate rows of a Series, keeping the unique rows and only one copy of each duplicated row? (Ideally in one line.)
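
As the traceback shows, s[s.index.unique()] is a label lookup that pandas 0.10 routed through reindex, which refused a non-unique index. The minimal sketch below assumes a current pandas release and a small hypothetical Series with string labels. It illustrates that the same kind of lookup no longer raises, but it still does not deduplicate, because every row matching a requested label is returned:

```python
import pandas as pd

# Hypothetical Series in which the label "a" appears twice
s = pd.Series([10, 20, 30], index=["a", "a", "b"])

# Looking up the unique labels returns *every* row that matches them,
# so both "a" rows are still present in the result
looked_up = s.loc[s.index.unique()]
print(looked_up)
print(len(looked_up))  # 3, not 2
```

In other words, a lookup by unique labels is the wrong tool here. The answers below select one row per label instead.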

Answer 1

Answered by Zelazny7

You can group by the index and apply a function that returns one value per index group. Here, I take the first value:

In [1]: s = Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])

In [2]: s
Out[2]:
1    0
2    1
2    2
2    3
5    4
6    5
7    6
7    7
7    8
8    9

In [3]: s.groupby(s.index).first()
Out[3]:
1    0
2    1
5    4
6    5
7    6
8    9
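
A self-contained version of the session above, assuming a recent pandas and the usual pd/np import aliases. Passing level=0 groups on the index directly and gives the same result as s.groupby(s.index). One behavior worth knowing: GroupBy.first() returns the first non-null value in each group, so when the first copy of a label holds NaN, a later value is returned instead of that first row:

```python
import numpy as np
import pandas as pd

# Same data as the session above
s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# level=0 groups on the index itself; equivalent to s.groupby(s.index)
deduped = s.groupby(level=0).first()
print(deduped)

# first() skips missing values, so the leading NaN for "a" is not kept
with_nan = pd.Series([np.nan, 1.0, 2.0], index=["a", "a", "b"])
nan_first = with_nan.groupby(level=0).first()
print(nan_first)  # a -> 1.0, b -> 2.0
```

Note also that groupby sorts the group keys by default, so the result is ordered by label rather than by original position.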

UPDATE

Addressing BigBug's comment about crashing when passing a MultiIndex to Series.groupby():

In [1]: s
Out[1]:
STK_ID  RPT_Date
600809  20061231    demo
        20070331    demo
        20070630    demo
        20070331    demo

In [2]: s.reset_index().groupby(s.index.names).first()
Out[2]:
                    0
STK_ID RPT_Date
600809 20061231  demo
       20070331  demo
       20070630  demo
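
In current pandas, the index levels can also be passed to groupby directly, which avoids the reset_index() round trip and keeps the result a Series rather than a DataFrame with a column named 0. This is a sketch assuming a recent pandas; the data rebuilds the four-row example above:

```python
import pandas as pd

# Rebuild the MultiIndex Series from the update above
index = pd.MultiIndex.from_tuples(
    [(600809, 20061231), (600809, 20070331), (600809, 20070630), (600809, 20070331)],
    names=["STK_ID", "RPT_Date"],
)
s = pd.Series(["demo"] * 4, index=index)

# Group on every index level by name; the result keeps the MultiIndex
deduped = s.groupby(level=["STK_ID", "RPT_Date"]).first()
print(deduped)
```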

Answer 2

Answered by Anton Protopopov

You could subset your data with Index.duplicated applied to the index (it keeps the first occurrence by default). Using @Zelazny7's example:

s = pd.Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])

In [130]: s[~s.index.duplicated()]
Out[130]: 
1    0
2    1
5    4
6    5
7    6
8    9
dtype: int64
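
Index.duplicated also takes a keep argument that controls which copy survives. A short sketch, assuming a reasonably recent pandas, on the same data:

```python
import pandas as pd

s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# keep="first" (the default): keep the first occurrence of each label
keep_first = s[~s.index.duplicated(keep="first")]
# keep="last": keep the last occurrence of each label instead
keep_last = s[~s.index.duplicated(keep="last")]
# keep=False: mark every copy, so only labels that never repeat survive
never_repeated = s[~s.index.duplicated(keep=False)]

print(keep_last)
print(never_repeated)
```

Unlike the groupby approach, this boolean mask keeps rows in their original order and returns the actual row values, including NaN. For the question's data, where RPT_Date is not sorted, that difference in ordering is visible.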

Answer 3

Answered by bmu

One way would be using drop and index.get_duplicates:

In [43]: df
Out[43]: 
                      String
STK_ID RPT_Date             
600809 20061231  demo_string
       20070331  demo_string
       20070630  demo_string
       20070930  demo_string
       20071231  demo_string
       20060331  demo_string
       20060630  demo_string
       20060930  demo_string
       20061231  demo_string
       20070331  demo_string
       20070630  demo_string

In [44]: df.drop(df.index.get_duplicates())
Out[44]: 
                      String
STK_ID RPT_Date             
600809 20070930  demo_string
       20071231  demo_string
       20060331  demo_string
       20060630  demo_string
       20060930  demo_string
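
Index.get_duplicates was deprecated in pandas 0.23 and removed in 1.0, so the call above fails on current releases. As the output shows, it also drops every copy of a repeated label rather than keeping one. A sketch assuming a current pandas reproduces the same result with a boolean mask; the frame is rebuilt from the session above, with its single column String:

```python
import pandas as pd

dates = [20061231, 20070331, 20070630, 20070930, 20071231, 20060331,
         20060630, 20060930, 20061231, 20070331, 20070630]
index = pd.MultiIndex.from_arrays(
    [[600809] * len(dates), dates], names=["STK_ID", "RPT_Date"]
)
df = pd.DataFrame({"String": ["demo_string"] * len(dates)}, index=index)

# keep=False marks every copy of a repeated label as a duplicate,
# matching what df.drop(df.index.get_duplicates()) used to return
only_unique = df[~df.index.duplicated(keep=False)]
print(only_unique)
```

To keep one copy of each label instead, as the question asks, use the default keep="first".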
