Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/46218652/

Time: 2020-09-14 04:27:07. Source: igfitidea

Python pandas unique value ignoring NaN

Tags: python, pandas, group-by, null, unique

Asked by ragesz

I want to use unique in a groupby aggregation, but I don't want NaN in the unique result.

An example dataframe:

import numpy as np  # pd.np is deprecated; import numpy directly
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3], 'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})

       a  b    c
0 1.0000  0  foo
1 2.0000  0  NaN
2 1.0000  1  bar
3 1.0000  1  foo
4    nan  1  baz
5 3.0000  1  foo
6 3.0000  1  bar

And the groupby:

df.groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Its result is:

       a                             c                      
     min    max           unique first last           unique
b                                                           
0 1.0000 2.0000       [1.0, 2.0]   foo  foo       [foo, nan]
1 1.0000 3.0000  [1.0, nan, 3.0]   bar  bar  [bar, foo, baz]

But I want it without NaN:

       a                        c                      
     min    max      unique first last           unique
b                                                           
0 1.0000 2.0000  [1.0, 2.0]   foo  foo            [foo]
1 1.0000 3.0000  [1.0, 3.0]   bar  bar  [bar, foo, baz]

How can I do that? Of course, I have several columns to aggregate, and each column needs different aggregation functions, so I don't want to do the unique aggregations one by one, separately from the other aggregations.

Thank you!

Accepted answer by Bharath

Try ffill, which forward-fills the missing values before grouping:

df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
      c                          a                 
  first last           unique  min  max      unique
b                                                  
0   foo  foo            [foo]  1.0  2.0  [1.0, 2.0]
1   bar  bar  [bar, foo, baz]  1.0  3.0  [1.0, 3.0]

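If NaN is the first element of a group, the above solution breaks: ffill runs over the whole frame before grouping, so that NaN is filled with the last value of the preceding group (or stays NaN if it sits in the first row). @IanS's solution is better in the long run.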
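To make the caveat concrete, here is a minimal sketch on a small hypothetical frame (not the one from the question) in which the first row of group 1 is missing. The variable names leaked and clean are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group b == 1 has a missing 'a'.
df = pd.DataFrame({'a': [1.0, 2.0, np.nan, 3.0],
                   'b': [0, 0, 1, 1]})

# ffill works on the whole frame, so the NaN is filled with 2.0,
# the last value of group 0, before the grouping happens.
leaked = df.ffill().groupby('b')['a'].unique()

# Dropping the missing values of 'a' first keeps group 1 clean.
clean = df.dropna(subset=['a']).groupby('b')['a'].unique()

print(leaked.loc[1].tolist())  # [2.0, 3.0]  (2.0 belongs to group 0)
print(clean.loc[1].tolist())   # [3.0]
```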

Answer by IanS

Define a function:

def unique_non_null(s):
    return s.dropna().unique()

Then use it in the aggregation:

df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null], 
    'c': ['first', 'last', unique_non_null]
})
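For reference, the same idea as a self-contained sketch on the question's data. The helper unique_non_null_list is a hypothetical variant that returns a plain list instead of a NumPy array. This is an assumption on my part rather than part of the answer: some pandas versions may reject an array returned by a custom aggregation on a numeric column (ValueError: Must produce aggregated value), and a list avoids that.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 1, 1, np.nan, 3, 3],
    'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar'],
})

def unique_non_null_list(s):
    # Hypothetical variant of unique_non_null: drop missing values, then
    # return the distinct values as a list, in order of first appearance.
    return s.dropna().unique().tolist()

result = df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null_list],
    'c': ['first', 'last', unique_non_null_list],
})

# The custom function shows up as a column named after the function.
print(result[('a', 'unique_non_null_list')].tolist())  # [[1.0, 2.0], [1.0, 3.0]]
print(result[('c', 'unique_non_null_list')].tolist())  # [['foo'], ['bar', 'foo', 'baz']]
```

Because the missing values are dropped per column and per group, min, max, first and last still see every row.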

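Or drop every row that contains a NaN before grouping (note that this removes the whole row, so it can change the other aggregations as well):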

df.dropna().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
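A short sketch of what the whole-frame dropna does to the question's data, compared with dropping missing values one column at a time. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 1, 1, np.nan, 3, 3],
    'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar'],
})

# df.dropna() removes row 1 (a=2, c=NaN) and row 4 (a=NaN, c='baz') entirely.
grouped_dropped = df.dropna().groupby('b')
max_dropped = grouped_dropped['a'].max()
unique_dropped = grouped_dropped['c'].unique()

# Handling each column separately keeps a=2 and c='baz'.
max_full = df.groupby('b')['a'].max()
unique_full = df.dropna(subset=['c']).groupby('b')['c'].unique()

print(max_dropped.tolist())            # [1.0, 3.0]
print(max_full.tolist())               # [2.0, 3.0]
print(unique_dropped.loc[1].tolist())  # ['bar', 'foo']
print(unique_full.loc[1].tolist())     # ['bar', 'foo', 'baz']
```

With the whole-frame dropna, the maximum of a in group 0 falls from 2.0 to 1.0 and 'baz' disappears from group 1, so the output no longer matches the one asked for in the question.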

Answer by zipa

This will work for what you need:

# fillna(method='ffill') is deprecated in recent pandas; ffill() is the equivalent call
df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Because you only use min, max and unique, the repeated values that the forward fill introduces do not concern you.
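The following sketch illustrates that point on columns a and b of the question's data, and shows where it stops holding: count is included here only as an example of an aggregation that is sensitive to the duplicated values. It is not one of the aggregations the question uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3],
                   'b': [0, 0, 1, 1, 1, 1, 1]})

plain = df.groupby('b')['a'].agg(['min', 'max', 'count'])
filled = df.ffill().groupby('b')['a'].agg(['min', 'max', 'count'])

# min and max are unchanged, because the filled value (1.0) already
# occurs in group 1.
print(plain[['min', 'max']].equals(filled[['min', 'max']]))  # True

# count is not: the filled row is counted as a real observation.
print(plain['count'].tolist())   # [2, 4]
print(filled['count'].tolist())  # [2, 5]
```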