Python pandas unique value ignoring NaN
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46218652/
Python pandas unique value ignoring NaN
Asked by ragesz
I want to use unique in a groupby aggregation, but I don't want nan in the unique result.
An example dataframe:
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3], 'b': [0, 0, 1, 1, 1, 1, 1],
                   'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})
        a  b    c
0  1.0000  0  foo
1  2.0000  0  NaN
2  1.0000  1  bar
3  1.0000  1  foo
4     nan  1  baz
5  3.0000  1  foo
6  3.0000  1  bar
And the groupby:
df.groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
Its result is:
        a                              c
      min    max           unique  first last           unique
b
0  1.0000 2.0000       [1.0, 2.0]    foo  foo       [foo, nan]
1  1.0000 3.0000  [1.0, nan, 3.0]    bar  bar  [bar, foo, baz]
But I want it without nan:
        a                         c
      min    max      unique  first last           unique
b
0  1.0000 2.0000  [1.0, 2.0]    foo  foo            [foo]
1  1.0000 3.0000  [1.0, 3.0]    bar  bar  [bar, foo, baz]
How can I do that? Of course I have several columns to aggregate and every column needs different aggregation functions, so I don't want to do the unique aggregations one by one, separately from the other aggregations.
Thank you!
Accepted answer by Bharath
Try ffill:
df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
      c                           a
  first last           unique  min  max      unique
b
0   foo  foo            [foo]  1.0  2.0  [1.0, 2.0]
1   bar  bar  [bar, foo, baz]  1.0  3.0  [1.0, 3.0]
If NaN is the first element of the group, the above solution breaks. @IanS's solution is better in the long run.
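To see that failure mode concretely, here is a minimal sketch (a tiny frame of my own, not from the question, reusing the pandas/numpy imports from above). The NaN sits on the first row of group b == 1, so a plain ffill() copies 'x' across the group boundary before the aggregation runs:

df2 = pd.DataFrame({'b': [0, 1, 1], 'c': ['x', np.nan, 'y']})
df2.ffill().groupby('b')['c'].unique()

b
0       [x]
1    [x, y]
Name: c, dtype: object

Group 1 now reports 'x' even though it never contained it; if the NaN were on the very first row of the frame, it would simply survive the fill and show up in the unique result instead.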
Answered by IanS
Define a function:
def unique_non_null(s):
return s.dropna().unique()
Then use it in the aggregation:
df.groupby('b').agg({
'a': ['min', 'max', unique_non_null],
'c': ['first', 'last', unique_non_null]
})
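On the sample frame this reproduces the desired output exactly. If you prefer not to define a named helper, an inline lambda is a possible variant (a sketch, not from the answer; note that the resulting column level is then labelled '<lambda>' or '<lambda_0>' depending on the pandas version, rather than a readable name):

df.groupby('b').agg({
    'a': ['min', 'max', lambda s: s.dropna().unique()],
    'c': ['first', 'last', lambda s: s.dropna().unique()]
})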
Or:
df.dropna().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
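One caveat with this dropna() variant (my own observation, not part of the answer): DataFrame.dropna() removes a whole row as soon as any column holds NaN, so the other aggregates see less data too. On the sample frame, row 1 (a=2.0, c=NaN) and row 4 (a=NaN, c='baz') disappear entirely, so for example:

df.dropna().groupby('b')['a'].max()
# group 0 now reports 1.0 instead of 2.0, and 'baz' no longer appears
# in the unique values of c for group 1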
Answered by zipa
This will work for what you need:
df.fillna(method='ffill').groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
Because you use min, max and unique, the repeated values created by the forward fill do not concern you.
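This frame-wide forward fill shares the caveat from the accepted answer: a value can be copied across a group boundary when a group starts with NaN. A group-wise fill avoids that leakage (a sketch of my own, not from the answer; a NaN on the very first row of a group still survives the fill, so it does not replace the dropna-based approaches in that case):

filled = df.copy()
filled[['a', 'c']] = df.groupby('b')[['a', 'c']].ffill()
filled.groupby('b').agg({'a': ['min', 'max', 'unique'],
                         'c': ['first', 'last', 'unique']})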