pandas: unique values ignoring NaN (Python)

Question

Asked by ragesz

I want to use unique in a groupby aggregation, but I don't want nan in the unique result.

An example dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3], 'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})

       a  b    c
0 1.0000  0  foo
1 2.0000  0  NaN
2 1.0000  1  bar
3 1.0000  1  foo
4    nan  1  baz
5 3.0000  1  foo
6 3.0000  1  bar

And the groupby:

df.groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Its result is:

       a                             c                      
     min    max           unique first last           unique
b                                                           
0 1.0000 2.0000       [1.0, 2.0]   foo  foo       [foo, nan]
1 1.0000 3.0000  [1.0, nan, 3.0]   bar  bar  [bar, foo, baz]

But I want it without nan:

       a                        c                      
     min    max      unique first last           unique
b                                                           
0 1.0000 2.0000  [1.0, 2.0]   foo  foo            [foo]
1 1.0000 3.0000  [1.0, 3.0]   bar  bar  [bar, foo, baz]

How can I do that? I have several columns to aggregate, and each column needs different aggregation functions, so I don't want to run the unique aggregations one by one, separately from the other aggregations.

Thank you!

Answer 1

Accepted answer by Bharath

Try ffill (forward fill) before the groupby:

df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

      c                          a                 
  first last           unique  min  max      unique
b                                                  
0   foo  foo            [foo]  1.0  2.0  [1.0, 2.0]
1   bar  bar  [bar, foo, baz]  1.0  3.0  [1.0, 3.0]

If NaN is the first element of a group, the above solution breaks, because ffill runs before the groupby and fills from the previous row regardless of which group that row belongs to. @IanS's solution is better in the long run.
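A small sketch of that failure mode, using a hypothetical four-row frame (`demo` is not from the question) in which group 1 starts with a NaN. It compares forward filling with dropping NaN inside each group; the list-returning lambda is an illustrative choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group b=1 starts with a NaN in column 'a'.
demo = pd.DataFrame({'b': [0, 0, 1, 1], 'a': [1.0, 2.0, np.nan, 3.0]})

# ffill works on the whole frame, so the NaN in row 2 is filled
# with 2.0, a value that belongs to group 0.
leaked = demo.ffill().groupby('b')['a'].unique()

# Dropping NaN inside each group keeps the groups independent.
clean = demo.groupby('b')['a'].agg(lambda s: s.dropna().unique().tolist())

print(leaked.loc[1].tolist())  # [2.0, 3.0]
print(clean.loc[1])            # [3.0]
```

With the forward fill, group 1 reports a value (2.0) that never occurred in it.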

Answer 2

Answer by IanS

Define a function:

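def unique_non_null(s):
    # Drop missing values first, then keep each remaining value once.
    # tolist() yields a plain list, which agg accepts as one aggregated value.
    return s.dropna().unique().tolist()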

Then use it in the aggregation:

df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null], 
    'c': ['first', 'last', unique_non_null]
})
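For reference, here is a self-contained run of this approach on the question's data. It is a sketch rather than the answerer's exact code: the helper returns a plain list via `tolist()`, on the assumption that `agg` handles a list as a single aggregated value more reliably across pandas versions than a NumPy array.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    'a': [1, 2, 1, 1, np.nan, 3, 3],
    'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar'],
})

def unique_non_null(s):
    # Remove NaN within the group, then return the distinct values as a list.
    return s.dropna().unique().tolist()

result = df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null],
    'c': ['first', 'last', unique_non_null],
})
print(result)
```

The custom function appears in the result under its own name, so the unique columns are labelled `unique_non_null` instead of `unique`.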

Or drop every row that contains a NaN before grouping:

df.dropna().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
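One thing to check before using the dropna() variant: it removes a whole row as soon as any column is NaN, so the other aggregations can change too. The sketch below, run on the question's frame, compares max of column a with and without the row-wise drop.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    'a': [1, 2, 1, 1, np.nan, 3, 3],
    'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar'],
})

# Row 1 has a=2 but c=NaN, so dropna() removes it entirely.
row_dropped = df.dropna().groupby('b')['a'].max()
per_column = df.groupby('b')['a'].max()

print(row_dropped.loc[0])  # 1.0
print(per_column.loc[0])   # 2.0
```

For group 0 the maximum falls from 2.0 to 1.0, which differs from the output the question asks for.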

Answer 3

Answer by zipa

This will work for what you need:

df.fillna(method='ffill').groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Because you only use min, max and unique, the repeated values that forward filling introduces do not concern you. Recent pandas releases deprecate fillna(method='ffill') in favour of df.ffill().
