pandas: How to fill with 0 after calling resample()?

Warning: this page is a Chinese–English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/39452095/


How to fillna() with value 0 after calling resample?

python, pandas

Asked by displayname

Either I don't understand the documentation, or it is outdated.

If I run


user[["DOC_ACC_DT", "USER_SIGNON_ID"]].groupby("DOC_ACC_DT").agg(["count"]).resample("1D").fillna(value=0, method="ffill")

I get

TypeError: fillna() got an unexpected keyword argument 'value'

If I just run


.fillna(0)

I get


ValueError: Invalid fill method. Expecting pad (ffill), backfill (bfill) or nearest. Got 0

If I then set


.fillna(0, method="ffill") 

I get


TypeError: fillna() got multiple values for keyword argument 'method'

so the only thing that works is


.fillna("ffill")

but of course that just does a forward fill. However, I want to replace NaN with zeros. What am I doing wrong here?

Accepted answer by displayname

Well, I don't understand why the code above is not working, and I'm going to wait for somebody to give a better answer than this, but I just found that

.replace(np.nan, 0)

does what I would have expected from .fillna(0).

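A minimal, runnable sketch of this workaround, using toy data shaped like the question's columns (the values are invented for illustration; note that in current pandas an aggregation such as asfreq() is needed before replace can be called):

```python
import numpy as np
import pandas as pd

# Toy event log shaped like the question's data (values are invented)
user = pd.DataFrame({
    "DOC_ACC_DT": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-04"]),
    "USER_SIGNON_ID": ["u1", "u2", "u1"],
})

daily = (user.groupby("DOC_ACC_DT")["USER_SIGNON_ID"]
             .count()              # events per day actually present
             .resample("1D")
             .asfreq()             # inserts NaN for the missing days
             .replace(np.nan, 0))  # the workaround: NaN -> 0
print(daily)
```

Here 2020-01-02 and 2020-01-03 become 0 instead of NaN; calling .fillna(0) on the already-aggregated result would do the same.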

Answered by jezrael

I did some tests, and the results are quite interesting.

Sample:


import pandas as pd
import numpy as np

np.random.seed(1)
rng = pd.date_range('1/1/2012', periods=20, freq='S')
df = pd.DataFrame({'a':['a'] * 10 + ['b'] * 10,
                   'b':np.random.randint(0, 500, len(rng))}, index=rng)
df.b.iloc[3:8] = np.nan
print (df)
                     a      b
2012-01-01 00:00:00  a   37.0
2012-01-01 00:00:01  a  235.0
2012-01-01 00:00:02  a  396.0
2012-01-01 00:00:03  a    NaN
2012-01-01 00:00:04  a    NaN
2012-01-01 00:00:05  a    NaN
2012-01-01 00:00:06  a    NaN
2012-01-01 00:00:07  a    NaN
2012-01-01 00:00:08  a  335.0
2012-01-01 00:00:09  a  448.0
2012-01-01 00:00:10  b  144.0
2012-01-01 00:00:11  b  129.0
2012-01-01 00:00:12  b  460.0
2012-01-01 00:00:13  b   71.0
2012-01-01 00:00:14  b  237.0
2012-01-01 00:00:15  b  390.0
2012-01-01 00:00:16  b  281.0
2012-01-01 00:00:17  b  178.0
2012-01-01 00:00:18  b  276.0
2012-01-01 00:00:19  b  254.0

Downsampling:


Possible solution with Resampler.asfreq:


If you use asfreq, the behaviour is the same as aggregating with first:

print (df.groupby('a').resample('2S').first())
                       a      b
a                              
a 2012-01-01 00:00:00  a   37.0
  2012-01-01 00:00:02  a  396.0
  2012-01-01 00:00:04  a    NaN
  2012-01-01 00:00:06  a    NaN
  2012-01-01 00:00:08  a  335.0
b 2012-01-01 00:00:10  b  144.0
  2012-01-01 00:00:12  b  460.0
  2012-01-01 00:00:14  b  237.0
  2012-01-01 00:00:16  b  281.0
  2012-01-01 00:00:18  b  276.0
print (df.groupby('a').resample('2S').first().fillna(0))
                       a      b
a                              
a 2012-01-01 00:00:00  a   37.0
  2012-01-01 00:00:02  a  396.0
  2012-01-01 00:00:04  a    0.0
  2012-01-01 00:00:06  a    0.0
  2012-01-01 00:00:08  a  335.0
b 2012-01-01 00:00:10  b  144.0
  2012-01-01 00:00:12  b  460.0
  2012-01-01 00:00:14  b  237.0
  2012-01-01 00:00:16  b  281.0
  2012-01-01 00:00:18  b  276.0

print (df.groupby('a').resample('2S').asfreq().fillna(0))
                       a      b
a                              
a 2012-01-01 00:00:00  a   37.0
  2012-01-01 00:00:02  a  396.0
  2012-01-01 00:00:04  a    0.0
  2012-01-01 00:00:06  a    0.0
  2012-01-01 00:00:08  a  335.0
b 2012-01-01 00:00:10  b  144.0
  2012-01-01 00:00:12  b  460.0
  2012-01-01 00:00:14  b  237.0
  2012-01-01 00:00:16  b  281.0
  2012-01-01 00:00:18  b  276.0

If you use replace, the other values are aggregated as with mean:

print (df.groupby('a').resample('2S').mean())
                           b
a                           
a 2012-01-01 00:00:00  136.0
  2012-01-01 00:00:02  396.0
  2012-01-01 00:00:04    NaN
  2012-01-01 00:00:06    NaN
  2012-01-01 00:00:08  391.5
b 2012-01-01 00:00:10  136.5
  2012-01-01 00:00:12  265.5
  2012-01-01 00:00:14  313.5
  2012-01-01 00:00:16  229.5
  2012-01-01 00:00:18  265.0
print (df.groupby('a').resample('2S').mean().fillna(0))
                           b
a                           
a 2012-01-01 00:00:00  136.0
  2012-01-01 00:00:02  396.0
  2012-01-01 00:00:04    0.0
  2012-01-01 00:00:06    0.0
  2012-01-01 00:00:08  391.5
b 2012-01-01 00:00:10  136.5
  2012-01-01 00:00:12  265.5
  2012-01-01 00:00:14  313.5
  2012-01-01 00:00:16  229.5
  2012-01-01 00:00:18  265.0

print (df.groupby('a').resample('2S').replace(np.nan,0))
                           b
a                           
a 2012-01-01 00:00:00  136.0
  2012-01-01 00:00:02  396.0
  2012-01-01 00:00:04    0.0
  2012-01-01 00:00:06    0.0
  2012-01-01 00:00:08  391.5
b 2012-01-01 00:00:10  136.5
  2012-01-01 00:00:12  265.5
  2012-01-01 00:00:14  313.5
  2012-01-01 00:00:16  229.5
  2012-01-01 00:00:18  265.0

Upsampling:


Here, using asfreq is the same as using replace:

print (df.groupby('a').resample('200L').asfreq().fillna(0))
                           a      b
a                                  
a 2012-01-01 00:00:00.000  a   37.0
  2012-01-01 00:00:00.200  0    0.0
  2012-01-01 00:00:00.400  0    0.0
  2012-01-01 00:00:00.600  0    0.0
  2012-01-01 00:00:00.800  0    0.0
  2012-01-01 00:00:01.000  a  235.0
  2012-01-01 00:00:01.200  0    0.0
  2012-01-01 00:00:01.400  0    0.0
  2012-01-01 00:00:01.600  0    0.0
  2012-01-01 00:00:01.800  0    0.0
  2012-01-01 00:00:02.000  a  396.0
  2012-01-01 00:00:02.200  0    0.0
  2012-01-01 00:00:02.400  0    0.0
  ...

print (df.groupby('a').resample('200L').replace(np.nan,0))
                               b
a                               
a 2012-01-01 00:00:00.000   37.0
  2012-01-01 00:00:00.200    0.0
  2012-01-01 00:00:00.400    0.0
  2012-01-01 00:00:00.600    0.0
  2012-01-01 00:00:00.800    0.0
  2012-01-01 00:00:01.000  235.0
  2012-01-01 00:00:01.200    0.0
  2012-01-01 00:00:01.400    0.0
  2012-01-01 00:00:01.600    0.0
  2012-01-01 00:00:01.800    0.0
  2012-01-01 00:00:02.000  396.0
  2012-01-01 00:00:02.200    0.0
  2012-01-01 00:00:02.400    0.0
  ...
print ((df.groupby('a').resample('200L').replace(np.nan,0).b == 
       df.groupby('a').resample('200L').asfreq().fillna(0).b).all())
True

Conclusion:


For downsampling, use an aggregating function such as sum, first, or mean; for upsampling, use asfreq.
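This conclusion can be condensed into one small sketch (toy series with an invented regular DatetimeIndex):

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2020-01-01", periods=6, freq="1min")
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0, 6.0], index=rng)

# Downsampling: pick an aggregation first, then fill the empty bins
down = s.resample("2min").mean().fillna(0)  # bin 00:02 is all-NaN -> 0.0

# Upsampling: asfreq materializes the new timestamps as NaN, then fill
# (note fillna(0) also overwrites NaN already present in the data)
up = s.resample("30s").asfreq().fillna(0)
```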

Answered by Ryszard Cetnarski

The issue here is that you are trying to call the fillna method on the DatetimeIndexResampler object returned by the resample method. If you call an aggregation function before fillna, it will work, for example: df.resample('1H').sum().fillna(0)
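A small runnable sketch of that point, on invented toy data (2-hourly values resampled to hourly; mean is used here so the empty bins actually produce NaN for fillna to fill):

```python
import pandas as pd

rng = pd.date_range("2020-01-01", periods=4, freq="2h")
df = pd.DataFrame({"val": [1.0, 2.0, 3.0, 4.0]}, index=rng)

# df.resample("1h") alone returns a Resampler object with no fillna();
# aggregating first turns it back into a DataFrame, where fillna works
hourly = df.resample("1h").mean().fillna(0)
```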

Answered by Nickil Maveli

The only workaround close to using fillna directly would be to call it after performing .head(len(df.index)).

I presume DF.head is useful here mainly because when the resample function is applied to a groupby object, it acts as a filter on the input, returning a reduced shape of the original due to the elimination of groups.

Calling DF.head() is not affected by this transformation and returns the entire DF.

Demo:


np.random.seed(42)

df = pd.DataFrame(np.random.randn(10, 2),
              index=pd.date_range('1/1/2016', freq='10D', periods=10),
              columns=['A', 'B']).reset_index()

df
       index         A         B
0 2016-01-01  0.496714 -0.138264
1 2016-01-11  0.647689  1.523030
2 2016-01-21 -0.234153 -0.234137
3 2016-01-31  1.579213  0.767435
4 2016-02-10 -0.469474  0.542560
5 2016-02-20 -0.463418 -0.465730
6 2016-03-01  0.241962 -1.913280
7 2016-03-11 -1.724918 -0.562288
8 2016-03-21 -1.012831  0.314247
9 2016-03-31 -0.908024 -1.412304

Operations:


resampled_group = df[['index', 'A']].groupby(['index'])['A'].agg('count').resample('2D')
resampled_group.head(len(resampled_group.index)).fillna(0).head(20)

index
2016-01-01    1.0
2016-01-03    0.0
2016-01-05    0.0
2016-01-07    0.0
2016-01-09    0.0
2016-01-11    1.0
2016-01-13    0.0
2016-01-15    0.0
2016-01-17    0.0
2016-01-19    0.0
2016-01-21    1.0
2016-01-23    0.0
2016-01-25    0.0
2016-01-27    0.0
2016-01-29    0.0
2016-01-31    1.0
2016-02-02    0.0
2016-02-04    0.0
2016-02-06    0.0
2016-02-08    0.0
Freq: 2D, Name: A, dtype: float64
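A closing note not found in the original answers: newer pandas versions also accept a fill_value argument on Resampler.asfreq, which removes the need for a separate fillna step when upsampling:

```python
import pandas as pd

s = pd.Series([1.0, 2.0],
              index=pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:03"]))

# fill_value applies only to the bins introduced by upsampling,
# not to NaN values that already existed in the data
out = s.resample("1min").asfreq(fill_value=0)
```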