pandas: how to fill with 0 after calling resample?
Original question: http://stackoverflow.com/questions/39452095/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to fillna() with value 0 after calling resample?
Asked by displayname
Either I don't understand the documentation or it is outdated.
If I run
user[["DOC_ACC_DT", "USER_SIGNON_ID"]].groupby("DOC_ACC_DT").agg(["count"]).resample("1D").fillna(value=0, method="ffill")
I get
TypeError: fillna() got an unexpected keyword argument 'value'
If I just run
.fillna(0)
I get
ValueError: Invalid fill method. Expecting pad (ffill), backfill (bfill) or nearest. Got 0
If I then set
.fillna(0, method="ffill")
I get
TypeError: fillna() got multiple values for keyword argument 'method'
so the only thing that works is
.fillna("ffill")
but of course that only does a forward fill. However, I want to replace NaN with zeros. What am I doing wrong here?
Accepted answer by displayname
Well, I don't get why the code above is not working, and I'm going to wait for somebody to give a better answer than this, but I just found that
.replace(np.nan, 0)
does what I would have expected from .fillna(0).
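For the pipeline in the question, a minimal sketch of one way to get zeros is shown here. It assumes that the grouped counts are first materialised on the daily grid with asfreq(), so that fillna(0) is called on an ordinary Series instead of on the resampler. The `user` frame is a hypothetical stand-in with made-up values, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `user` frame: one row per sign-on.
user = pd.DataFrame({
    "DOC_ACC_DT": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-04"]),
    "USER_SIGNON_ID": ["u1", "u2", "u3"],
})

# Count sign-ons per date; the result is a Series indexed by DOC_ACC_DT.
counts = user.groupby("DOC_ACC_DT")["USER_SIGNON_ID"].count()

# resample() only defines the daily bins. asfreq() materialises them,
# leaving NaN on days without data, and fillna(0) then replaces the NaN.
daily = counts.resample("1D").asfreq().fillna(0)
print(daily)
```

Because the values are counts, aggregating with sum() over the daily bins would be another option under the same assumptions.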
Answer by jezrael
I ran some tests, and the results are very interesting.
Sample:
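import pandas as pd
import numpy as np

np.random.seed(1)
rng = pd.date_range('1/1/2012', periods=20, freq='S')
# cast b to float so the column can hold NaN
df = pd.DataFrame({'a': ['a'] * 10 + ['b'] * 10,
                   'b': np.random.randint(0, 500, len(rng)).astype(float)},
                  index=rng)
# blank out rows 3..7 of column b (avoids chained assignment)
df.iloc[3:8, df.columns.get_loc('b')] = np.nan
print(df)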
a b
2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:01 a 235.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:03 a NaN
2012-01-01 00:00:04 a NaN
2012-01-01 00:00:05 a NaN
2012-01-01 00:00:06 a NaN
2012-01-01 00:00:07 a NaN
2012-01-01 00:00:08 a 335.0
2012-01-01 00:00:09 a 448.0
2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:11 b 129.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:13 b 71.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:15 b 390.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:17 b 178.0
2012-01-01 00:00:18 b 276.0
2012-01-01 00:00:19 b 254.0
Downsampling:
Possible solution with Resampler.asfreq:
If you use asfreq, the behaviour is the same as aggregating by first:
print (df.groupby('a').resample('2S').first())
a b
a
a 2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:04 a NaN
2012-01-01 00:00:06 a NaN
2012-01-01 00:00:08 a 335.0
b 2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:18 b 276.0
print (df.groupby('a').resample('2S').first().fillna(0))
a b
a
a 2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:04 a 0.0
2012-01-01 00:00:06 a 0.0
2012-01-01 00:00:08 a 335.0
b 2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:18 b 276.0
print (df.groupby('a').resample('2S').asfreq().fillna(0))
a b
a
a 2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:04 a 0.0
2012-01-01 00:00:06 a 0.0
2012-01-01 00:00:08 a 335.0
b 2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:18 b 276.0
If you use replace, the other values are aggregated as mean:
print (df.groupby('a').resample('2S').mean())
b
a
a 2012-01-01 00:00:00 136.0
2012-01-01 00:00:02 396.0
2012-01-01 00:00:04 NaN
2012-01-01 00:00:06 NaN
2012-01-01 00:00:08 391.5
b 2012-01-01 00:00:10 136.5
2012-01-01 00:00:12 265.5
2012-01-01 00:00:14 313.5
2012-01-01 00:00:16 229.5
2012-01-01 00:00:18 265.0
print (df.groupby('a').resample('2S').mean().fillna(0))
b
a
a 2012-01-01 00:00:00 136.0
2012-01-01 00:00:02 396.0
2012-01-01 00:00:04 0.0
2012-01-01 00:00:06 0.0
2012-01-01 00:00:08 391.5
b 2012-01-01 00:00:10 136.5
2012-01-01 00:00:12 265.5
2012-01-01 00:00:14 313.5
2012-01-01 00:00:16 229.5
2012-01-01 00:00:18 265.0
print (df.groupby('a').resample('2S').replace(np.nan,0))
b
a
a 2012-01-01 00:00:00 136.0
2012-01-01 00:00:02 396.0
2012-01-01 00:00:04 0.0
2012-01-01 00:00:06 0.0
2012-01-01 00:00:08 391.5
b 2012-01-01 00:00:10 136.5
2012-01-01 00:00:12 265.5
2012-01-01 00:00:14 313.5
2012-01-01 00:00:16 229.5
2012-01-01 00:00:18 265.0
Upsampling:
Use asfreq; it gives the same result as replace:
print (df.groupby('a').resample('200L').asfreq().fillna(0))
a b
a
a 2012-01-01 00:00:00.000 a 37.0
2012-01-01 00:00:00.200 0 0.0
2012-01-01 00:00:00.400 0 0.0
2012-01-01 00:00:00.600 0 0.0
2012-01-01 00:00:00.800 0 0.0
2012-01-01 00:00:01.000 a 235.0
2012-01-01 00:00:01.200 0 0.0
2012-01-01 00:00:01.400 0 0.0
2012-01-01 00:00:01.600 0 0.0
2012-01-01 00:00:01.800 0 0.0
2012-01-01 00:00:02.000 a 396.0
2012-01-01 00:00:02.200 0 0.0
2012-01-01 00:00:02.400 0 0.0
...
print (df.groupby('a').resample('200L').replace(np.nan,0))
b
a
a 2012-01-01 00:00:00.000 37.0
2012-01-01 00:00:00.200 0.0
2012-01-01 00:00:00.400 0.0
2012-01-01 00:00:00.600 0.0
2012-01-01 00:00:00.800 0.0
2012-01-01 00:00:01.000 235.0
2012-01-01 00:00:01.200 0.0
2012-01-01 00:00:01.400 0.0
2012-01-01 00:00:01.600 0.0
2012-01-01 00:00:01.800 0.0
2012-01-01 00:00:02.000 396.0
2012-01-01 00:00:02.200 0.0
2012-01-01 00:00:02.400 0.0
...
print ((df.groupby('a').resample('200L').replace(np.nan,0).b ==
df.groupby('a').resample('200L').asfreq().fillna(0).b).all())
True
Conclusion:
For downsampling, use an aggregating function such as sum, first or mean, and for upsampling use asfreq.
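The conclusion can be sketched on a small Series. This is only an illustration with made-up values. It uses the lowercase frequency aliases "s" and "ms", which recent pandas versions prefer over "S" and "L".

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=6, freq="s")
s = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0, 6.0], index=rng)

# Downsampling to 2-second bins: aggregate first, then fill the empty bins.
down = s.resample("2s").mean().fillna(0)

# Upsampling to 500 ms steps: asfreq() inserts NaN rows, then fill them.
up = s.resample("500ms").asfreq().fillna(0)

print(down)
print(up)
```

Note that fillna(0) cannot tell rows created by upsampling apart from NaN values that were already in the data, so both end up as 0.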
Answer by Ryszard Cetnarski
The issue here is that you are trying to call the fillna method on a DatetimeIndexResampler object, which is what the resample method returns. If you call an aggregation function before fillna, it will work, for example: df.resample('1H').sum().fillna(0)
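A small sketch of the distinction described above, on made-up data: resample() returns a resampler object, and only the aggregated result is a DataFrame whose fillna accepts a scalar. mean() is used here instead of sum() because, in recent pandas versions, sum() over an empty bin already yields 0.

```python
import pandas as pd

idx = pd.to_datetime(["2016-01-01 00:10", "2016-01-01 03:20"])
df = pd.DataFrame({"x": [1, 2]}, index=idx)

resampler = df.resample("1h")
print(type(resampler).__name__)  # a resampler object, not a DataFrame

hourly = resampler.mean()   # hours without data become NaN
filled = hourly.fillna(0)   # ordinary DataFrame.fillna with a scalar
print(filled)
```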
Answer by Nickil Maveli
The only workaround close to using fillna directly would be to call it after performing .head(len(df.index)).
I'm presuming DF.head to be useful in this case mainly because, when the resample function is applied to a groupby object, it acts as a filter on the input, returning a reduced shape of the original due to the elimination of groups.
Calling DF.head() is not affected by this transformation and returns the entire DF.
Demo:
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(10, 2),
                  index=pd.date_range('1/1/2016', freq='10D', periods=10),
                  columns=['A', 'B']).reset_index()
df
index A B
0 2016-01-01 0.496714 -0.138264
1 2016-01-11 0.647689 1.523030
2 2016-01-21 -0.234153 -0.234137
3 2016-01-31 1.579213 0.767435
4 2016-02-10 -0.469474 0.542560
5 2016-02-20 -0.463418 -0.465730
6 2016-03-01 0.241962 -1.913280
7 2016-03-11 -1.724918 -0.562288
8 2016-03-21 -1.012831 0.314247
9 2016-03-31 -0.908024 -1.412304
Operations:
resampled_group = df[['index', 'A']].groupby(['index'])['A'].agg('count').resample('2D')
resampled_group.head(len(resampled_group.index)).fillna(0).head(20)
index
2016-01-01 1.0
2016-01-03 0.0
2016-01-05 0.0
2016-01-07 0.0
2016-01-09 0.0
2016-01-11 1.0
2016-01-13 0.0
2016-01-15 0.0
2016-01-17 0.0
2016-01-19 0.0
2016-01-21 1.0
2016-01-23 0.0
2016-01-25 0.0
2016-01-27 0.0
2016-01-29 0.0
2016-01-31 1.0
2016-02-02 0.0
2016-02-04 0.0
2016-02-06 0.0
2016-02-08 0.0
Freq: 2D, Name: A, dtype: float64
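As a hedged alternative to the head() workaround, the same zero-filled counts can be sketched with asfreq(), which does not rely on calling DataFrame methods on the resampler. This is a sketch on the demo data above, not the answerer's method.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(10, 2),
                  index=pd.date_range('1/1/2016', freq='10D', periods=10),
                  columns=['A', 'B']).reset_index()

# One count per original date (every 10 days).
counts = df.groupby('index')['A'].count()

# Materialise the 2-day grid, then replace the NaN gaps with 0.
filled = counts.resample('2D').asfreq().fillna(0)
print(filled.head(20))
```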