Pandas - resample and standard deviation
Source: http://stackoverflow.com/questions/21480041/
License: this page is based on a popular Stack Overflow question and is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page) and keep the same license.
Asked by grasshopper
I have this dataframe:
                      startTime     endTime  emails_received
index
2014-01-24 14:00:00  1390568400  1390569600              684
2014-01-24 14:00:00  1390568400  1390569300              700
2014-01-24 14:05:00  1390568700  1390569300              438
2014-01-24 14:05:00  1390568700  1390569900              586
2014-01-24 16:00:00  1390575600  1390576500              752
2014-01-24 16:00:00  1390575600  1390576500              743
2014-01-24 16:00:00  1390575600  1390576500              672
2014-01-24 16:00:00  1390575600  1390576200              712
2014-01-24 16:00:00  1390575600  1390576800              708
I run resample("10min",how="median").dropna() and I get:
                      startTime     endTime  emails_received
start
2014-01-24 14:00:00  1390568550  1390569450              635
2014-01-24 16:00:00  1390575600  1390576500              712
This is correct. Is there an easy way to also get the standard deviation (about the mean) for each bin with pandas?
Answered by Nipun Batra
You just need to call .std() on your DataFrame. Here is an illustrative example.
Creating a DatetimeIndex (pandas is imported as pd and NumPy as np)
In [38]: index = pd.date_range(start='2000-1-1', freq='1min', periods=1000)
Creating a DataFrame with 2 columns
In [45]: df = pd.DataFrame({'a':range(1000), 'b':range(1000,3000,2)}, index=index)
Head, standard deviation and mean of the DataFrame
In [47]: df.head()
Out[47]: 
                     a     b
2000-01-01 00:00:00  0  1000
2000-01-01 00:01:00  1  1002
2000-01-01 00:02:00  2  1004
2000-01-01 00:03:00  3  1006
2000-01-01 00:04:00  4  1008
In [48]: df.std()
Out[48]: 
a    288.819436
b    577.638872
dtype: float64
In [49]: df.mean()
Out[49]: 
a     499.5
b    1999.0
dtype: float64
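As a sanity check on these numbers: for the integers 0, 1, ..., n-1 the sample variance (divisor n-1) works out to n(n+1)/12. Column a has n = 1000, and column b is a scaled by 2 and shifted by 1000, so its spread should be exactly double. The following is a small sketch of that arithmetic using only the standard library; the variable names are illustrative and not part of the original answer.

```python
import math

n = 1000

# Sample standard deviation (divisor n-1) of 0, 1, ..., n-1.
std_a = math.sqrt(n * (n + 1) / 12)

# b = 1000 + 2*a: the shift does not affect the spread, the factor 2 doubles it.
std_b = 2 * std_a

print(round(std_a, 6), round(std_b, 6))
```

Both values should agree with the df.std() output above to the printed precision.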
Downsample to 10-minute medians and compute the same statistics
In [54]: df = df.resample(rule="10min").median()
In [55]: df
Out[55]: 
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(1), int64(1)
In [56]: df.head()
Out[56]: 
                        a     b
2000-01-01 00:00:00   4.5  1009
2000-01-01 00:10:00  14.5  1029
2000-01-01 00:20:00  24.5  1049
2000-01-01 00:30:00  34.5  1069
2000-01-01 00:40:00  44.5  1089
In [57]: df.std()
Out[57]: 
a    290.11492
b    580.22984
dtype: float64
In [58]: df.mean()
Out[58]: 
a     499.5
b    1999.0
dtype: float64
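Downsampling by std() (df here is the original 1-minute DataFrame, not the medians from the previous step; each bin then holds 10 values)
In [62]: df2 = df.resample(rule="10min").std()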
In [63]: df2
Out[63]: 
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(2)
In [64]: df2.head()
Out[64]: 
                           a         b
2000-01-01 00:00:00  3.02765  6.055301
2000-01-01 00:10:00  3.02765  6.055301
2000-01-01 00:20:00  3.02765  6.055301
2000-01-01 00:30:00  3.02765  6.055301
2000-01-01 00:40:00  3.02765  6.055301
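Applied to the data in the question, the median and the standard deviation can be computed together. This is a minimal sketch, assuming a recent pandas release in which resample returns a resampler object and the how= argument is no longer accepted. The frame is rebuilt by hand from the rows shown in the question, and only the emails_received column is kept for brevity.

```python
import pandas as pd

# Rows copied from the question; only emails_received is kept.
index = pd.to_datetime(
    ["2014-01-24 14:00:00"] * 2
    + ["2014-01-24 14:05:00"] * 2
    + ["2014-01-24 16:00:00"] * 5
)
df = pd.DataFrame(
    {"emails_received": [684, 700, 438, 586, 752, 743, 672, 712, 708]},
    index=index,
)

# Median and sample standard deviation (ddof=1) per 10-minute bin.
# Empty bins produce NaN in both columns and are dropped.
stats = (
    df["emails_received"]
    .resample("10min")
    .agg(["median", "std"])
    .dropna()
)
print(stats)
```

The median column should match the emails_received values in the question's resampled output (635 and 712). The std column is the per-bin spread the asker was looking for.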
The following is the information from the docstring for the .std() method.
Return standard deviation over requested axis.
NA/null values are excluded
Parameters
----------
axis : {0, 1}
    0 for row-wise, 1 for column-wise
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
level : int, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a DataFrame
Returns
-------
std : Series (or DataFrame if level specified)
        Normalized by N-1 (unbiased estimator).
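The "Normalized by N-1" note matters when comparing against NumPy, whose np.std divides by N by default. The sketch below uses ten consecutive integers, like one 10-minute bin of column a above. The ddof parameter, available in both libraries, selects the divisor N - ddof.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(10))

sample_std = s.std()                   # pandas default: ddof=1, divides by N-1
population_std = s.std(ddof=0)         # divides by N
numpy_default = np.std(s.to_numpy())   # NumPy default: ddof=0, divides by N

print(round(sample_std, 5))            # 3.02765, the per-bin value shown above
print(np.isclose(population_std, numpy_default))  # True
```

Passing ddof=1 to np.std, or ddof=0 to the pandas method, makes the two libraries agree.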

