Pandas - Resampling and standard deviation

Question

Asked by grasshopper

I have this dataframe:

                      startTime     endTime  emails_received
index                                                       
2014-01-24 14:00:00  1390568400  1390569600              684
2014-01-24 14:00:00  1390568400  1390569300              700
2014-01-24 14:05:00  1390568700  1390569300              438
2014-01-24 14:05:00  1390568700  1390569900              586
2014-01-24 16:00:00  1390575600  1390576500              752
2014-01-24 16:00:00  1390575600  1390576500              743
2014-01-24 16:00:00  1390575600  1390576500              672
2014-01-24 16:00:00  1390575600  1390576200              712
2014-01-24 16:00:00  1390575600  1390576800              708

I run resample("10min",how="median").dropna() and I get:

                      startTime     endTime  emails_received
start                                                       
2014-01-24 14:00:00  1390568550  1390569450              635
2014-01-24 16:00:00  1390575600  1390576500              712

This is correct. Is there an easy way to also get the standard deviation with pandas?

Answer 1

Answered by Nipun Batra

You just need to call .std() on your DataFrame. Here is an illustrative example.

Creating a DatetimeIndex

In [38]: index = pd.DatetimeIndex(start='2000-1-1',freq='1T', periods=1000)

Creating a DataFrame with 2 columns

In [45]: df = pd.DataFrame({'a':range(1000), 'b':range(1000,3000,2)}, index=index)
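
The session above was written against an old pandas release. As a minimal sketch, assuming a recent pandas version where DatetimeIndex no longer accepts start/periods arguments, the same frame can be built with pd.date_range. The "min" frequency alias is used here in place of "1T":

```python
import pandas as pd

# Assumption: a recent pandas release. pd.date_range builds the same
# one-minute index that DatetimeIndex(start=..., periods=...) built above.
index = pd.date_range(start="2000-01-01", freq="min", periods=1000)

# Same two columns as in the answer: a = 0..999, b = 1000, 1002, ..., 2998.
df = pd.DataFrame({"a": range(1000), "b": range(1000, 3000, 2)}, index=index)

print(df.head())
print(df.std())
```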

Head, standard deviation, and mean of the DataFrame

In [47]: df.head()
Out[47]: 
                     a     b
2000-01-01 00:00:00  0  1000
2000-01-01 00:01:00  1  1002
2000-01-01 00:02:00  2  1004
2000-01-01 00:03:00  3  1006
2000-01-01 00:04:00  4  1008

In [48]: df.std()
Out[48]: 
a    288.819436
b    577.638872
dtype: float64

In [49]: df.mean()
Out[49]: 
a     499.5
b    1999.0
dtype: float64

Downsample and calculate the same statistics

In [54]: df_median = df.resample(rule="10T",how="median")

In [55]: df_median
Out[55]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(1), int64(1)

In [56]: df_median.head()
Out[56]: 
                        a     b
2000-01-01 00:00:00   4.5  1009
2000-01-01 00:10:00  14.5  1029
2000-01-01 00:20:00  24.5  1049
2000-01-01 00:30:00  34.5  1069
2000-01-01 00:40:00  44.5  1089

In [57]: df_median.std()
Out[57]: 
a    290.11492
b    580.22984
dtype: float64

In [58]: df_median.mean()
Out[58]: 
a     499.5
b    1999.0
dtype: float64

Downsampling with `std()`

In [62]: df2 = df.resample(rule="10T", how=np.std)

In [63]: df2
Out[63]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(2)

In [64]: df2.head()
Out[64]: 
                           a         b
2000-01-01 00:00:00  3.02765  6.055301
2000-01-01 00:10:00  3.02765  6.055301
2000-01-01 00:20:00  3.02765  6.055301
2000-01-01 00:30:00  3.02765  6.055301
2000-01-01 00:40:00  3.02765  6.055301
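
Newer pandas releases no longer accept the how= keyword used above. The aggregation is called as a method on the resampler instead. The following is a sketch of that approach applied to the emails_received column from the question. The variable names (idx, emails, stats) are made up for illustration, and the values are copied from the question's table:

```python
import pandas as pd

# Rebuild the emails_received column from the question (values copied from the post).
idx = pd.to_datetime(
    ["2014-01-24 14:00:00"] * 2
    + ["2014-01-24 14:05:00"] * 2
    + ["2014-01-24 16:00:00"] * 5
)
emails = pd.DataFrame(
    {"emails_received": [684, 700, 438, 586, 752, 743, 672, 712, 708]},
    index=idx,
)

# Compute the median and the standard deviation of each 10-minute bin in one pass.
# Empty bins produce NaN in both columns and are dropped, as in the question.
stats = emails["emails_received"].resample("10min").agg(["median", "std"]).dropna()
print(stats)
```

If only the standard deviation is needed, emails.resample("10min").std() gives the same per-bin values for every column.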

The following is from the docstring for the .std() method.

Return standard deviation over requested axis.
NA/null values are excluded

Parameters
----------
axis : {0, 1}
    0 for row-wise, 1 for column-wise
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
level : int, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a DataFrame

Returns
-------
std : Series (or DataFrame if level specified)

        Normalized by N-1 (unbiased estimator).
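
The "Normalized by N-1" note matters when comparing against NumPy, whose std divides by N by default. The sketch below uses ten consecutive integers, the same shape as one 10-minute bin of column a in the example above. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

values = pd.Series(range(10))

sample_std = values.std()                # pandas default ddof=1: divides by N-1
population_std = values.std(ddof=0)      # divides by N
numpy_std = np.std(values.to_numpy())    # NumPy default ddof=0: divides by N

print(sample_std, population_std, numpy_std)
```

The first value matches the 3.02765 shown for column a in the resampled output. The other two are equal to each other and slightly smaller.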
