Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/36488071/

Time: 2020-09-14 01:00:37  Source: igfitidea

Pandas resample options

Tags: python, pandas

Asked by Krishh

I looked up the documentation for the pandas resample function. While it does describe the parameters available in the function, it doesn't tell me the possible options for these parameters. For example, the how parameter takes a value of 'sum' (as shown in the example), but what other values are possible, and what do they do? Similarly for the fill parameter. Can anybody tell me, or provide a link to, the available values of these parameters?

Answered by ptrj

A good place to start is probably the pandas tutorial on time series functionality. It doesn't, however, cover the topic thoroughly.

You may also look at the Cookbook there, only to find that most of the links point to... Stack Overflow.

I found a table of the method arguments in Python for Data Analysis.



As for the two particular parameters you ask about:

  1. how: can be a string denoting a common function (such as 'sum', 'mean', etc.), a custom function taking arrays, and (what is probably not mentioned there) a dict of functions for specific columns in a DataFrame (e.g. how = {col1: fun1, col2: fun2}).

  2. fill_method: can be ffill (aka pad) or bfill (aka backfill); it fills values forward or backward.
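
Note that how and fill_method were deprecated in pandas 0.18.0 and removed in later releases, so they cannot be passed to resample in current versions. A minimal sketch of the method-based equivalents follows; the data, the column names and the value_range helper are made up for illustration.

```python
import pandas as pd

# Hypothetical data: six readings, one per second.
idx = pd.date_range("2010-01-01 09:00:00", periods=6, freq="s")
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   "col2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]}, index=idx)

# Equivalent of how='sum': a string naming a common function.
by_name = df.resample("2s").agg("sum")

# Equivalent of how=<custom function>: called with the values of each bin.
def value_range(values):
    return values.max() - values.min()

by_func = df.resample("2s").agg(value_range)

# Equivalent of how={col1: fun1, col2: fun2}: one function per column.
by_dict = df.resample("2s").agg({"col1": "mean", "col2": value_range})

# Equivalent of fill_method='ffill' / 'bfill': fill the rows that
# appear when going back from 2-second to 1-second frequency.
coarse = df.resample("2s").sum()
filled_forward = coarse.resample("s").ffill()
filled_backward = coarse.resample("s").bfill()

print(by_dict)
print(filled_forward)
```

Strings and callables can be mixed freely inside the dict, which is handy when only one column needs special treatment.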

Answered by MaxU

I think the best documentation of the new resample API so far is in the What's New section for pandas 0.18.0:

New API:

Now you can write .resample(..) as a 2-stage operation, like .groupby(...), which yields a Resampler.

In [82]: r = df.resample('2s')

In [83]: r
Out[83]: DatetimeIndexResampler [freq=<2 * Seconds>, axis=0, closed=left, label=left, convention=start, base=0]
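
To make the groupby analogy concrete, here is a small sketch (with made-up data) that runs the two stages separately and then repeats them with groupby and a time-based pd.Grouper. On regularly spaced data like this, both routes should give the same bins and values.

```python
import pandas as pd

# Hypothetical data: ten readings, one per second.
idx = pd.date_range("2010-01-01 09:00:00", periods=10, freq="s")
df = pd.DataFrame({"A": range(10), "B": range(10, 20)}, index=idx)

# Stage 1: describe the binning. No aggregation happens yet.
r = df.resample("2s")

# Stage 2: reduce each bin.
resampled = r.mean()

# The same two stages written with groupby and a time-based Grouper.
grouped = df.groupby(pd.Grouper(freq="2s")).mean()

same_index = resampled.index.equals(grouped.index)
same_values = bool((resampled.values == grouped.values).all())
print(same_index, same_values)
```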

Downsampling

You can then use this object to perform operations. These are downsampling operations (going from a higher frequency to a lower one).

In [84]: r.mean()
Out[84]: 
                            A         B         C         D
2010-01-01 09:00:00  0.485748  0.447351  0.357096  0.793615
2010-01-01 09:00:02  0.820801  0.794317  0.364034  0.531096
2010-01-01 09:00:04  0.433985  0.314582  0.424104  0.625733
2010-01-01 09:00:06  0.624988  0.609738  0.633165  0.612452
2010-01-01 09:00:08  0.510470  0.534317  0.573201  0.806949

In [85]: r.sum()
Out[85]: 
                            A         B         C         D
2010-01-01 09:00:00  0.971495  0.894701  0.714192  1.587231
2010-01-01 09:00:02  1.641602  1.588635  0.728068  1.062191
2010-01-01 09:00:04  0.867969  0.629165  0.848208  1.251465
2010-01-01 09:00:06  1.249976  1.219477  1.266330  1.224904
2010-01-01 09:00:08  1.020940  1.068634  1.146402  1.613897
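
The random numbers above make it hard to check the arithmetic by eye. The sketch below uses the values 0 to 9 (an assumption for illustration, not the data from the release notes) so that each 2-second bin holds exactly two known readings. With the defaults shown in the Resampler repr (closed=left, label=left), each bin is labelled by its left edge.

```python
import pandas as pd

# Hypothetical data: the values 0..9, one per second.
idx = pd.date_range("2010-01-01 09:00:00", periods=10, freq="s")
s = pd.Series(range(10), index=idx, dtype="float64")

r = s.resample("2s")
means = r.mean()    # 0.5, 2.5, 4.5, 6.5, 8.5
sums = r.sum()      # 1, 5, 9, 13, 17
counts = r.count()  # 2 readings in every bin

# The first bin covers 09:00:00 and 09:00:01 and is labelled 09:00:00.
print(means.index[0])
print(sums)
```

Since every bin holds two readings, each sum is twice the corresponding mean, which is the same relationship visible between Out[84] and Out[85].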

Furthermore, resample now supports getitem operations to perform the resample on specific columns.

In [86]: r[['A','C']].mean()
Out[86]: 
                            A         C
2010-01-01 09:00:00  0.485748  0.357096
2010-01-01 09:00:02  0.820801  0.364034
2010-01-01 09:00:04  0.433985  0.424104
2010-01-01 09:00:06  0.624988  0.633165
2010-01-01 09:00:08  0.510470  0.573201
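
A short sketch of the column selection, again with made-up data: a single label gives back a Series after aggregation, a list of labels gives a DataFrame, and selecting the columns before resampling produces the same numbers.

```python
import pandas as pd

# Hypothetical data: four readings, one per second.
idx = pd.date_range("2010-01-01 09:00:00", periods=4, freq="s")
df = pd.DataFrame({"A": [1.0, 3.0, 5.0, 7.0],
                   "B": [0.0, 0.0, 0.0, 0.0],
                   "C": [2.0, 4.0, 6.0, 8.0]}, index=idx)
r = df.resample("2s")

one_column = r["A"].mean()          # Series
two_columns = r[["A", "C"]].mean()  # DataFrame with only A and C

# Selecting before resampling gives the same numbers.
alternative = df[["A", "C"]].resample("2s").mean()

print(one_column)
print(two_columns)
```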

Resample also supports .aggregate-type operations:


In [87]: r.agg({'A' : 'mean', 'B' : 'sum'})
Out[87]: 
                            A         B
2010-01-01 09:00:00  0.485748  0.894701
2010-01-01 09:00:02  0.820801  1.588635
2010-01-01 09:00:04  0.433985  0.629165
2010-01-01 09:00:06  0.624988  1.219477
2010-01-01 09:00:08  0.510470  1.068634

These accessors can, of course, be combined:

In [88]: r[['A','B']].agg(['mean','sum'])
Out[88]: 
                            A                   B          
                         mean       sum      mean       sum
2010-01-01 09:00:00  0.485748  0.971495  0.447351  0.894701
2010-01-01 09:00:02  0.820801  1.641602  0.794317  1.588635
2010-01-01 09:00:04  0.433985  0.867969  0.314582  0.629165
2010-01-01 09:00:06  0.624988  1.249976  0.609738  1.219477
2010-01-01 09:00:08  0.510470  1.020940  0.534317  1.068634
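
Passing a list of functions yields two-level columns, as in Out[88]. The sketch below (made-up data; the "A_mean"-style names are just one possible convention) shows how to pick one (column, function) pair and how to flatten the header if single-level names are easier to work with.

```python
import pandas as pd

# Hypothetical data: four readings, one per second.
idx = pd.date_range("2010-01-01 09:00:00", periods=4, freq="s")
df = pd.DataFrame({"A": [1.0, 3.0, 5.0, 7.0],
                   "B": [2.0, 4.0, 6.0, 8.0]}, index=idx)

result = df.resample("2s")[["A", "B"]].agg(["mean", "sum"])

# Columns are (column, function) pairs; select one with a tuple.
a_sum = result[("A", "sum")]

# Optional: flatten to single-level names such as "A_mean".
flat = result.copy()
flat.columns = ["_".join(pair) for pair in flat.columns]

print(a_sum)
print(flat)
```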

Conclusion:

You can check the well-documented .groupby() examples to get an idea of what can be done after resampling (with a resampled DataFrame/Series).