Monthly Averages Using Daily Data Using Python Pandas

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29762546/

Time: 2020-09-13 23:14:11  Source: igfitidea

Tags: python, pandas, time-series

Question by DJV

I'm a Python user but a rookie in terms of using pandas. I'm hoping to use it more as I'm getting into working with a lot of time series and I've heard they're a whole lot easier to modify with pandas. I've read through some of the tutorials but they have yet to make sense. Hoping you can help me out with an example.

I have a text file with four columns: year, month, day and snow depth. This is daily data for 1979-2009. I would like to use pandas techniques to calculate an individual average for every month in that period (i.e. isolate all the values for Jan-1979, Feb-1979, ... Dec-2009 and average each one), which comes to 372 monthly averages (31 years × 12 months, since both 1979 and 2009 are included). Can anyone help me out with some example code?

Thanks.

1979    1   1   3
1979    1   2   3
1979    1   3   3
1979    1   4   3
1979    1   5   3
1979    1   6   3
1979    1   7   4
1979    1   8   5
1979    1   9   7
1979    1   10  8
1979    1   11  16
1979    1   12  16
1979    1   13  16
1979    1   14  18
1979    1   15  18
1979    1   16  18
1979    1   17  18
1979    1   18  20
1979    1   19  20
1979    1   20  20
1979    1   21  20
1979    1   22  20
1979    1   23  18
1979    1   24  18
1979    1   25  18
1979    1   26  18
1979    1   27  18
1979    1   28  18
1979    1   29  18
1979    1   30  18
1979    1   31  19
1979    2   1   19
1979    2   2   19
1979    2   3   19
1979    2   4   19
1979    2   5   19
1979    2   6   22
1979    2   7   24
1979    2   8   27
1979    2   9   29
1979    2   10  32
1979    2   11  32
1979    2   12  32
1979    2   13  32
1979    2   14  33
1979    2   15  33
1979    2   16  33
1979    2   17  34
1979    2   18  36
1979    2   19  36
1979    2   20  36
1979    2   21  36
1979    2   22  36
1979    2   23  36
1979    2   24  31
1979    2   25  29
1979    2   26  27
1979    2   27  27
1979    2   28  27

Accepted answer by Zachary Cross

You'll want to group your data by year and month, and then calculate the mean of each group. For example:

import pandas as pd

# Read the file into a pandas.DataFrame,
# using "one or more whitespace characters" as the separator
df = pd.read_csv("snow.txt", sep=r"\s+", names=["year", "month", "day", "snow_depth"])

# Show the first 5 rows of the DataFrame
print(df.head())

# Group the data first by year, then by month
g = df.groupby(["year", "month"])

# For each group, calculate the average of the snow_depth column only
monthly_averages = g.aggregate({"snow_depth": "mean"})
print(monthly_averages)
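
To try this approach without a snow.txt file, here is a minimal sketch that builds the DataFrame directly from the January and February 1979 rows shown in the question and then groups it the same way. The list names jan_1979 and feb_1979 are only illustrative; the snow depths themselves are copied from the question.

```python
import pandas as pd

# Daily snow depths for 1979-01-01..31 and 1979-02-01..28, copied from the question
jan_1979 = [3, 3, 3, 3, 3, 3, 4, 5, 7, 8, 16, 16, 16, 18, 18, 18, 18,
            20, 20, 20, 20, 20, 18, 18, 18, 18, 18, 18, 18, 18, 19]
feb_1979 = [19, 19, 19, 19, 19, 22, 24, 27, 29, 32, 32, 32, 32, 33, 33, 33,
            34, 36, 36, 36, 36, 36, 36, 31, 29, 27, 27, 27]

# One (year, month, day, snow_depth) row per day, like the lines of snow.txt
rows = [(1979, 1, day, depth) for day, depth in enumerate(jan_1979, start=1)]
rows += [(1979, 2, day, depth) for day, depth in enumerate(feb_1979, start=1)]
df = pd.DataFrame(rows, columns=["year", "month", "day", "snow_depth"])

# Same grouping as above: one mean per (year, month) pair
monthly_averages = df.groupby(["year", "month"]).aggregate({"snow_depth": "mean"})

# January 1979: 425 / 31 ≈ 13.71, February 1979: 815 / 28 ≈ 29.11
print(monthly_averages)
```

The result is indexed by a (year, month) MultiIndex, so a single month can be looked up with monthly_averages.loc[(1979, 2), "snow_depth"].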

For more about the split-apply-combine approach in pandas, see the "Group by: split-apply-combine" section of the pandas user guide.
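
The three steps of split-apply-combine can be made explicit with a plain loop over the groups. This minimal sketch uses a few hypothetical rows (not the question's data) and compares the loop with the one-line groupby call; it is meant to show what groupby does conceptually, not how pandas implements it internally.

```python
import pandas as pd

# A few hypothetical daily rows spanning two months
df = pd.DataFrame({
    "year": [1979, 1979, 1979, 1979],
    "month": [1, 1, 2, 2],
    "day": [1, 2, 1, 2],
    "snow_depth": [3, 5, 19, 21],
})

# Split: iterate over one sub-DataFrame per (year, month) key
# Apply: compute the mean snow depth of each piece
# Combine: collect the results into a Series keyed by (year, month)
manual = {}
for (year, month), group in df.groupby(["year", "month"]):
    manual[(year, month)] = group["snow_depth"].mean()
manual = pd.Series(manual, name="snow_depth")

# The same three steps in a single groupby call
builtin = df.groupby(["year", "month"])["snow_depth"].mean()

print(manual)
print(builtin)
```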

A DataFrame is a:

"Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."
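
To make that definition concrete, here is a minimal sketch with made-up values (the "station" column and the "deep" flag are hypothetical): the columns keep different dtypes (potentially heterogeneous), rows and columns carry labels (labeled axes), and a column can be added after construction (size-mutable).

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the definition
df = pd.DataFrame(
    {"station": ["A", "B"], "snow_depth": [3, 19]},
    index=["1979-01-01", "1979-02-01"],  # row labels
)

# Potentially heterogeneous: each column keeps its own dtype
print(df.dtypes)

# Labeled axes: select a value by row label and column label
print(df.loc["1979-02-01", "snow_depth"])

# Size-mutable: a new boolean column can be added after construction
df["deep"] = df["snow_depth"] > 10
print(df.shape)
```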

For your purposes, the difference between a NumPy ndarray and a DataFrame is not very significant, but DataFrames have many methods that will make your life easier, so I'd suggest reading up on them.
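
One example of those conveniences, offered as an alternative technique rather than as part of the answer above: if the year, month and day columns are combined into a DatetimeIndex with pd.to_datetime, the monthly means can be computed with resample. A minimal sketch, assuming the same column names as the read_csv call above and a few hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with the same columns as the answer's DataFrame
df = pd.DataFrame({
    "year": [1979, 1979, 1979, 1979],
    "month": [1, 1, 2, 2],
    "day": [30, 31, 1, 2],
    "snow_depth": [18, 19, 19, 21],
})

# pd.to_datetime can assemble dates from columns named year, month and day
df.index = pd.to_datetime(df[["year", "month", "day"]])

# "MS" is the month-start frequency: one bin per calendar month,
# labelled with the first day of that month
monthly_averages = df["snow_depth"].resample("MS").mean()
print(monthly_averages)
```

One difference worth knowing: resample returns a row for every month in the covered range (NaN if a month has no data), whereas groupby only returns the months that actually appear in the data.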
