Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow post: http://stackoverflow.com/questions/39603399/

Time: 2020-09-14 02:03:20  Source: igfitidea

pandas daily average, pandas.resample

Tags: python, pandas, mean, numeric, resampling

Asked by OGMGIC

I have a CSV file similar to this:

Date,Temp1,Temp2
23-Oct-09 01:00:00,21.1,22.3
23-Oct-09 04:00:00,22.3,23.8
23-Oct-09 07:00:00,21.4,21.3
23-Oct-09 10:00:00,21.5,21.6
23-Oct-09 13:00:00,22.3,23.8
23-Oct-09 16:00:00,21.4,21.3
23-Oct-09 19:00:00,21.1,22.3
23-Oct-09 22:00:00,21.4,21.3
24-Oct-09 01:00:00,22.3,23.8
24-Oct-09 04:00:00,22.3,23.8
24-Oct-09 07:00:00,21.1,22.3
24-Oct-09 10:00:00,22.3,23.8
24-Oct-09 13:00:00,21.1,22.3
24-Oct-09 16:00:00,22.3,23.8
24-Oct-09 19:00:00,21.1,22.3
24-Oct-09 22:00:00,22.3,23.8

I have read the data with:

df=pd.read_csv('data.csv', index_col=0)

and converted the index to datetime:

df.index=pd.to_datetime(df.index)

Now I want to take the daily mean of each temperature column. I have been trying to use DataFrame.resample as below, but keep getting an error. I've read the pandas resample docs and numerous examples on here and am still at a loss...

df_avg = df.resample('D', how = 'mean')

DataError: No numeric types to aggregate

I would like df_avg to be a dataframe with a datetime index and the two temperature columns. I am using pandas 0.17.1 and Python 3.5.2. Any help is greatly appreciated!

Answered by jezrael

You need to convert the string columns to float first:

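import pandas as pd

# parse_dates=[0] converts the first column (used as the index) to datetime while reading
df = pd.read_csv('data.csv', index_col=0, parse_dates=[0])

# cast the string columns to float so they can be aggregated
df['Temp1'] = df.Temp1.astype(float)
df['Temp2'] = df.Temp2.astype(float)

# daily mean of each column
df_avg = df.resample('D').mean()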
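
To try this without a file on disk, here is a minimal self-contained sketch. It assumes a few rows copied from the question and held in memory with io.StringIO, it uses dtype=str only to imitate columns that were read as strings, and it passes an explicit datetime format that I inferred from the sample rows (the format string is my assumption, not something the question states):

```python
import io

import pandas as pd

# A few rows from the question, kept in memory instead of data.csv
csv_text = """Date,Temp1,Temp2
23-Oct-09 01:00:00,21.1,22.3
23-Oct-09 04:00:00,22.3,23.8
24-Oct-09 01:00:00,22.3,23.8
24-Oct-09 04:00:00,22.3,23.8
"""

# dtype=str imitates the situation where the values arrive as strings
df = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype=str)

# Explicit format (assumed from the sample rows): day-month abbreviation-2 digit year
df.index = pd.to_datetime(df.index, format="%d-%b-%y %H:%M:%S")

# Cast every column to float, then average per calendar day
df = df.astype(float)
df_avg = df.resample("D").mean()

print(df_avg)
```

The result should have one row per day, a datetime index, and the two temperature columns.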


If astype returns an error, the problem is that there are some non-numeric values. In that case you need to use to_numeric with errors='coerce'; all 'problematic' values are then converted to NaN:

df['Temp1'] = pd.to_numeric(df.Temp1, errors='coerce')
df['Temp2'] = pd.to_numeric(df.Temp2, errors='coerce')
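
As a small illustration of what errors='coerce' does, here is a sketch on a made-up Series. The '--' entry is a hypothetical bad value that I invented for the example; it does not come from the question's data:

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a number
raw = pd.Series(["21.1", "--", "22.3"], name="Temp1")

# Unparseable entries become NaN instead of raising an error
clean = pd.to_numeric(raw, errors="coerce")

print(clean)
print(clean.mean())  # mean() skips NaN by default
```

Since mean() skips NaN by default, the coerced values simply drop out of the daily average rather than breaking it.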

You can also check all rows with problematic values with boolean indexing:

print(df[pd.to_numeric(df.Temp1, errors='coerce').isnull()])
print(df[pd.to_numeric(df.Temp2, errors='coerce').isnull()])
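
If there are many columns, the same check can probably be done in one pass. The following is only a sketch on a made-up frame (the '--' and 'bad' entries and the three timestamps are hypothetical), applying to_numeric to every column and keeping the rows where any column failed to convert:

```python
import pandas as pd

# Hypothetical frame with one unparseable value in each column
df = pd.DataFrame(
    {"Temp1": ["21.1", "--", "22.3"], "Temp2": ["22.3", "23.8", "bad"]},
    index=pd.to_datetime(
        ["2009-10-23 01:00", "2009-10-23 04:00", "2009-10-23 07:00"]
    ),
)

# Coerce every column; values that cannot be parsed turn into NaN
converted = df.apply(pd.to_numeric, errors="coerce")

# Boolean mask: True for rows where at least one column became NaN
bad_rows = df[converted.isnull().any(axis=1)]

print(bad_rows)
```

Note that rows which were already missing in the input would be flagged as well, so this shows candidates to inspect rather than a definitive list of bad values.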