Original source: http://stackoverflow.com/questions/23162472/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original source, and attribute it to the original authors (not me): Stack Overflow
Python Pandas daily average
Asked by mercergeoinfo
I'm having problems getting the daily average in a Pandas DataFrame. I've checked "Calculating daily average from irregular time series using pandas" and it doesn't help. The CSV files look like this:
Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666
and so on. My code looks like this:
import os
import pandas as pd

# Import iButton temperatures
flistloc = '../data/iButtons/Readings/edit'
flist = os.listdir(flistloc)
# Create empty dictionary to store db for each file
pdib = {}
for file in flist:
    file = os.path.join(flistloc, file)
    # Calls function to return only name (namer is the asker's own helper, not shown)
    fname, _, _, _ = namer(file)
    # Read each file to db
    pdib[fname] = pd.read_csv(file, parse_dates=[0], dayfirst=True, index_col=0)
pdibkeys = sorted(pdib.keys())
#
# Calculate daily average for each iButton
for name in pdibkeys:
    pdib[name]['daily'] = pdib[name].resample('D').mean()
The DataFrame seems OK, but the averaging doesn't work. Here is what one looks like in IPython:
'2B5DE4': <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1601 entries, 2013-08-12 12:00:01 to 2013-09-14 20:00:01
Data columns (total 2 columns):
Value 1601 non-null values
daily 0 non-null values
dtypes: float64(2)}
Anyone know what's going on?
Answered by Sebastian
The question is somewhat old, but I want to contribute anyway since I had to deal with this over and over again (and I think it's not really Pythonic...).
The best solution I have come up with so far is to use the original index to create a new dataframe that is mostly NA, and fill it in at the end.
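davg = df.resample('D').mean()
davg_NA = davg.reindex(davg.index.union(df.index))  # mostly NA: only the midnight rows hold values
davg_daily = davg_NA.ffill().reindex(df.index)  # fill forward, then keep the original timestamps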
One can even cram this into one line:
df.resample('D').mean().reindex(df.index, method='ffill')
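As a rough illustration of this answer's idea, here is a minimal sketch on made-up two-day sample data in the question's CSV format. It assumes a recent pandas where `resample('D').mean()` is the supported spelling, and it uses `reindex(..., method='ffill')` to pick, for each original timestamp, the daily row labelled with the most recent midnight. The sample values and the `daily` column name are assumptions for illustration only.

```python
from io import StringIO

import pandas as pd

# Made-up readings in the question's format, spanning two days.
csv = StringIO("""Date/Time,Value
12/08/13 12:00:01,5.0
12/08/13 12:30:01,3.0
13/08/13 09:00:01,2.0
13/08/13 09:30:01,4.0
""")
df = pd.read_csv(csv, index_col=0)
# Parse the day-first timestamps explicitly.
df.index = pd.to_datetime(df.index, format="%d/%m/%y %H:%M:%S")

# One row per calendar day, labelled at midnight.
davg = df.resample("D").mean()

# For each original timestamp, take the daily row whose label is the
# most recent midnight at or before that timestamp.
davg_daily = davg.reindex(df.index, method="ffill")

# Both objects now share the same index, so the assignment aligns.
df["daily"] = davg_daily["Value"]
print(df)
```

Both indexes need to be sorted for the forward fill to behave as intended.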
Answered by exp1orer
When you call resample on your 1 column dataframe, the output is going to be a 1 column dataframe with a different index, with each date as its own index entry. So when you try to assign it to a column in your original dataframe, I don't know what you expect to happen.
Three possible approaches (where df is your original dataframe):
1. Do you actually need the average values in your original dataframe? If not:
davg = df.resample('D').mean()
2. If you do, a different solution is to merge the two dataframes on the date, after making sure that both have a column (not the index) with the date.
davg = df.resample('D').mean().rename(columns={'Value': 'daily'})
df['day'] = df.index.normalize()  # midnight of each row's day
davg = davg.reset_index()  # 'Date/Time' becomes a regular column
df = pd.merge(df, davg, left_on='day', right_on='Date/Time')
3. An alternative to 2 (no intuition about whether it's faster) is to simply groupby the date.
def compute_avg_val(group):
    group = group.copy()  # avoid modifying the group in place
    group['daily average'] = group['Value'].mean()
    return group

df['day'] = df.index.normalize()
grouped = df.groupby('day', group_keys=False)
df = grouped.apply(compute_avg_val)
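The groupby idea can also be sketched with `transform`, which broadcasts each group's mean back onto the rows of that group without a helper function. This is a swapped-in variant, not the answer author's code; the sample data below is made up, and the `daily average` column name simply follows the answer.

```python
import pandas as pd

# Made-up sample data with a DatetimeIndex, two readings per day.
index = pd.to_datetime([
    "2013-08-12 12:00:01",
    "2013-08-12 12:30:01",
    "2013-08-13 09:00:01",
    "2013-08-13 09:30:01",
])
df = pd.DataFrame({"Value": [5.0, 3.0, 2.0, 4.0]}, index=index)

# Truncate every timestamp to midnight so rows of the same day share a key.
days = df.index.normalize()

# transform("mean") returns a Series with the same index as df,
# holding the mean of each row's group.
df["daily average"] = df.groupby(days)["Value"].transform("mean")
print(df)
```

Because the result keeps the original index, no merge or reindexing step is needed afterwards.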
Answered by Phillip Cloud
You can't resample at a lower frequency and then assign the resampled DataFrame or Series back into the one you resampled from, because the indices don't match:
In [49]: df = pd.read_csv(StringIO("""Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666"""), parse_dates=[0], dayfirst=True, index_col=0)
In [50]: df.resample('D').mean()
Out[50]:
            Value
Date/Time
2013-08-12  3.022
[1 rows x 1 columns]
In [51]: df['daily'] = df.resample('D').mean()
In [52]: df
Out[52]:
                     Value  daily
Date/Time
2013-08-12 12:00:01  5.553    NaN
2013-08-12 12:30:01  2.604    NaN
2013-08-12 13:00:01  2.604    NaN
2013-08-12 13:30:01  2.604    NaN
2013-08-12 14:00:01  2.101    NaN
2013-08-12 14:30:01  2.666    NaN
[6 rows x 2 columns]
One option is to take advantage of partial time indexing on the rows:
davg = df.resample('D').mean()
df.loc[str(davg.index.date[0]), 'daily'] = davg['Value'].iloc[0]
which looks like this, when you expand the str(davg.index.date[0]) line:
df.loc['2013-08-12', 'daily'] = davg['Value'].iloc[0]
This is a bit of a hack; there might be a better way to do it.
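The partial time indexing trick covers a single day. A hedged sketch of how it might extend to several days is to loop over the daily averages and write each one into the rows of its day. The sample data is made up (with one day deliberately missing), and skipping empty days with `dropna()` is my assumption about how gaps should be handled.

```python
import pandas as pd

# Made-up readings on 12 and 14 August; 13 August has no rows.
index = pd.to_datetime([
    "2013-08-12 12:00:01",
    "2013-08-12 12:30:01",
    "2013-08-14 09:00:01",
    "2013-08-14 09:30:01",
])
df = pd.DataFrame({"Value": [5.0, 3.0, 2.0, 4.0]}, index=index)

davg = df.resample("D").mean()

# Create the target column first so every assignment hits an existing column.
df["daily"] = float("nan")

# Days without readings have a NaN mean; skip them, since no rows match.
for day, avg in davg["Value"].dropna().items():
    # str(day.date()) is e.g. '2013-08-12', which selects all rows of that day.
    df.loc[str(day.date()), "daily"] = avg

print(df)
```

An explicit loop is easy to follow but may be slow for long series; a vectorized lookup or a groupby is likely a better fit there.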

