Python Pandas daily average

Question

Asked by mercergeoinfo

I'm having problems getting the daily average in a Pandas DataFrame. I've checked "Calculating daily average from irregular time series using pandas" and it doesn't help. The csv files look like this:

Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666

and so on. My code looks like this:

# Import iButton temperatures
flistloc = '../data/iButtons/Readings/edit'
flist = os.listdir(flistloc)
# Create empty dictionary to store db for each file
pdib = {}
for file in flist:
    file = os.path.join(flistloc,file)
    # Calls function to return only name
    fname,_,_,_= namer(file)
    # Read each file to db
    pdib[fname] = pd.read_csv(file, parse_dates=0, dayfirst=True, index_col=0)
pdibkeys = sorted(pdib.keys())
#
# Calculate daily average for each iButton
for name in pdibkeys:
    pdib[name]['daily'] = pdib[name].resample('D', how = 'mean')

The DataFrame seems OK, but the averaging doesn't work. Here is what one looks like in IPython:

'2B5DE4': <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1601 entries, 2013-08-12 12:00:01 to 2013-09-14 20:00:01
Data columns (total 2 columns):
Value    1601  non-null values
daily    0  non-null values
dtypes: float64(2)}

Anyone know what's going on?

Answer 1

Answered by Sebastian

The question is somewhat old, but I want to contribute anyway, since I have had to deal with this over and over again (and I think it's not really Pythonic...).

The best solution I have come up with so far is to reindex the daily means onto the original index and forward-fill, so that every original timestamp carries the mean of its day.

davg = df.resample('D').mean()
# Daily means are labelled at midnight; forward-fill them onto the original timestamps
davg_daily = davg.reindex(df.index, method='ffill')

One can even cram this into one line:

df.resample('D').mean().reindex(df.index, method='ffill')
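
As a minimal sketch of the reindex-and-forward-fill idea, the snippet below uses a small made-up two-day sample (the values and timestamps are assumptions, not the question's data) and the current `resample(...).mean()` spelling. It relies on daily means being labelled at midnight, so forward-filling hands each reading the mean of its own day.

```python
import pandas as pd

# Hypothetical two-day sample in the same layout as the question's CSV
idx = pd.to_datetime([
    "2013-08-12 12:00:01", "2013-08-12 12:30:01",
    "2013-08-13 09:00:01", "2013-08-13 09:30:01",
])
df = pd.DataFrame({"Value": [5.0, 3.0, 2.0, 4.0]}, index=idx)
df.index.name = "Date/Time"

# One row per day, labelled at midnight of that day
davg = df.resample("D").mean()

# method="ffill" takes, for each original timestamp, the latest daily label at or before it
df["daily"] = davg["Value"].reindex(df.index, method="ffill")
print(df)
```

This only works when the index is sorted, because forward-filling during a reindex requires a monotonic index.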

Answer 2

Answered by exp1orer

When you call resample on your 1-column dataframe, the output is going to be a 1-column dataframe with a different index, with each date as its own index entry. So when you try to assign it to a column in your original dataframe, I don't know what you expect to happen.

Three possible approaches (where df is your original dataframe):

1. Do you actually need the average values in your original dataframe? If not:

davg = df.resample('D').mean()

2. If you do, a different solution is to merge the two dataframes on the date, after making sure that both have a column (not the index) with the date.

davg = df.resample('D').mean().rename(columns={'Value': 'daily'})
davg = davg.reset_index().rename(columns={'Date/Time': 'day'})
df['day'] = df.index.normalize()  # midnight of each row's date
# Merging on columns drops the index, so move it to a column first and restore it afterwards
df = df.reset_index().merge(davg, on='day').set_index('Date/Time')

3. An alternative to 2 (no intuition about whether it's faster) is to simply groupby the date.

df['day'] = df.index.normalize()
# transform('mean') broadcasts each day's mean back onto every row of that day
df['daily average'] = df.groupby('day')['Value'].transform('mean')
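
The groupby approach can also be sketched without a helper date column, assuming the frame has a DatetimeIndex: `pd.Grouper(freq="D")` groups rows by calendar day directly. The sample data below is made up for illustration.

```python
import pandas as pd

# Hypothetical two-day sample, not taken from the question's files
idx = pd.to_datetime([
    "2013-08-12 12:00:01", "2013-08-12 12:30:01",
    "2013-08-13 09:00:01", "2013-08-13 09:30:01",
])
df = pd.DataFrame({"Value": [5.0, 3.0, 2.0, 4.0]}, index=idx)

# Group rows by calendar day of the index and broadcast each day's mean to its rows
daily_groups = df.groupby(pd.Grouper(freq="D"))
df["daily average"] = daily_groups["Value"].transform("mean")
print(df)
```

Because `transform` returns a result aligned with the original index, the assignment does not run into the index mismatch described above.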

Answer 3

Answered by Phillip Cloud

You can't resample at a lower frequency and then assign the resampled DataFrame or Series back into the one you resampled from, because the indices don't match:

In [49]: df = pd.read_csv(StringIO("""Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666"""), parse_dates=[0], dayfirst=True, index_col=0)

In [50]: df.resample('D').mean()
Out[50]:
            Value
Date/Time
2013-08-12  3.022

[1 rows x 1 columns]

In [51]: df['daily'] = df.resample('D').mean()

In [52]: df
Out[52]:
                     Value  daily
Date/Time
2013-08-12 12:00:01  5.553    NaN
2013-08-12 12:30:01  2.604    NaN
2013-08-12 13:00:01  2.604    NaN
2013-08-12 13:30:01  2.604    NaN
2013-08-12 14:00:01  2.101    NaN
2013-08-12 14:30:01  2.666    NaN

[6 rows x 2 columns]

One option is to take advantage of partial string indexing on the rows:

davg = df.resample('D').mean()
# Only the first day is handled here, which is enough for the one-day sample above
df.loc[str(davg.index.date[0]), 'daily'] = davg['Value'].iloc[0]

which looks like this, when you expand the str(davg.index.date[0]) expression:

df.loc['2013-08-12', 'daily'] = davg['Value'].iloc[0]

This is a bit of a hack; there might be a better way to do it.
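
For data spanning several days, the partial-indexing trick above could be repeated once per day. The following is only a sketch under that assumption, using a made-up two-day sample; a date-only string passed to `.loc` selects every row that falls on that day.

```python
import pandas as pd

# Hypothetical two-day sample, not taken from the question's files
idx = pd.to_datetime([
    "2013-08-12 12:00:01", "2013-08-12 12:30:01",
    "2013-08-13 09:00:01", "2013-08-13 09:30:01",
])
df = pd.DataFrame({"Value": [5.0, 3.0, 2.0, 4.0]}, index=idx)

davg = df.resample("D").mean()

# Skip days without readings (their mean is NaN), then fill one day at a time
for day, mean in davg["Value"].dropna().items():
    # The date-only string matches all timestamps within that day
    df.loc[day.strftime("%Y-%m-%d"), "daily"] = mean
print(df)
```

The loop does one assignment per day, so for long series a vectorized approach such as a groupby is likely the better fit.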
