Pandas df.resample with column-specific aggregation function
Original URL: http://stackoverflow.com/questions/44289526/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by knub
With pandas.DataFrame.resample I can downsample a DataFrame:
df.resample("3s", how="mean")
This resamples a data frame with a datetime-like index so that all rows falling within each 3-second bin are aggregated into one row. Each column's values are averaged.
Question: I have a data frame with multiple columns. Is it possible to specify a different aggregation function for different columns? For example, I want to "sum" column x, "mean" column y and pick the "last" value for column z. How can I achieve that effect?
I know I could create a new empty data frame and then call resample three times, but I would prefer a faster in-place solution.
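For comparison, here is a minimal sketch of the multi-call approach described above. It assumes hypothetical toy columns x, y and z with one sample per second. Rather than filling an empty frame, it resamples each column separately and joins the three resulting Series with pd.concat:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: six one-second samples in columns x, y, z.
idx = pd.date_range("2017-01-01", periods=6, freq="s")
df = pd.DataFrame(
    {"x": np.arange(6.0), "y": np.arange(6.0) * 10, "z": np.arange(6.0) * 100},
    index=idx,
)

# One resample call per column, each with its own aggregation.
x_sum = df["x"].resample("3s").sum()
y_mean = df["y"].resample("3s").mean()
z_last = df["z"].resample("3s").last()

# Stitch the three Series back together side by side (columns x, y, z).
combined = pd.concat([x_sum, y_mean, z_last], axis=1)
print(combined)
```

This produces the desired result, but it needs three separate passes over the data.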
Answer by Scott Boston
You can use .agg after resample. By passing a dictionary, you can aggregate different columns with different functions.
Try this:
df.resample("3s").agg({'x':'sum','y':'mean','z':'last'})
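Below is a self-contained sketch of that one-liner. The six-row frame with columns x, y and z is hypothetical toy data. A single resample call produces all three aggregations, and the dictionary keys decide which function each column gets:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: six one-second samples in columns x, y, z.
idx = pd.date_range("2017-01-01", periods=6, freq="s")
df = pd.DataFrame(
    {"x": np.arange(6.0), "y": np.arange(6.0) * 10, "z": np.arange(6.0) * 100},
    index=idx,
)

# One resample; the dict maps each column name to its aggregation.
out = df.resample("3s").agg({"x": "sum", "y": "mean", "z": "last"})
print(out)
# x: 3.0, 12.0   y: 10.0, 40.0   z: 200.0, 500.0
```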
Also, the how parameter is deprecated:
C:\Program Files\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).mean()
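The warning points to the method form. Under that syntax, resample returns a Resampler object, and you call the aggregation on it as a method. Later pandas releases removed the how keyword from DataFrame.resample entirely, so only the method form runs on current versions. Here is a minimal sketch with a hypothetical single-column frame:

```python
import pandas as pd

# Hypothetical toy data: six one-second samples.
idx = pd.date_range("2017-01-01", periods=6, freq="s")
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

# Old form (no longer accepted): df.resample("3s", how="mean")
# New form: call the aggregation as a method on the Resampler.
means = df.resample("3s").mean()
print(means)  # bins labelled 00:00:00 and 00:00:03; x means 2.0 and 5.0
```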
Answer by piRSquared
Consider the dataframe df
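import numpy as np
import pandas as pd

np.random.seed([3,1415])
# 's' = seconds; recent pandas deprecates the uppercase 'S' alias
tidx = pd.date_range('2017-01-01', periods=18, freq='s')
df = pd.DataFrame(np.random.rand(len(tidx), 3), tidx, list('XYZ'))
print(df)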
X Y Z
2017-01-01 00:00:00 0.444939 0.407554 0.460148
2017-01-01 00:00:01 0.465239 0.462691 0.016545
2017-01-01 00:00:02 0.850445 0.817744 0.777962
2017-01-01 00:00:03 0.757983 0.934829 0.831104
2017-01-01 00:00:04 0.879891 0.926879 0.721535
2017-01-01 00:00:05 0.117642 0.145906 0.199844
2017-01-01 00:00:06 0.437564 0.100702 0.278735
2017-01-01 00:00:07 0.609862 0.085823 0.836997
2017-01-01 00:00:08 0.739635 0.866059 0.691271
2017-01-01 00:00:09 0.377185 0.225146 0.435280
2017-01-01 00:00:10 0.700900 0.700946 0.796487
2017-01-01 00:00:11 0.018688 0.700566 0.900749
2017-01-01 00:00:12 0.764869 0.253200 0.548054
2017-01-01 00:00:13 0.778883 0.651676 0.136097
2017-01-01 00:00:14 0.544838 0.035073 0.275079
2017-01-01 00:00:15 0.706685 0.713614 0.776050
2017-01-01 00:00:16 0.542329 0.836541 0.538186
2017-01-01 00:00:17 0.185523 0.652151 0.746060
Use agg
df.resample('3s').agg(dict(X='sum', Y='mean', Z='last'))
X Y Z
2017-01-01 00:00:00 1.760624 0.562663 0.777962
2017-01-01 00:00:03 1.755516 0.669204 0.199844
2017-01-01 00:00:06 1.787061 0.350861 0.691271
2017-01-01 00:00:09 1.096773 0.542220 0.900749
2017-01-01 00:00:12 2.088590 0.313316 0.275079
2017-01-01 00:00:15 1.434538 0.734102 0.746060
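The dictionary can also map a column to a list of functions, which applies several aggregations to that column in the same pass. The sketch below rebuilds this answer's df. The choice of 'sum' and 'max' for X is only an illustration. When any value in the dict is a list, the result gets two-level (MultiIndex) columns whose labels are (column, function) pairs such as ('X', 'sum'):

```python
import numpy as np
import pandas as pd

# Same frame as in this answer: 18 one-second rows, columns X, Y, Z.
np.random.seed([3, 1415])
tidx = pd.date_range("2017-01-01", periods=18, freq="s")
df = pd.DataFrame(np.random.rand(len(tidx), 3), tidx, list("XYZ"))

# A list applies several functions to one column; the others get one each.
out = df.resample("3s").agg({"X": ["sum", "max"], "Y": "mean", "Z": "last"})
print(out)
```

You can flatten the two-level column labels afterwards if a single level is more convenient downstream.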