Original source: http://stackoverflow.com/questions/23124220/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Merging Pandas rows based on index (time series)
Asked by Jason
I used Pandas .append() to add columns from a number of Pandas time series by their index (date). However, instead of combining all data from common dates into one row, the data looks like this:
sve2_all.sort_index(inplace=True)
print(sve2_all.loc['20000101':'20000104'])
            Hgtot ng/l     Q l/s  DOC_mg/L  Flow_mm/day  MeHg ng/l  Site  \
2000-01-01         NaN       NaN       NaN         0.18        NaN   NaN
2000-01-01         NaN  0.613234       NaN          NaN        NaN   SVE
2000-01-02         NaN       NaN       NaN         0.18        NaN   NaN
2000-01-02         NaN  0.614410       NaN          NaN        NaN   SVE
2000-01-03         NaN       NaN       NaN          NaN        NaN     2
2000-01-03         NaN  0.617371       NaN          NaN        NaN   SVE
2000-01-03         NaN       NaN       NaN          NaN        NaN   NaN
2000-01-03         NaN       NaN       NaN         0.18        NaN   NaN
2000-01-04         NaN  0.627733       NaN          NaN        NaN   SVE
2000-01-04         NaN       NaN       NaN         0.18        NaN   NaN

            TOC_filt.TOC  TOC_unfilt.TOC   Temp oC   pH
2000-01-01           NaN             NaN       NaN  NaN
2000-01-01           NaN             NaN  -12.6117  NaN
2000-01-02           NaN             NaN       NaN  NaN
2000-01-02           NaN             NaN   -2.3901  NaN
2000-01-03           NaN        8.224648       NaN  NaN
2000-01-03           NaN             NaN   -5.0064  NaN
2000-01-03           NaN             NaN       NaN  NaN
2000-01-03           NaN             NaN       NaN  NaN
2000-01-04           NaN             NaN   -1.5868  NaN
2000-01-04           NaN             NaN       NaN  NaN

[10 rows x 10 columns]
I've tried to resample this data by day using:
sve2_all.resample('D', how='mean')
And also to group by day using:
sve2_all.groupby(sve2_all.index.map(lambda t: t.day))
However, the DataFrame remains unchanged. How can I collapse the rows for the same date into one row? Thanks.
Additional information: I tried using pd.concat() instead of .append(), as suggested by Joris (I had to pass 0 as the axis argument, as 1 resulted in ValueError: cannot reindex from a duplicate axis), but the resulting DataFrame is the same as with .append(): a non-uniform, non-monotonic time series. I think the index is the problem, but I'm not sure what I can do to fix it. I thought that some timestamps might contain hour information while others do not, so I also tried using .resample('D', how='mean') on each DataFrame before using .concat(), but it didn't make a difference.
Solution: Joris' solution was correct; I didn't realise that .resample() does not operate in place. Once the result of .resample() was assigned to a new DataFrame, Joris' suggestion provided the desired result.
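The point about assignment can be sketched as follows. This is a minimal illustration with made-up data (the frame df and its values are hypothetical, not the asker's), and it uses the .resample('D').mean() spelling that newer pandas versions expect in place of how='mean'.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows per date, each filling a different column.
idx = pd.to_datetime(['2000-01-01', '2000-01-01', '2000-01-02', '2000-01-02'])
df = pd.DataFrame({'Q': [np.nan, 0.61, np.nan, 0.62],
                   'Flow': [0.18, np.nan, 0.18, np.nan]}, index=idx)

# resample() returns a new object; calling it without assignment
# leaves df exactly as it was.
df.resample('D').mean()
rows_after_bare_call = len(df)

# Assigning the result keeps the collapsed, one-row-per-day frame.
daily = df.resample('D').mean()
```

Because mean() skips NaN values, each daily row ends up holding the single non-missing value from each column.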
Answered by joris
The append method does just that: it appends the rows to the other dataframe and does not merge with it based on the index labels. For that you can use concat.
Using a toy example:
In [14]: df1 = pd.DataFrame(np.random.randn(3,2), columns=list('AB'), index=pd.date_range('2000-01-01', periods=3))
In [15]: df1
Out[15]:
A B
2000-01-01 1.532085 -1.338895
2000-01-02 -0.016784 -0.270698
2000-01-03 -1.680379 0.838287
In [16]: df2 = pd.DataFrame(np.random.randn(3,2), columns=list('CD'), index=pd.date_range('2000-01-01', periods=3))
In [17]: df2
Out[17]:
C D
2000-01-01 0.375214 -0.812558
2000-01-02 -1.099848 -0.889941
2000-01-03 1.556383 0.870608
.append will append the rows (and columns of df2 that are not in df1 will be added, which is the case here):
In [18]: df1.append(df2)
Out[18]:
A B C D
2000-01-01 1.532085 -1.338895 NaN NaN
2000-01-02 -0.016784 -0.270698 NaN NaN
2000-01-03 -1.680379 0.838287 NaN NaN
2000-01-01 NaN NaN 0.375214 -0.812558
2000-01-02 NaN NaN -1.099848 -0.889941
2000-01-03 NaN NaN 1.556383 0.870608
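Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above fails on current versions. A minimal sketch of the same row-wise stacking with pd.concat, using fixed made-up values in place of the random ones above:

```python
import pandas as pd

dates = pd.date_range('2000-01-01', periods=3)
# Illustrative values only; the frames above use random numbers.
df1 = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]}, index=dates)
df2 = pd.DataFrame({'C': [7.0, 8.0, 9.0], 'D': [10.0, 11.0, 12.0]}, index=dates)

# axis=0 is the default: rows are stacked, as the old df1.append(df2) did,
# so every date appears twice and the missing columns are filled with NaN.
stacked = pd.concat([df1, df2])
```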
pd.concat() concatenates both dataframes along one of the axes; with axis=1 they are aligned on the index labels:
In [19]: pd.concat([df1, df2], axis=1)
Out[19]:
A B C D
2000-01-01 1.532085 -1.338895 0.375214 -0.812558
2000-01-02 -0.016784 -0.270698 -1.099848 -0.889941
2000-01-03 -1.680379 0.838287 1.556383 0.870608
Apart from that, resample should normally work.
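If a frame has already been built with duplicate dates, as in the question, the rows can also be collapsed afterwards. The following is a sketch of one possible approach with hypothetical data, not the method from the answer: normalize() drops any time-of-day component (the asker suspected some timestamps carried hours), and groupby(level=0).first() keeps the first non-null value per column for each date.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same dates, but some timestamps carry a time of day.
idx = pd.to_datetime(['2000-01-01 00:00', '2000-01-01 12:00',
                      '2000-01-02 00:00', '2000-01-02 06:00'])
df = pd.DataFrame({'Q': [np.nan, 0.61, np.nan, 0.62],
                   'Site': [np.nan, 'SVE', np.nan, 'SVE'],
                   'Flow': [0.18, np.nan, 0.18, np.nan]}, index=idx)

# Reset every timestamp to midnight so rows from the same day share a label.
df.index = df.index.normalize()

# Group on the index itself and take the first non-null entry per column.
collapsed = df.groupby(level=0).first()
```

Unlike mean(), first() also works for text columns such as Site. Grouping on t.day, as tried in the question, groups by day of the month, so the same day number from different months would fall into one group; a groupby also needs an aggregation such as first() or mean() before it yields a DataFrame.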

