Merging Pandas rows based on index (time series)

Question

Asked by Jason

I used Pandas .append() to add columns from a number of Pandas time series by their index (date). However, instead of combining all data from common dates into one row, the data looks like this:

sve2_all.sort(inplace=True)
print sve2_all['20000101':'20000104']



             Hgtot ng/l     Q l/s  DOC_mg/L  Flow_mm/day  MeHg ng/l Site  \
2000-01-01          NaN       NaN       NaN         0.18        NaN  NaN   
2000-01-01          NaN  0.613234       NaN          NaN        NaN  SVE   
2000-01-02          NaN       NaN       NaN         0.18        NaN  NaN   
2000-01-02          NaN  0.614410       NaN          NaN        NaN  SVE   
2000-01-03          NaN       NaN       NaN          NaN        NaN    2   
2000-01-03          NaN  0.617371       NaN          NaN        NaN  SVE   
2000-01-03          NaN       NaN       NaN          NaN        NaN  NaN   
2000-01-03          NaN       NaN       NaN         0.18        NaN  NaN   
2000-01-04          NaN  0.627733       NaN          NaN        NaN  SVE   
2000-01-04          NaN       NaN       NaN         0.18        NaN  NaN   

            TOC_filt.TOC  TOC_unfilt.TOC  Temp oC  pH  
2000-01-01           NaN             NaN      NaN NaN  
2000-01-01           NaN             NaN -12.6117 NaN  
2000-01-02           NaN             NaN      NaN NaN  
2000-01-02           NaN             NaN  -2.3901 NaN  
2000-01-03           NaN        8.224648      NaN NaN  
2000-01-03           NaN             NaN  -5.0064 NaN  
2000-01-03           NaN             NaN      NaN NaN  
2000-01-03           NaN             NaN      NaN NaN  
2000-01-04           NaN             NaN  -1.5868 NaN  
2000-01-04           NaN             NaN      NaN NaN  

[10 rows x 10 columns]

I've tried to resample this data by day using:

sve2_all.resample('D', how='mean')

And also to group by day using:

sve2_all.groupby(sve2_all.index.map(lambda t: t.day))

However, the DataFrame remains unchanged. How can I collapse the rows for the same date into one row? Thanks.

Additional information: I tried using pd.concat() instead of .append(), as suggested by Joris. I had to pass 0 as the axis argument, because 1 resulted in ValueError: cannot reindex from a duplicate axis. The resulting DataFrame is the same as with .append(): a non-uniform, non-monotonic time series. I think the index is the problem, but I'm not sure what I can do to fix it. I thought that some timestamps might contain hour information while others do not, so I also tried using .resample('D', how='mean') on each DataFrame before using .concat(), but it didn't make a difference.

Solution: Joris' solution was correct. I didn't realise that .resample() does not operate in place. Once the result of .resample() was assigned to a new DataFrame, Joris' suggestion provided the desired result.

Answer 1

Answered by joris

The append method does 'append' the rows to the other dataframe; it does not merge with it based on the index labels. For that you can use concat.

Using a toy example:

In [14]: df1 = pd.DataFrame(np.random.randn(3,2), columns=list('AB'), index=pd.date_range('2000-01-01', periods=3))
In [15]: df1
Out[15]:
                   A         B
2000-01-01  1.532085 -1.338895
2000-01-02 -0.016784 -0.270698
2000-01-03 -1.680379  0.838287

In [16]: df2 = pd.DataFrame(np.random.randn(3,2), columns=list('CD'), index=pd.date_range('2000-01-01', periods=3))
In [17]: df2
Out[17]:
                   C         D
2000-01-01  0.375214 -0.812558
2000-01-02 -1.099848 -0.889941
2000-01-03  1.556383  0.870608

.append will append the rows (and columns of df2 that are not in df1 will be added, which is the case here):

In [18]: df1.append(df2)
Out[18]:
                   A         B         C         D
2000-01-01  1.532085 -1.338895       NaN       NaN
2000-01-02 -0.016784 -0.270698       NaN       NaN
2000-01-03 -1.680379  0.838287       NaN       NaN
2000-01-01       NaN       NaN  0.375214 -0.812558
2000-01-02       NaN       NaN -1.099848 -0.889941
2000-01-03       NaN       NaN  1.556383  0.870608
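For readers on current pandas: DataFrame.append was removed in pandas 2.0, and row-wise pd.concat() (axis=0, the default) gives the same stacked result. The following is a minimal sketch with made-up fixed values instead of random ones, so the frames and numbers are illustrative assumptions rather than the data above:

```python
import pandas as pd

dates = pd.date_range("2000-01-01", periods=3)

# Two frames sharing the same dates but with different columns (values are made up).
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=dates)
df2 = pd.DataFrame({"C": [7.0, 8.0, 9.0], "D": [10.0, 11.0, 12.0]}, index=dates)

# axis=0 (the default) stacks rows, like the old append: each date now
# appears twice and the columns a frame lacks are filled with NaN.
stacked = pd.concat([df1, df2])
print(stacked)
```

The stacked frame has a duplicated index, which is the same shape of problem as in the question.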

pd.concat() concatenates both dataframes along one of the index axes:

In [19]: pd.concat([df1, df2], axis=1)
Out[19]:
                   A         B         C         D
2000-01-01  1.532085 -1.338895  0.375214 -0.812558
2000-01-02 -0.016784 -0.270698 -1.099848 -0.889941
2000-01-03 -1.680379  0.838287  1.556383  0.870608
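When the frames do not cover exactly the same dates, column-wise concat still aligns on the index. A small sketch, again with made-up values and hypothetical frame names, showing the default outer join next to join="inner":

```python
import pandas as pd

# Made-up frames whose date ranges only partly overlap.
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0]},
                   index=pd.date_range("2000-01-01", periods=3))
df2 = pd.DataFrame({"C": [7.0, 8.0, 9.0]},
                   index=pd.date_range("2000-01-02", periods=3))

# Default join="outer": one row per date found in either frame, NaN where missing.
outer = pd.concat([df1, df2], axis=1)

# join="inner": only the dates present in both frames.
inner = pd.concat([df1, df2], axis=1, join="inner")

print(outer)
print(inner)
```

Note that axis=1 needs each input to have a unique index; duplicated dates in an input are a likely cause of the "cannot reindex from a duplicate axis" error mentioned in the question.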

Apart from that, resample should normally work.
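As a rough illustration of collapsing rows that share a date, here is a sketch in current pandas syntax (the how= argument of resample used in the question has since been replaced by method chaining). The column names and values are made up. The key point is that resample and groupby return new objects, so the result has to be assigned:

```python
import numpy as np
import pandas as pd

# A stacked frame with each date appearing twice (made-up values).
idx = pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02"])
stacked = pd.DataFrame(
    {
        "Q": [np.nan, 0.61, np.nan, 0.62],
        "Flow": [0.18, np.nan, 0.18, np.nan],
    },
    index=idx,
)

# resample does not modify 'stacked'; assign the result.
# mean() skips NaN, so each date collapses to its non-missing value.
daily = stacked.resample("D").mean()

# Alternative: group on the index itself and keep the first non-null
# value per column; this also works when some columns hold strings.
collapsed = stacked.groupby(level=0).first()

print(daily)
print(collapsed)
```

Two caveats about the attempts in the question: groupby on its own only builds a lazy grouping object, so nothing changes until an aggregation such as .first() or .mean() is called, and grouping on t.day uses the day of the month, which would merge the same day number across different months. With text columns such as Site, mean() may need numeric_only=True on recent pandas versions.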
