Creating a pandas DataFrame from columns of other DataFrames with similar indexes
Source: http://stackoverflow.com/questions/21231834/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the source URL, and attribute it to the original authors on Stack Overflow.
Asked by user3153467
I have two DataFrames, df1 and df2, with the same column names ['a','b','c'], both indexed by dates. The two date indexes can share values. I would like to create a DataFrame df3 containing only the data from column 'c' of each, renamed 'df1' and 'df2' respectively, with the correct date index. My problem is that I cannot work out how to merge the indexes properly.
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(5,3), index=pd.date_range('01/02/2014',periods=5,freq='D'), columns=['a','b','c'] )
df2 = pd.DataFrame(np.random.randn(8,3), index=pd.date_range('01/01/2014',periods=8,freq='D'), columns=['a','b','c'] )
df1
                   a         b         c
2014-01-02  0.580550  0.480814  1.135899
2014-01-03 -1.961033  0.546013  1.093204
2014-01-04  2.063441 -0.627297  2.035373
2014-01-05  0.319570  0.058588  0.350060
2014-01-06  1.318068 -0.802209 -0.939962
df2
                   a         b         c
2014-01-01  0.772482  0.899337  0.808630
2014-01-02  0.518431 -1.582113  0.323425
2014-01-03  0.112109  1.056705 -1.355067
2014-01-04  0.767257 -2.311014  0.340701
2014-01-05  0.794281 -1.954858  0.200922
2014-01-06  0.156088  0.718658 -1.030077
2014-01-07  1.621059  0.106656 -0.472080
2014-01-08 -2.061138 -2.023157  0.257151
The df3 DataFrame should have the following form:
df3
                 df1       df2
2014-01-01       NaN  0.808630
2014-01-02  1.135899  0.323425
2014-01-03  1.093204 -1.355067
2014-01-04  2.035373  0.340701
2014-01-05  0.350060  0.200922
2014-01-06 -0.939962 -1.030077
2014-01-07       NaN -0.472080
2014-01-08       NaN  0.257151
That is, the df1 column should contain NaN wherever the date index of df2 extends beyond that of df1. (In this example, I would get NaN for the following dates: 2014-01-01, 2014-01-07 and 2014-01-08.)
Thanks for your help.
Accepted answer by Andy Hayden
You can use concat:
In [11]: pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])
Out[11]:
                 df1       df2
2014-01-01       NaN -0.978535
2014-01-02 -0.106510 -0.519239
2014-01-03 -0.846100 -0.313153
2014-01-04 -0.014253 -1.040702
2014-01-05  0.315156 -0.329967
2014-01-06 -0.510577 -0.940901
2014-01-07       NaN -0.024608
2014-01-08       NaN -1.791899
[8 rows x 2 columns]
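To check the result without depending on random values, here is a minimal sketch of the same call on small deterministic frames. The values (from np.arange) and the shortened date ranges are assumptions chosen for illustration, not the data from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data (not the question's random values): df1 covers Jan 2-3,
# df2 covers Jan 1-4, so df2's index is wider on both sides.
df1 = pd.DataFrame(np.arange(6.0).reshape(2, 3),
                   index=pd.date_range('2014-01-02', periods=2, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                   index=pd.date_range('2014-01-01', periods=4, freq='D'),
                   columns=['a', 'b', 'c'])

# keys= supplies the new column names; rows are aligned on the date index,
# and dates missing from df1 become NaN in the 'df1' column.
df3 = pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])
print(df3)
```

With this data, the 'df1' column should be NaN on 2014-01-01 and 2014-01-04, the two dates that exist only in df2.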
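The axis argument determines the way the DataFrames are stacked: axis=0 stacks them vertically (appending rows), while axis=1 places them side by side (adding columns), aligning the rows on the index: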
df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame(['a', 'b', 'c'])
pd.concat([df1, df2], axis=0)
   0
0  1
1  2
2  3
0  a
1  b
2  c
pd.concat([df1, df2], axis=1)
   0  0
0  1  a
1  2  b
2  3  c
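In the axis=1 case the rows are matched by index label, not by position. The sketch below illustrates this with two hypothetical frames, `left` and `right` (names not from the original answer), whose indexes only partly overlap. It also shows `join='inner'`, the `pd.concat` option that keeps only the labels present in both inputs.

```python
import pandas as pd

# Hypothetical frames with partly overlapping integer indexes
left = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'y': ['a', 'b', 'c']}, index=[1, 2, 3])

# Default join='outer': keep every label, fill the gaps with NaN
outer = pd.concat([left, right], axis=1)

# join='inner': keep only the labels present in both frames
inner = pd.concat([left, right], axis=1, join='inner')

print(outer)
print(inner)
```

For the question as asked, the default outer join is the right choice, since the dates that exist only in df2 must be kept.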
Answer by Woody Pride
Well, I'm not sure that merge would be the way to go. Personally, I would build a new DataFrame by creating an index of the dates and then constructing the columns using list comprehensions. Possibly not the most Pythonic way, but it seems to work for me!
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(5,3), index=pd.date_range('01/02/2014',periods=5,freq='D'), columns=['a','b','c'] )
df2 = pd.DataFrame(np.random.randn(8,3), index=pd.date_range('01/01/2014',periods=8,freq='D'), columns=['a','b','c'] )
# Build a sorted list of every date that appears in either DataFrame
index = sorted(set(df1.index) | set(df2.index))
# For each date, take the 'c' value where that DataFrame has the date, else NaN
df3 = pd.DataFrame({'df1': [df1.loc[date, 'c'] if date in df1.index else np.nan for date in index],
                    'df2': [df2.loc[date, 'c'] if date in df2.index else np.nan for date in index]},
                   index=index)
df3
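The same alignment is also available without the explicit loop, because the `pd.DataFrame` constructor aligns a dict of Series on the union of their indexes. This is a minimal sketch of that alternative, not part of the original answer; the small deterministic frames are an assumption standing in for the random data.

```python
import numpy as np
import pandas as pd

# Illustrative data: df1 covers Jan 2-3, df2 covers Jan 1-4
df1 = pd.DataFrame(np.arange(6.0).reshape(2, 3),
                   index=pd.date_range('2014-01-02', periods=2, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                   index=pd.date_range('2014-01-01', periods=4, freq='D'),
                   columns=['a', 'b', 'c'])

# Passing Series in a dict aligns them on the union of their indexes;
# dates missing from one Series become NaN in that column.
df3 = pd.DataFrame({'df1': df1['c'], 'df2': df2['c']})
print(df3)
```

The dict keys become the column names, so no separate renaming step is needed.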
Answer by Markus Dutschke
What you are asking for is the join operation.
With the how argument, you can define how index values that appear in only one of the DataFrames are handled.
Here is an article that looks helpful on this point.
In the example below, I left out the cosmetics (such as renaming columns) for simplicity.
Code
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(5,3), index=pd.date_range('01/02/2014',periods=5,freq='D'), columns=['a','b','c'] )
df2 = pd.DataFrame(np.random.randn(8,3), index=pd.date_range('01/01/2014',periods=8,freq='D'), columns=['a','b','c'] )
df3 = df1.join(df2, how='outer', lsuffix='_df1', rsuffix='_df2')
print(df3)
Output
               a_df1     b_df1     c_df1     a_df2     b_df2     c_df2
2014-01-01       NaN       NaN       NaN  0.109898  1.107033 -1.045376
2014-01-02  0.573754  0.169476 -0.580504 -0.664921 -0.364891 -1.215334
2014-01-03 -0.766361 -0.739894 -1.096252  0.962381 -0.860382 -0.703269
2014-01-04  0.083959 -0.123795 -1.405974  1.825832 -0.580343  0.923202
2014-01-05  1.019080 -0.086650  0.126950 -0.021402 -1.686640  0.870779
2014-01-06 -1.036227 -1.103963 -0.821523 -0.943848 -0.905348  0.430739
2014-01-07       NaN       NaN       NaN  0.312005  0.586585  1.531492
2014-01-08       NaN       NaN       NaN -0.077951 -1.189960  0.995123
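The cosmetics left out above could be filled in roughly as follows. This is a sketch, not part of the original answer: it selects column 'c' from each frame and renames it before joining, so no suffixes are needed. The deterministic data is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: df1 covers Jan 2-3, df2 covers Jan 1-4
df1 = pd.DataFrame(np.arange(6.0).reshape(2, 3),
                   index=pd.date_range('2014-01-02', periods=2, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                   index=pd.date_range('2014-01-01', periods=4, freq='D'),
                   columns=['a', 'b', 'c'])

# Keep only column 'c' of each frame and give it its target name,
# then outer-join on the index so that no date is dropped.
left = df1[['c']].rename(columns={'c': 'df1'})
right = df2[['c']].rename(columns={'c': 'df2'})
df3 = left.join(right, how='outer')
print(df3)
```

Because the two columns already have distinct names after the rename, lsuffix and rsuffix are not required here.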

