Python: How to merge two dataframes side by side?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23891575/
How to merge two dataframes side-by-side?
Asked by James Bond
Is there a way to conveniently merge two data frames side by side?
Both data frames have 30 rows but different numbers of columns: say, df1 has 20 columns and df2 has 40 columns.
How can I easily get a new data frame with 30 rows and 60 columns?
df3 = pd.someSpecialMergeFunct(df1, df2)
Or maybe there is some special parameter in append:
df3 = pd.append(df1, df2, left_index=False, right_index=False, how='left')
PS: if possible, I hope duplicated column names can be resolved automatically.
Thanks!
Answered by joris
You can use the concat function for this (axis=1 concatenates as columns):
pd.concat([df1, df2], axis=1)
See the pandas docs on merging/concatenating: http://pandas.pydata.org/pandas-docs/stable/merging.html
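As a minimal sketch of the concat approach, the following uses two small made-up frames (3 rows instead of 30; the column names are assumptions for illustration). It also shows that concat keeps duplicated column names as they are, and that passing keys is one way to tell the two sources apart:

```python
import pandas as pd

# Hypothetical small frames standing in for the 30-row df1 and df2.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"b": [7, 8, 9], "c": [10, 11, 12]})

# axis=1 places df2's columns to the right of df1's, aligning rows on the index.
df3 = pd.concat([df1, df2], axis=1)
print(df3.shape)          # (3, 4)
print(list(df3.columns))  # ['a', 'b', 'b', 'c'] -- the duplicate 'b' is kept

# keys adds an outer column level, so the two 'b' columns stay distinguishable.
labeled = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])
print(list(labeled.columns))
```

With keys, a column is then addressed as labeled[("df2", "b")] rather than by a renamed label.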
Answered by Hyman
I came across your question while I was trying to achieve something like the following:
So once I had sliced my dataframes, I first ensured that their indexes were the same. In your case, both dataframes need to be indexed from 0 to 29. Then I merged both dataframes on the index.
df1.reset_index(drop=True).merge(df2.reset_index(drop=True), left_index=True, right_index=True)
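To illustrate why the index reset matters, here is a small sketch with made-up frames whose indexes do not line up (the data, column names, and suffix strings are assumptions). Without the reset, the rows are aligned on index labels and do not pair up; with it, the merge pairs rows by position, and the suffixes parameter of merge renames the overlapping column:

```python
import pandas as pd

# Hypothetical slices whose indexes no longer match.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 11, 12])
df2 = pd.DataFrame({"b": [7, 8, 9], "c": [10, 11, 12]}, index=[0, 1, 2])

# Aligning on the original labels yields 6 rows padded with NaN.
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned.shape)  # (6, 4)

# Reset both indexes to 0..n-1, then merge on the index.
left = df1.reset_index(drop=True)
right = df2.reset_index(drop=True)
df3 = left.merge(right, left_index=True, right_index=True,
                 suffixes=("_df1", "_df2"))
print(list(df3.columns))  # ['a', 'b_df1', 'b_df2', 'c']
```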
Answered by Rohit Madan
- There is a way: you can do it via a Pipeline.
** Use a pipeline to transform your numerical data, for example:
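from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
# DataFrameSelector is a user-defined transformer, not part of scikit-learn.
num_pipeline = Pipeline([
    ("select_numeric", DataFrameSelector(numeric_columns)),  # placeholder: list of numerical column names
    ("imputer", SimpleImputer(strategy="median")),
])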
** And for categorical data:
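from sklearn.preprocessing import OneHotEncoder
cat_pipeline = Pipeline([
    ("select_cat", DataFrameSelector(categorical_columns)),  # placeholder: list of categorical column names
    # sparse_output was named sparse before scikit-learn 1.2.
    ("cat_encoder", OneHotEncoder(sparse_output=False)),
])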
** Then use a FeatureUnion to combine these transformations:
from sklearn.pipeline import FeatureUnion
preprocess_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])
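The snippets in this answer rely on a DataFrameSelector class that the answer does not define and that does not ship with scikit-learn. The following is a minimal sketch, assuming the class only selects the named columns and returns them as a NumPy array; the sample frame and its column names are made up. Note that the result is a feature matrix placed side by side, not a DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder


class DataFrameSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: pick the given columns and return a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X[self.attribute_names].to_numpy()


# Made-up data: one numerical column with a gap, one categorical column.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "city": ["Oslo", "Rome", "Oslo"],
})

num_pipeline = Pipeline([
    ("select_numeric", DataFrameSelector(["age"])),
    ("imputer", SimpleImputer(strategy="median")),
])
cat_pipeline = Pipeline([
    ("select_cat", DataFrameSelector(["city"])),
    ("cat_encoder", OneHotEncoder()),  # default output is a sparse matrix
])
preprocess_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

# One imputed numerical column plus one column per city category.
features = preprocess_pipeline.fit_transform(df)
print(features.shape)  # (3, 3)
```

Because the encoder here keeps its default sparse output, the combined result is a SciPy sparse matrix; calling features.toarray() gives a dense array.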