Union in more than 2 pandas dataframes
Source: http://stackoverflow.com/questions/34673581/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow.
Asked by User1090
I am trying to convert a SQL query to Python. The SQL statement is as follows:
select * from table 1
union
select * from table 2
union
select * from table 3
union
select * from table 4
I now have those tables in 4 dataframes, df1, df2, df3 and df4, and I would like to union the 4 pandas dataframes so that the result matches that of the SQL query.
I am unsure which operation is equivalent to SQL UNION.
Thanks in advance!
Note: the column names are the same for all the dataframes.
Answered by Grégtheitroade G.
If I understand the issue correctly, you are looking for the concat function.
pandas.concat([df1, df2, df3, df4])
should work correctly if the column names are the same for all the dataframes.
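One point worth making explicit: concat on its own keeps duplicate rows, so it behaves like SQL UNION ALL, whereas SQL UNION also removes duplicates. A minimal sketch of the difference follows, assuming small made-up frames (the id and name columns and their values are illustrative only and do not come from the question):

```python
import pandas as pd

# Hypothetical sample frames that share the same column names.
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["b", "c"]})
df3 = pd.DataFrame({"id": [3, 4], "name": ["c", "d"]})
df4 = pd.DataFrame({"id": [1, 4], "name": ["a", "d"]})

# UNION ALL: stack all rows, duplicates are kept.
union_all = pd.concat([df1, df2, df3, df4], ignore_index=True)

# UNION: additionally drop rows that are identical in every column.
union = union_all.drop_duplicates().reset_index(drop=True)

print(len(union_all))  # 8
print(len(union))      # 4
```

Whether the drop_duplicates step is needed depends on whether the original query relies on UNION's de-duplication.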
Answered by jezrael
IIUC, you can use merge and join on the column matching_col of all dataframes:
import pandas as pd
# Build four sample dataframes to merge
df1 = pd.DataFrame({"matching_col": pd.Series({1: 4, 2: 5, 3: 7}),
                    "a": pd.Series({1: 52, 2: 42, 3: 7})}, columns=['matching_col', 'a'])
print(df1)
   matching_col   a
1             4  52
2             5  42
3             7   7
df2 = pd.DataFrame({"matching_col": pd.Series({1: 2, 2: 7, 3: 8}),
                    "a": pd.Series({1: 62, 2: 28, 3: 9})}, columns=['matching_col', 'a'])
print(df2)
   matching_col   a
1             2  62
2             7  28
3             8   9
df3 = pd.DataFrame({"matching_col": pd.Series({1: 1, 2: 0, 3: 7}),
                    "a": pd.Series({1: 28, 2: 52, 3: 3})}, columns=['matching_col', 'a'])
print(df3)
   matching_col   a
1             1  28
2             0  52
3             7   3
df4 = pd.DataFrame({"matching_col": pd.Series({1: 4, 2: 9, 3: 7}),
                    "a": pd.Series({1: 27, 2: 24, 3: 7})}, columns=['matching_col', 'a'])
print(df4)
   matching_col   a
1             4  27
2             9  24
3             7   7
Solution 1:
df = pd.merge(pd.merge(pd.merge(df1, df2, on='matching_col'), df3, on='matching_col'), df4, on='matching_col')
# set column names
df.columns = ['matching_col', 'a1', 'a2', 'a3', 'a4']
print(df)
   matching_col  a1  a2  a3  a4
0             7   7  28   3   7
Solution 2:
from functools import reduce
dfs = [df1, df2, df3, df4]
# use functools.reduce (reduce is no longer a built-in in Python 3)
df = reduce(lambda left, right: pd.merge(left, right, on='matching_col'), dfs)
# set column names
df.columns = ['matching_col', 'a1', 'a2', 'a3', 'a4']
print(df)
   matching_col  a1  a2  a3  a4
0             7   7  28   3   7
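A caveat with the chained merges: every frame carries a column named a, so pandas has to disambiguate with the _x and _y suffixes, and after the third merge those suffixed names collide. Recent pandas releases may reject this with an error instead of silently producing duplicate column names. A sketch that avoids the collision by renaming a in each frame before merging is shown here; the renamed list and the a1..a4 naming scheme are assumptions made for illustration, chosen to match the column names used in this answer:

```python
from functools import reduce

import pandas as pd

# Same sample data as in the answer, written with plain lists.
df1 = pd.DataFrame({"matching_col": [4, 5, 7], "a": [52, 42, 7]})
df2 = pd.DataFrame({"matching_col": [2, 7, 8], "a": [62, 28, 9]})
df3 = pd.DataFrame({"matching_col": [1, 0, 7], "a": [28, 52, 3]})
df4 = pd.DataFrame({"matching_col": [4, 9, 7], "a": [27, 24, 7]})
dfs = [df1, df2, df3, df4]

# Give each frame's value column a unique name (a1, a2, ...) up front,
# so merge never has to invent overlapping suffixes.
renamed = [df.rename(columns={"a": f"a{i}"}) for i, df in enumerate(dfs, start=1)]

# Inner merge (the default): only keys present in every frame survive.
merged = reduce(lambda left, right: pd.merge(left, right, on="matching_col"), renamed)
print(merged)
```

With this data only matching_col 7 appears in all four frames, so the result has a single row.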
But if you only need to concatenate the dataframes, use concat and reset the index with the parameter ignore_index=True:
print(pd.concat([df1, df2, df3, df4], ignore_index=True))
    matching_col   a
0              4  52
1              5  42
2              7   7
3              2  62
4              7  28
5              8   9
6              1  28
7              0  52
8              7   3
9              4  27
10             9  24
11             7   7
Answered by majr
This should be a comment on Jezrael's answer (+1'd for merge over concat), but I don't have sufficient reputation.
The OP asked how to union the dfs, but merge returns the intersection by default:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge.html#pandas.merge
To get unions, add how='outer' to the merge calls.
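A rough sketch of what how='outer' does with the sample data from the previous answer, assuming the value column of each frame is first renamed to a1..a4 (a hypothetical naming choice, made so the merges do not produce clashing suffixes). Note that this is a union of the key values in matching_col, with missing entries filled by NaN; it is not the same as SQL UNION, which stacks rows:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"matching_col": [4, 5, 7], "a": [52, 42, 7]})
df2 = pd.DataFrame({"matching_col": [2, 7, 8], "a": [62, 28, 9]})
df3 = pd.DataFrame({"matching_col": [1, 0, 7], "a": [28, 52, 3]})
df4 = pd.DataFrame({"matching_col": [4, 9, 7], "a": [27, 24, 7]})

# Rename the value column per frame so that column names stay unique.
renamed = [
    df.rename(columns={"a": f"a{i}"})
    for i, df in enumerate([df1, df2, df3, df4], start=1)
]

# Outer merge: keep every key that appears in at least one frame.
outer = reduce(
    lambda left, right: pd.merge(left, right, on="matching_col", how="outer"),
    renamed,
)

print(sorted(outer["matching_col"].tolist()))  # [0, 1, 2, 4, 5, 7, 8, 9]
print(outer)
```

Keys that are absent from a given frame get NaN in that frame's column, which also turns those columns into floats.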