Python: union of more than 2 pandas dataframes

Question

Asked by User1090

I am trying to convert a SQL query to Python. The SQL statement is as follows:

select * from table1
union
select * from table2
union
select * from table3
union
select * from table4

Now I have those tables in 4 dataframes, df1, df2, df3 and df4, and I would like to union the 4 pandas dataframes so that the result matches that of the SQL query. I am confused about which operation is equivalent to SQL union. Thanks in advance!

Note: The column names are the same for all the dataframes.

Answer 1

Answered by Grégtheitroade G.

If I understand the issue correctly, you are looking for the concat function.

pandas.concat([df1, df2, df3, df4]) should work correctly if the column names are the same for all the dataframes.
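One detail worth adding: SQL union removes duplicate rows, while concat on its own keeps every row (closer to union all). A minimal sketch of both, assuming small made-up frames with identical columns (the id and name columns and their values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical sample frames with identical columns (not from the question)
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["b", "c"]})
df3 = pd.DataFrame({"id": [3, 4], "name": ["c", "d"]})
df4 = pd.DataFrame({"id": [1, 4], "name": ["a", "d"]})

# Like UNION ALL: stack all rows and keep duplicates
union_all = pd.concat([df1, df2, df3, df4], ignore_index=True)

# Like UNION: stack all rows, then drop rows that are duplicated in every column
union = union_all.drop_duplicates().reset_index(drop=True)

print(union_all)
print(union)
```

Note that drop_duplicates compares whole rows by default, which is what union does; the row order of the result is not guaranteed to match what a database returns.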

Answer 2

Answered by jezrael

IIUC you can use merge and join on the column matching_col of all the dataframes:

import pandas as pd

# Merge multiple dataframes
df1 = pd.DataFrame({"matching_col": pd.Series({1: 4, 2: 5, 3: 7}), 
                    "a": pd.Series({1: 52, 2: 42, 3:7})}, columns=['matching_col','a'])
print(df1)
   matching_col   a
1             4  52
2             5  42
3             7   7

df2 = pd.DataFrame({"matching_col": pd.Series({1: 2, 2: 7, 3: 8}), 
                    "a": pd.Series({1: 62, 2: 28, 3:9})}, columns=['matching_col','a'])
print(df2)
   matching_col   a
1             2  62
2             7  28
3             8   9

df3 = pd.DataFrame({"matching_col": pd.Series({1: 1, 2: 0, 3: 7}), 
                    "a": pd.Series({1: 28, 2: 52, 3:3})}, columns=['matching_col','a'])
print(df3)
   matching_col   a
1             1  28
2             0  52
3             7   3

df4 = pd.DataFrame({"matching_col": pd.Series({1: 4, 2: 9, 3: 7}), 
                    "a": pd.Series({1: 27, 2: 24, 3:7})}, columns=['matching_col','a'])
print(df4)
   matching_col   a
1             4  27
2             9  24
3             7   7

Solution 1:

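# rename the value column in each frame first, so the merged columns stay unique
# (recent pandas versions reject the duplicate a_x/a_y names that repeated merges would create)
df = pd.merge(pd.merge(pd.merge(df1.rename(columns={'a': 'a1'}),
                                df2.rename(columns={'a': 'a2'}), on='matching_col'),
                       df3.rename(columns={'a': 'a3'}), on='matching_col'),
              df4.rename(columns={'a': 'a4'}), on='matching_col')
print(df)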

   matching_col  a1  a2  a3  a4
0             7   7  28   3   7

Solution 2:

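from functools import reduce

# rename the value column in each frame so the merged columns stay unique
dfs = [d.rename(columns={'a': 'a%d' % i}) for i, d in enumerate([df1, df2, df3, df4], start=1)]
# fold the list into one frame with successive inner merges on matching_col
df = reduce(lambda left, right: pd.merge(left, right, on='matching_col'), dfs)
print(df)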

   matching_col  a1  a2  a3  a4
0             7   7  28   3   7

But if you only need to concatenate the dataframes, use concat and reset the index with the parameter ignore_index=True:

print(pd.concat([df1, df2, df3, df4], ignore_index=True))

    matching_col   a
0              4  52
1              5  42
2              7   7
3              2  62
4              7  28
5              8   9
6              1  28
7              0  52
8              7   3
9              4  27
10             9  24
11             7   7

Answer 3

Answered by majr

This should be a comment on Jezrael's answer (+1'd for merge over concat), but I don't have sufficient reputation.

The OP asked how to union the dfs, but merge returns the intersection by default: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge.html#pandas.merge

To get unions, add how='outer' to the merge calls.
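As a rough illustration of the difference, here is a sketch with two frames whose key values are borrowed from df1 and df2 above; the names left, right, a1 and a2 are assumptions made for this example. Keep in mind that an outer merge is a union of the key values, with the other columns placed side by side and gaps filled with NaN, which is not the same thing as the row-wise SQL union in the question.

```python
import pandas as pd

# Hypothetical frames sharing a key column; value columns are named distinctly
left = pd.DataFrame({"matching_col": [4, 5, 7], "a1": [52, 42, 7]})
right = pd.DataFrame({"matching_col": [2, 7, 8], "a2": [62, 28, 9]})

# Default how='inner': only keys present in both frames survive
inner = pd.merge(left, right, on="matching_col")

# how='outer': keys present in either frame survive, missing values become NaN
outer = pd.merge(left, right, on="matching_col", how="outer")

print(inner)
print(outer)
```

Here the inner merge keeps only the key 7, while the outer merge keeps all five distinct keys.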
