pandas cross join no columns in common

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/35265613/

Time: 2020-09-14 00:39:31  Source: igfitidea

pandas cross join no columns in common

python pandas

Asked by ostrokach

How would you perform a cross join of two DataFrames with no columns in common using pandas?

In MySQL, you can simply do:

SELECT *
FROM table_1
[CROSS] JOIN table_2;

But in pandas, doing:

df_1.merge(df_2, how='outer')

gives an error:

MergeError: No common columns to perform merge on
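
For reference, the failure can be reproduced with two small frames. This is a minimal sketch, assuming a pandas version that exposes `pd.errors.MergeError`; the contents of `df_1` and `df_2` are made-up stand-ins for the frames in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames: no column names in common.
df_1 = pd.DataFrame({'a': [1, 2]})
df_2 = pd.DataFrame({'b': ['x', 'y', 'z']})

try:
    df_1.merge(df_2, how='outer')
    raised = False
except pd.errors.MergeError as exc:
    # With no shared columns and no on/left_on/right_on/index arguments,
    # merge has nothing to use as a join key.
    raised = True
    print(exc)
```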


The best solution I have so far is using sqlite:

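import pandas as pd
import sqlalchemy as sa

# Write both frames to a scratch SQLite database, then let SQL do the join.
# index=False keeps the DataFrame index out of the tables, so SELECT * does
# not return two clashing "index" columns.
engine = sa.create_engine('sqlite:///tmp.db')
df_1.to_sql('df_1', engine, index=False)
df_2.to_sql('df_2', engine, index=False)
df = pd.read_sql_query('SELECT * FROM df_1 JOIN df_2', engine)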
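
The same idea can probably be tried without leaving a `tmp.db` file on disk. The sketch below assumes the standard-library `sqlite3` module with an in-memory database, which `to_sql` and `read_sql_query` accept as a connection; the frame contents are invented for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical example frames with no columns in common.
df_1 = pd.DataFrame({'a': [1, 2]})
df_2 = pd.DataFrame({'b': ['x', 'y', 'z']})

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(':memory:')
try:
    # index=False keeps the DataFrame index out of the SQL tables.
    df_1.to_sql('df_1', conn, index=False)
    df_2.to_sql('df_2', conn, index=False)
    df = pd.read_sql_query('SELECT * FROM df_1 CROSS JOIN df_2', conn)
finally:
    conn.close()
```

Every row of `df_1` is paired with every row of `df_2`, so the result has 2 × 3 = 6 rows.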

Answered by jezrael

IIUC you need merge with a temporary column tmp added to both DataFrames:

import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'],
                'fld2': ['a', 'b1']})


df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'],
                'fld4': ['a', 'b1', 'c2']})

print(df1)
  fld1 fld2
0    x    a
1    y   b1

print(df2)
  fld3 fld4
0    y    a
1    x   b1
2    y   c2

df1['tmp'] = 1
df2['tmp'] = 1

df = pd.merge(df1, df2, on=['tmp'])
df = df.drop('tmp', axis=1)
print(df)
  fld1 fld2 fld3 fld4
0    x    a    y    a
1    x    a    x   b1
2    x    a    y   c2
3    y   b1    y    a
4    y   b1    x   b1
5    y   b1    y   c2
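
As a side note, pandas 1.2 and later accept `how='cross'` in `merge`, which should make the temporary column unnecessary. The sketch below rebuilds the two frames from this answer and compares both routes; `cross_join_tmp` is a hypothetical helper name, not part of pandas, showing the temporary-key idea without modifying the input frames.

```python
import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'], 'fld2': ['a', 'b1']})
df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'], 'fld4': ['a', 'b1', 'c2']})

# pandas >= 1.2: built-in cross join, no key column needed.
crossed = df1.merge(df2, how='cross')


def cross_join_tmp(left, right, key='_tmp'):
    """Hypothetical helper: temporary-key cross join that leaves inputs untouched."""
    # assign() returns new frames, so `left` and `right` never gain the key column.
    merged = left.assign(**{key: 1}).merge(right.assign(**{key: 1}), on=key)
    return merged.drop(columns=key)


same = cross_join_tmp(df1, df2)
print(crossed)
```

The default key name `_tmp` is an assumption; pick any name that does not collide with an existing column.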

Answered by Istvan

Even in MySQL you have to specify which fields you are joining on.

http://dev.mysql.com/doc/refman/5.7/en/join.html

Example:

SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);

Same concept with Pandas:

Parameters:
right : DataFrame
how : {'left', 'right', 'outer', 'inner'}, default 'inner'
    left: use only keys from left frame (SQL: left outer join)
    right: use only keys from right frame (SQL: right outer join)
    outer: use union of keys from both frames (SQL: full outer join)
    inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
    Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
    Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
    Field names to join on in right DataFrame or vector/list of vectors per left_on docs

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
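
To illustrate the `left_on` / `right_on` parameters quoted above, here is a small sketch of a keyed join between columns with different names. The frames are borrowed from the other answer on this page; pairing `fld1` with `fld3` is an assumption made only for the example. Note that this is a keyed inner join, so it keeps only matching rows rather than every combination.

```python
import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'], 'fld2': ['a', 'b1']})
df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'], 'fld4': ['a', 'b1', 'c2']})

# Keyed inner join: keep only the row pairs where fld1 equals fld3.
keyed = pd.merge(df1, df2, left_on='fld1', right_on='fld3')
print(keyed)
```

Here `keyed` has 3 rows, compared with the 6 rows a cross join of the same frames produces.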
