Merge two data frames based on common column values in Pandas

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/43297589/

Time: 2020-09-08 15:47:44  Source: igfitidea

Tags: pandas, dataframe

Asked by Harry_pb

How can I merge two data frames that share a column, so that the merged data frame contains only the rows whose value in that column appears in both?

I have 5000 rows in df1, in this format:

    director_name   actor_1_name     actor_2_name      actor_3_name      movie_title
0   James Cameron   CCH Pounder      Joel David Moore  Wes Studi         Avatar
1   Gore Verbinski  Johnny Depp      Orlando Bloom     Hyman Davenport   Pirates of the Caribbean: At World's End
2   Sam Mendes      Christoph Waltz  Rory Kinnear      Stephanie Sigman  Spectre

and 10000 rows in df2, in this format:

movieId  genres                                       movie_title
      1  Adventure|Animation|Children|Comedy|Fantasy  Toy Story
      2  Adventure|Children|Fantasy                   Jumanji
      3  Comedy|Romance                               Grumpier Old Men
      4  Comedy|Drama|Romance                         Waiting to Exhale

The common column 'movie_title' has values shared by both data frames. Based on it, I want to keep all rows whose 'movie_title' appears in both, and drop the other rows.

Any help/suggestion would be appreciated.

Note: I already tried

pd.merge(dfinal, df1, on='movie_title')

and the output is just one line, the column headers, with no data rows:

director_name   actor_1_name    actor_2_name    actor_3_name    movie_title movieId title   genres

I also tried how="outer", how="left" and how="right", but after dropping NaN I didn't get any rows, although many common values do exist.
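An inner merge can return no rows even when matching titles appear to exist. One plausible cause is that the key strings differ invisibly, for example through leading or trailing spaces. This is an assumption; the question does not confirm it. The sketch below uses small made-up frames to show the symptom and one fix: stripping whitespace from the key column in both frames with Series.str.strip() before merging.

```python
import pandas as pd

# Made-up frames: df1's titles carry stray spaces (an assumed cause,
# not confirmed by the question), so a plain inner merge finds no matches.
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes"],
    "movie_title": ["Avatar ", " Spectre"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "movie_title": ["Avatar", "Spectre"],
})

raw = pd.merge(df1, df2, on="movie_title", how="inner")
print(len(raw))  # 0: the keys differ only by whitespace

# Remove leading/trailing whitespace from the key column in both frames.
df1["movie_title"] = df1["movie_title"].str.strip()
df2["movie_title"] = df2["movie_title"].str.strip()

clean = pd.merge(df1, df2, on="movie_title", how="inner")
print(clean)
```

If the keys can also differ in letter case, a similar normalization (for example .str.lower()) could be applied in the same way.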

Accepted answer by Harry_pb

We can merge two data frames in several ways. The most common way in Python is the merge operation in pandas.

import pandas as pd

# Keep only rows whose movie_title appears in both df1 and df2
dfinal = df1.merge(df2, on="movie_title", how="inner")
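Here is a self-contained version of the inner merge, using small made-up frames shaped like the question's df1 and df2. The "Avatar" row in df2 and its genres are hypothetical, added so that exactly one title appears in both frames.

```python
import pandas as pd

# Made-up frames shaped like the question's data; only "Avatar" is in both.
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes"],
    "movie_title": ["Avatar", "Spectre"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Action|Sci-Fi",  # hypothetical genres for the made-up Avatar row
    ],
    "movie_title": ["Toy Story", "Jumanji", "Avatar"],
})

# Inner merge: rows whose movie_title is missing from either frame are dropped.
dfinal = df1.merge(df2, on="movie_title", how="inner")
print(dfinal)
```

The result has one row. Its columns are df1's columns followed by df2's non-key columns, and the shared key column appears only once.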

If the key column has a different name in each data frame (for example 'movie_title' in df1 and 'movie_name' in df2), specify the left and right column names separately:

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
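Here is a minimal runnable sketch of the left_on/right_on variant. It assumes df2's key column is named 'movie_name', the hypothetical name used in this answer, and the data is made up. Unlike merging with on=, both key columns are kept in the result, so the sketch drops the redundant one.

```python
import pandas as pd

# Made-up frames: the key is "movie_title" in df1 but "movie_name" in df2.
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes"],
    "movie_title": ["Avatar", "Spectre"],
})
df2 = pd.DataFrame({
    "movieId": [10, 20],
    "movie_name": ["Spectre", "Jumanji"],
})

dfinal = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")

# Both key columns survive a left_on/right_on merge; drop the duplicate.
dfinal = dfinal.drop(columns="movie_name")
print(dfinal)
```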

For more options, see the documentation of the pandas merge operation.

Answer by pansen

You can use pd.merge:

import pandas as pd
pd.merge(df1, df2, on="movie_title")

By default (how="inner"), only rows whose key is found in both data frames are kept. If you want to keep all rows from the left data frame and only add values from df2 where a matching key exists, use how="left".
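Here is a small sketch of the how="left" behaviour on made-up data. It also passes indicator=True, a standard merge parameter that adds a _merge column recording whether each row's key was found in the right frame. Unmatched left rows get NaN in df2's columns. This is why dropping NaN after a left merge leaves only the matched rows, and nothing at all if no key matched.

```python
import pandas as pd

# Made-up frames: only "Toy Story" appears in both.
df1 = pd.DataFrame({"movie_title": ["Avatar", "Spectre", "Toy Story"]})
df2 = pd.DataFrame({"movieId": [1, 2], "movie_title": ["Toy Story", "Jumanji"]})

# how="left" keeps every df1 row; indicator=True adds a "_merge" column
# with the value "left_only" or "both" for each row.
left = pd.merge(df1, df2, on="movie_title", how="left", indicator=True)
print(left)

# The rows that found a match in df2 are those marked "both".
matched = left[left["_merge"] == "both"]
print(matched)
```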

Answer by gaurav singh

If you want a merged data frame containing only the key values common to both data frames, do an inner merge:

import pandas as pd

# on= names the shared key column; how="inner" keeps only keys present in both frames
merged_frame = pd.merge(df1, df2, on="movie_title", how="inner")