Merge two data frames based on common column values in Pandas
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/43297589/
Asked by Harry_pb
How can I merge two data frames that share a column so that the merged data frame contains only the rows whose value in that column appears in both?
I have 5000 rows in df1, in this format:
   director_name   actor_1_name     actor_2_name      actor_3_name      movie_title
0  James Cameron   CCH Pounder      Joel David Moore  Wes Studi         Avatar
1  Gore Verbinski  Johnny Depp      Orlando Bloom     Hyman Davenport   Pirates of the Caribbean: At World's End
2  Sam Mendes      Christoph Waltz  Rory Kinnear      Stephanie Sigman  Spectre
and 10000 rows in df2, in this format:
movieId  genres                                       movie_title
1        Adventure|Animation|Children|Comedy|Fantasy  Toy Story
2        Adventure|Children|Fantasy                   Jumanji
3        Comedy|Romance                               Grumpier Old Men
4        Comedy|Drama|Romance                         Waiting to Exhale
The two data frames share the column 'movie_title', and some titles appear in both. I want to keep all rows where 'movie_title' matches and delete the rest.
Any help or suggestion would be appreciated.
Note: I already tried
pd.merge(dfinal, df1, on='movie_title')
and the output comes out as a single row:
director_name actor_1_name actor_2_name actor_3_name movie_title movieId title genres
I also tried how="outer", how="left", and how="right". In every case no rows were left after dropping NaN, although many common column values do exist.
Accepted answer by Harry_pb
Two data frames can be merged in several ways. The most common way in Python is the merge operation in Pandas.
import pandas as pd
dfinal = df1.merge(df2, on="movie_title", how="inner")
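As a minimal sketch of the inner merge above, assuming two tiny made-up frames in place of the real df1 and df2 (the rows below are illustrative, not taken from the question's data):

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2; the rows are invented for illustration.
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes", "John Lasseter"],
    "movie_title": ["Avatar", "Spectre", "Toy Story"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
    ],
    "movie_title": ["Toy Story", "Jumanji"],
})

# An inner merge keeps only rows whose movie_title occurs in both frames.
dfinal = df1.merge(df2, on="movie_title", how="inner")
print(dfinal)
```

Only the 'Toy Story' row survives here, because it is the only title present in both frames.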
If the key column has a different name in each data frame, say 'movie_title' in one and 'movie_name' in the other, you can specify the left and right key columns explicitly:
dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
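A short sketch of the same call on toy data, assuming the right-hand frame names its key 'movie_name' (both frames and their values are hypothetical):

```python
import pandas as pd

# Hypothetical frames: the key is movie_title on the left, movie_name on the right.
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "John Lasseter"],
    "movie_title": ["Avatar", "Toy Story"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "movie_name": ["Toy Story", "Jumanji"],
})

dfinal = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")

# Both key columns survive the merge; drop the right-hand one if it is redundant.
dfinal = dfinal.drop(columns="movie_name")
print(dfinal)
```

Dropping the duplicate key column is optional; it is done here only to keep the result tidy.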
For more detail, see the documentation of the pandas merge operation.
Answer by pansen
You can use pd.merge:
import pandas as pd
pd.merge(df1, df2, on="movie_title")
Only rows whose key is found in both data frames are kept. If you want to keep all rows from the left data frame and only add values from df2 where a matching key is available, you can use how="left".
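To see the difference between the default inner merge and how="left", here is a small sketch on invented data (the frames below are assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical frames sharing the key column movie_title.
df1 = pd.DataFrame({
    "movie_title": ["Avatar", "Toy Story"],
    "director_name": ["James Cameron", "John Lasseter"],
})
df2 = pd.DataFrame({
    "movie_title": ["Toy Story", "Jumanji"],
    "movieId": [1, 2],
})

# Default (inner): only titles present in both frames.
inner = pd.merge(df1, df2, on="movie_title")

# Left: every row of df1, with NaN where df2 has no matching title.
left = pd.merge(df1, df2, on="movie_title", how="left")

print(inner)
print(left)
```

In this sketch, dropping the NaN rows from the left result would leave the same titles as the inner merge.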
Answer by gaurav singh
If you want a merged data frame that contains only the key values common to both data frames, do an inner merge.
import pandas as pd
# "id" stands for the name of the shared key column; both arguments must be strings
merged_Frame = pd.merge(df1, df2, on="id", how="inner")