Number of rows changes even after `pandas.merge` with `left` option

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/37095161/

Time: 2020-09-14 01:11:52. Source: igfitidea

Tags: python, pandas

Asked by user51966

I am merging two data frames using `pandas.merge`. Even after specifying the `how='left'` option, the merged data frame has more rows than the original. Why does this happen?

import pandas as pd

# file1 and file2 are the paths to the two CSV files
panel = pd.read_csv(file1, encoding='cp932')
before_len = len(panel)

prof_2000 = pd.read_csv(file2, encoding='cp932').drop_duplicates()

temp_2000 = pd.merge(panel, prof_2000, left_on='Candidate_u', right_on='name2', how='left')

after_len = len(temp_2000)
print(before_len, after_len)
> 12661 13915

Answered by Thanos

It sounds like more than one row in the right frame, under `name2`, matches the key you have set for the left. Using the option `how='left'` with `pandas.DataFrame.merge()` only means that:

  • left: use only keys from left frame

However, the actual number of rows in the result object is not necessarily going to be the same as the number of rows in the `left` object.
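
As a rough illustration of why the counts can differ: in a left merge each left row appears once per matching right row, or once if nothing matches. The sketch below checks that arithmetic on two small frames. The names `left`, `right`, `k` and `v` are made up for illustration, and it assumes the key columns contain no missing values.

```python
import pandas as pd

# Hypothetical frames for illustration only (not the question's data).
left = pd.DataFrame({"k": ["a", "b", "c"]})
right = pd.DataFrame({"k": ["a", "a", "b"], "v": [1, 2, 3]})

# Number of rows in `right` for each key value.
matches_per_key = right["k"].value_counts()

# Each left row yields one output row per match, or one row if nothing matches.
expected_rows = int(left["k"].map(matches_per_key).fillna(1).sum())

merged = left.merge(right, on="k", how="left")
print(len(left), expected_rows, len(merged))  # 3 4 4
```

Here the key `a` matches two right rows, so the three left rows become four.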

Example:

In [359]: df_1
Out[359]: 
   A    B
0  a  AAA
1  b  BBA
2  c  CCF

and then another DataFrame that looks like this (notice that it has more than one entry for a key that appears in the left frame):

In [360]: df_3
Out[360]: 
  key  value
0   a      1
1   a      2
2   b      3
3   a      4

If I merge these two on left.A, here's what happens:

In [361]: df_1.merge(df_3, how='left', left_on='A', right_on='key')
Out[361]: 
   A    B  key  value
0  a  AAA    a    1.0
1  a  AAA    a    2.0
2  a  AAA    a    4.0
3  b  BBA    b    3.0
4  c  CCF  NaN    NaN

This happened even though I merged with `how='left'`. As you can see above, there was simply more than one row to merge for the key `a`, and the resulting `pd.DataFrame` has in fact more rows than the `pd.DataFrame` on the left.
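
For anyone who wants to reproduce this, here is a minimal runnable sketch that rebuilds the two frames from the example by hand and then lists the right-hand rows whose key repeats. The variable `dupes` is a name introduced here for illustration.

```python
import pandas as pd

# The two frames from the example above, rebuilt by hand.
df_1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["AAA", "BBA", "CCF"]})
df_3 = pd.DataFrame({"key": ["a", "a", "b", "a"], "value": [1, 2, 3, 4]})

merged = df_1.merge(df_3, how="left", left_on="A", right_on="key")
print(len(df_1), len(merged))  # 3 5

# Rows of df_3 whose key occurs more than once; these cause the extra rows.
dupes = df_3[df_3["key"].duplicated(keep=False)]
print(dupes)
```

Running the same `duplicated(keep=False)` check on the `name2` column of the right-hand frame in the question should show which names are responsible for the extra rows.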

I hope this helps!

Answered by mirekphd

The problem of rows multiplying after a `merge()` (of any type, such as 'inner' or 'left') is usually caused by duplicate values in the key columns, so we need to drop them first:

left_df.drop_duplicates(subset=left_key, inplace=True)
right_df.drop_duplicates(subset=right_key, inplace=True)
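
For a left merge specifically, only repeated keys on the right-hand side add rows, so a narrower variant is to drop duplicates on the right frame only and let pandas verify the result. This is a sketch, not the answerer's exact method. It reuses the frames from the earlier example, introduces the name `right_unique` for illustration, and assumes that keeping just the first row per key is acceptable for your data.

```python
import pandas as pd

df_1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["AAA", "BBA", "CCF"]})
df_3 = pd.DataFrame({"key": ["a", "a", "b", "a"], "value": [1, 2, 3, 4]})

# Keep only the first row for each key on the right side.
# Note: this discards the rows with value 2 and 4.
right_unique = df_3.drop_duplicates(subset="key")

# validate="many_to_one" makes pandas raise MergeError if right keys repeat.
merged = df_1.merge(right_unique, how="left", left_on="A", right_on="key",
                    validate="many_to_one")

# The left frame's row count is preserved.
assert len(merged) == len(df_1)
print(merged)
```

Because `drop_duplicates` silently throws rows away, it is worth checking first whether the repeated keys carry different data that should be aggregated instead.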