Original question: http://stackoverflow.com/questions/37095161/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Number of rows changes even after `pandas.merge` with `left` option
Asked by user51966
I am merging two data frames using `pandas.merge`. Even after specifying the `how='left'` option, I found that the merged data frame has more rows than the original. Why does this happen?
import pandas as pd

# file1 and file2 hold the paths to the two CSV files
panel = pd.read_csv(file1, encoding='cp932')
before_len = len(panel)
prof_2000 = pd.read_csv(file2, encoding='cp932').drop_duplicates()
temp_2000 = pd.merge(panel, prof_2000, left_on='Candidate_u', right_on='name2', how='left')
after_len = len(temp_2000)
print(before_len, after_len)
> 12661 13915
Answered by Thanos
It sounds like `right` has more than one row under `'name2'` that matches the key you have set for `left`. Using the option `how='left'` with `pandas.DataFrame.merge()` only means that:
- left: use only keys from left frame
However, the actual number of rows in the result is not necessarily the same as the number of rows in the `left` object.
Example:
In [359]: df_1
Out[359]:
A B
0 a AAA
1 b BBA
2 c CCF
and then another DF that looks like this (notice that it contains more than one entry for a key that appears on the left):
In [360]: df_3
Out[360]:
key value
0 a 1
1 a 2
2 b 3
3 a 4
If I merge these two on `left.A`, here's what happens:
In [361]: df_1.merge(df_3, how='left', left_on='A', right_on='key')
Out[361]:
A B key value
0 a AAA a 1.0
1 a AAA a 2.0
2 a AAA a 4.0
3 b BBA b 3.0
4 c CCF NaN NaN
This happened even though I merged with `how='left'`. As you can see above, more than one row on the right matched the key `a`, so the resulting `pd.DataFrame` has more rows than the `pd.DataFrame` on the `left`.
I hope this helps!
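As a rough, self-contained check of the explanation above, the sketch below rebuilds the two toy frames and predicts the size of the left merge from the number of matches per key. The variable names `matches_per_key`, `expected_rows`, and `right_key_is_unique` are only illustrative. It also uses the `validate="many_to_one"` argument of `merge`, which should raise `pandas.errors.MergeError` when the right-hand key is not unique.

```python
import pandas as pd

# Same toy frames as in the example above.
df_1 = pd.DataFrame({"A": ["a", "b", "c"], "B": ["AAA", "BBA", "CCF"]})
df_3 = pd.DataFrame({"key": ["a", "a", "b", "a"], "value": [1, 2, 3, 4]})

# How many right-hand rows share each key value.
matches_per_key = df_3["key"].value_counts()

# Each left row produces one output row per match,
# or a single NaN-filled row when nothing matches.
matches = df_1["A"].map(matches_per_key).fillna(0).astype(int)
expected_rows = int(matches.clip(lower=1).sum())

merged = df_1.merge(df_3, how="left", left_on="A", right_on="key")
print(len(df_1), expected_rows, len(merged))

# Ask pandas to verify that the right-hand key is unique.
try:
    df_1.merge(df_3, how="left", left_on="A", right_on="key",
               validate="many_to_one")
    right_key_is_unique = True
except pd.errors.MergeError:
    right_key_is_unique = False
print(right_key_is_unique)
```

If the predicted count and the merged length agree but both exceed `len(df_1)`, the extra rows come from repeated keys on the right rather than from the `how` option.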
Answered by mirekphd
The problem of rows doubling after each `merge()` (of any type, such as 'inner' or 'left') is usually caused by duplicates in any of the keys, so we need to drop them first:
left_df.drop_duplicates(subset=left_key, inplace=True)
right_df.drop_duplicates(subset=right_key, inplace=True)
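One possible variation, sketched here on toy data rather than the asker's files: for a left merge it may be enough to deduplicate only the right-hand key, since that keeps every left row. The frames and the names `dupes` and `right_unique` are hypothetical. Keeping the first row per key is an arbitrary choice for this sketch, so it is worth inspecting the duplicates before discarding them.

```python
import pandas as pd

# Hypothetical toy frames standing in for the real data.
left_df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["AAA", "BBA", "CCF"]})
right_df = pd.DataFrame({"key": ["a", "a", "b", "a"], "value": [1, 2, 3, 4]})

# Look at every right-hand row whose key is repeated before dropping anything.
dupes = right_df[right_df["key"].duplicated(keep=False)]
print(dupes)

# Keep the first row per key (an assumption; pick whichever row is correct for your data).
right_unique = right_df.drop_duplicates(subset="key", keep="first")

# With a unique right-hand key, a left merge keeps the left row count.
merged = left_df.merge(right_unique, how="left", left_on="A", right_on="key",
                       validate="many_to_one")
assert len(merged) == len(left_df)
print(merged)
```

Unlike the `inplace=True` calls above, this leaves `left_df` and `right_df` unchanged, which makes it easier to compare the result against the originals.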