Pandas: trouble understanding how merge works
Original question: http://stackoverflow.com/questions/10145224/
Note: this page reproduces a popular Stack Overflow question and its answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, cite the original URL and authors, and attribute the content to the original authors on Stack Overflow.
Asked by Rafael S. Calsaverini
I'm doing something wrong with merge and I can't understand what it is. I've done the following to estimate a histogram of a series of integer values:
import pandas as pnd
import numpy as np
series = pnd.Series(np.random.poisson(5, size = 100))
tmp = {"series" : series, "count" : np.ones(len(series))}
hist = pnd.DataFrame(tmp).groupby("series").sum()
freq = (hist / hist.sum()).rename(columns = {"count" : "freq"})
If I print hist and freq, this is what I get:
> print hist
        count
series
0           2
1           4
2          13
3          15
4          12
5          16
6          18
7           7
8           8
9           3
10          1
11          1
> print freq
        freq
series
0       0.02
1       0.04
2       0.13
3       0.15
4       0.12
5       0.16
6       0.18
7       0.07
8       0.08
9       0.03
10      0.01
11      0.01
They're both indexed by "series", but if I try to merge:
> df = pnd.merge(freq, hist, on = "series")
I get a KeyError: 'no item named series' exception. If I omit on = "series", I get an IndexError: list index out of range exception.
I don't get what I'm doing wrong. Maybe "series" is an index and not a column, so I must do it differently?
Answered by Avaris
From the docs:
on: Columns (names) to join on. Must be found in both the left and right DataFrame objects. If not passed and left_index and right_index are False, the intersection of the columns in the DataFrames will be inferred to be the join keys
I don't know why this is not in the docstring, but it explains your problem.
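To see why on="series" fails here, it can help to check where the name "series" actually lives. The following is a minimal sketch, assuming a recent pandas imported under the conventional alias pd (the question uses pnd) and a fixed random seed that is added only for repeatability. After the groupby, "series" is the name of the index rather than a column label, so it is not among the columns that the quoted on description refers to.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question; the seed is an addition for repeatability.
rng = np.random.default_rng(0)
series = pd.Series(rng.poisson(5, size=100))
tmp = {"series": series, "count": np.ones(len(series))}
hist = pd.DataFrame(tmp).groupby("series").sum()
freq = (hist / hist.sum()).rename(columns={"count": "freq"})

# groupby moved "series" out of the columns and into the index.
print(list(hist.columns))        # ['count']
print(hist.index.name)           # series
print("series" in freq.columns)  # False
```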
You can either give left_index and right_index:
In : pnd.merge(freq, hist, right_index=True, left_index=True)
Out:
        freq  count
series
0       0.01      1
1       0.04      4
2       0.14     14
3       0.12     12
4       0.21     21
5       0.14     14
6       0.17     17
7       0.07      7
8       0.05      5
9       0.01      1
10      0.01      1
11      0.03      3
Or you can make your index a column and use on:
In : freq2 = freq.reset_index()
In : hist2 = hist.reset_index()
In : pnd.merge(freq2, hist2, on='series')
Out:
    series  freq  count
0        0  0.01      1
1        1  0.04      4
2        2  0.14     14
3        3  0.12     12
4        4  0.21     21
5        5  0.14     14
6        6  0.17     17
7        7  0.07      7
8        8  0.05      5
9        9  0.01      1
10      10  0.01      1
11      11  0.03      3
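As an aside, the reset_index step reflects pandas as it was when this answer was written. In later releases (0.23 and newer, as far as I know), on may also name an index level that exists in both frames. The sketch below assumes such a version, uses the alias pd, and builds a small hand-made histogram whose values are purely illustrative.

```python
import pandas as pd

# Hand-made frames indexed by "series" (illustrative values only).
hist = pd.DataFrame(
    {"count": [2.0, 5.0, 3.0]},
    index=pd.Index([0, 1, 2], name="series"),
)
freq = (hist / hist.sum()).rename(columns={"count": "freq"})

# Assumption: pandas >= 0.23, where `on` may refer to an index level
# that is present in both frames. Older versions raise a KeyError here.
merged = pd.merge(freq, hist, on="series")
print(merged)
```

On older versions the call fails as described in the question, so reset_index or the index flags remain the portable choice.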
Alternatively and more simply, DataFrame has a join method which does exactly what you want:
In : freq.join(hist)
Out:
        freq  count
series
0       0.01      1
1       0.04      4
2       0.14     14
3       0.12     12
4       0.21     21
5       0.14     14
6       0.17     17
7       0.07      7
8       0.05      5
9       0.01      1
10      0.01      1
11      0.03      3
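The three approaches above should produce the same table when both frames share the same index. The sketch below checks this on a small hand-made histogram (the values and the alias pd are illustrative assumptions, not the question's data). It also shows one difference to keep in mind: merge defaults to an inner join, while join defaults to a left join, so the results can differ when the indexes do not match exactly.

```python
import pandas as pd

# Small hand-made frames indexed by "series" (illustrative values only).
hist = pd.DataFrame(
    {"count": [2.0, 5.0, 3.0]},
    index=pd.Index([0, 1, 2], name="series"),
)
freq = (hist / hist.sum()).rename(columns={"count": "freq"})

# 1. Merge on the index of both frames.
by_index = pd.merge(freq, hist, left_index=True, right_index=True)

# 2. Turn the index into a column, merge on it, then restore the index.
by_column = pd.merge(
    freq.reset_index(), hist.reset_index(), on="series"
).set_index("series")

# 3. Use DataFrame.join, which aligns on the index by default.
by_join = freq.join(hist)

print(by_index.equals(by_column) and by_index.equals(by_join))

# With mismatched indexes the defaults differ: merge keeps only shared keys,
# join keeps every key of the left frame and fills the gaps with NaN.
partial = hist.iloc[:2]
print(len(pd.merge(freq, partial, left_index=True, right_index=True)))  # 2
print(len(freq.join(partial)))                                          # 3
```

Passing how="left" to merge, or how="inner" to join, makes the two behave alike in the mismatched case.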

