Pandas: Pivoting with multi-index data
Original question: http://stackoverflow.com/questions/19421380/
This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Brendon McLean
I have two dataframes which look like this:
rating
   BMW  Fiat  Toyota
0    7     2       3
1    8     1       8
2    9    10       7
3    8     3       9

own
   BMW  Fiat  Toyota
0    1     1       0
1    0     1       1
2    0     0       1
3    0     1       1
I'm ultimately trying to get a pivot table of mean rating for usage by brand, or something like this:
            BMW  Fiat  Toyota
Usage
0      8.333333    10       3
1      7.000000     2       8
My approach was to merge the datasets like this:
Measure  Rating                Own
Brand       BMW  Fiat  Toyota  BMW  Fiat  Toyota
0             7     2       3    1     1       0
1             8     1       8    0     1       1
2             9    10       7    0     0       1
3             8     3       9    0     1       1
I then attempted to create a pivot table using rating as the values, own as the rows and brand as the columns, but I kept running into key issues. I have also tried unstacking either the measure or the brand level, but I can't seem to use row index names as pivot keys.
What am I doing wrong? Is there a better approach to this?
Accepted answer by Roman Pekar
I'm not an expert in Pandas, so the solution may be more clumsy than you want, but:
import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Flatten each frame to long form: one row per (brand, row number) pair
r = rating.unstack().reset_index(name='value')
o = own.unstack().reset_index(name='value')

# Line the two long frames up side by side; both share the same row order
res = pd.DataFrame({"Brand": r["level_0"], "Rating": r["value"], "Own": o["value"]})
# Mean rating per (Own, Brand) pair, then spread the brands across the columns
res = res.groupby(["Own", "Brand"]).mean().reset_index()
res.pivot(index="Own", columns="Brand", values="Rating")
# result
# Brand       BMW  Fiat  Toyota
# Own
# 0      8.333333  10.0     3.0
# 1      7.000000   2.0     8.0
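A side note on the level_0 column used in the snippet above: it appears because neither the row index nor the column index has a name, so reset_index falls back to generic labels. The sketch below is only an illustration on a cut-down two-brand frame (the names "n" and "Brand" are my own choice, not required by pandas) and shows how naming the axes first gives readable column names.

```python
import pandas as pd

# Small stand-in for the rating frame, just to show the column labels
rating = pd.DataFrame({"BMW": [7, 8], "Fiat": [2, 1]})

# Without axis names, reset_index produces generic level_* labels
unnamed = rating.unstack().reset_index(name="Rating")
print(list(unnamed.columns))  # ['level_0', 'level_1', 'Rating']

# Naming the row and column axes first carries the names through unstack
named = (
    rating.rename_axis(index="n", columns="Brand")
    .unstack()
    .reset_index(name="Rating")
)
print(list(named.columns))  # ['Brand', 'n', 'Rating']
```

With named axes, the later DataFrame construction could refer to "Brand" directly instead of "level_0".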
Another solution, although not very generalizable (you can use a for loop, but you have to know which values occur in the own dataframe):
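d = []
for o in (0, 1):
    # keep only the ratings where own equals o; the rest become NaN
    t = rating[own == o]
    t["own"] = o
    d.append(t)
# mean() skips NaN, so each brand is averaged over its matching rows only
res = pd.concat(d).groupby("own").mean()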
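If the values in own are not known in advance, one possible variation is to read them from the data instead of hard-coding (0, 1). This is a minimal sketch under that assumption, not part of the original answer; the names own_values and res_loop are made up for illustration.

```python
import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Collect the distinct values that actually occur in `own`
own_values = sorted(pd.unique(own.to_numpy().ravel()))

# For each value, blank out the non-matching ratings (they become NaN)
# and take the column-wise mean, which skips NaN by default
means = {o: rating.where(own == o).mean() for o in own_values}

# The dict keys become columns, so transpose to get one row per own value
res_loop = pd.DataFrame(means).T
res_loop.index.name = "own"
print(res_loop)
```

This keeps the idea of the loop (mask, then average) but avoids both the hard-coded tuple and the temporary "own" column.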
Answer by Brendon McLean
I have a new answer to my own question (based on Roman's initial answer). The key is to get the index at the required dimensionality. For example:
rating.columns.names = ["Brand"]
rating.index.names = ["n"]
print(rating)
Brand  BMW  Fiat  Toyota
n
0        7     2       3
1        8     1       8
2        9    10       7
3        8     3       9
own.columns.names = ["Brand"]
own.index.names = ["n"]
print(own)
Brand  BMW  Fiat  Toyota
n
0        1     1       0
1        0     1       1
2        0     0       1
3        0     1       1
merged = pd.merge(own.unstack().reset_index(name="Own"),
                  rating.unstack().reset_index(name="Rating"))
print(merged)
     Brand  n  Own  Rating
0      BMW  0    1       7
1      BMW  1    0       8
2      BMW  2    0       9
3      BMW  3    0       8
4     Fiat  0    1       2
5     Fiat  1    1       1
6     Fiat  2    0      10
7     Fiat  3    1       3
8   Toyota  0    0       3
9   Toyota  1    1       8
10  Toyota  2    1       7
11  Toyota  3    1       9
Then it's easy to use the pivot_table command to turn this into the desired result:
print(merged.pivot_table(index="Brand", columns="Own", values="Rating"))
Own             0    1
Brand
BMW      8.333333  7.0
Fiat    10.000000  2.0
Toyota   3.000000  8.0
And that is what I was looking for. Thanks again to Roman for pointing the way.
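For completeness, the two-level (Measure, Brand) column layout from the question can also be taken to the same result. The following is a rough sketch rather than a definitive recipe: it assumes the combined frame is built with pd.concat and its keys argument, and the names combined, long_form and result are chosen here purely for illustration.

```python
import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Build the two-level column layout from the question: (Measure, Brand)
combined = pd.concat([rating, own], axis=1, keys=["Rating", "Own"],
                     names=["Measure", "Brand"])
combined.index.name = "n"

# First unstack: flatten everything to a Series indexed by (Measure, Brand, n).
# Second unstack: move Measure back to the columns, giving one Rating and
# one Own column per (Brand, n) pair.
long_form = combined.unstack().unstack("Measure").reset_index()

# pivot_table aggregates with the mean by default
result = long_form.pivot_table(index="Own", columns="Brand", values="Rating")
print(result)
```

Once Own and Rating are ordinary columns, they can be used as pivot keys, which is what the row index names could not do directly.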

