
Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/19279229/

Time: 2020-09-13 21:13:54  Source: igfitidea

pandas - pivot_table with non-numeric values? (DataError: No numeric types to aggregate)

Tags: python, pandas, pivot-table, dataframe

Question by Paweł Rumian

I'm trying to do a pivot of a table containing strings as results.


import pandas as pd

df1 = pd.DataFrame({'index' : range(8),
                    'variable1' : ["A","A","B","B","A","B","B","A"],
                    'variable2' : ["a","b","a","b","a","b","a","b"],
                    'variable3' : ["x","x","x","y","y","y","x","y"],
                    'result': ["on","off","off","on","on","off","off","on"]})

# Older pandas syntax: current pandas names these keywords index= and columns=.
df1.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

But I get: DataError: No numeric types to aggregate.


This works as intended when I change result values to numbers:


df2 = pd.DataFrame({'index' : range(8),
                    'variable1' : ["A","A","B","B","A","B","B","A"],
                    'variable2' : ["a","b","a","b","a","b","a","b"],
                    'variable3' : ["x","x","x","y","y","y","x","y"],
                    'result': [1,0,0,1,1,0,0,1]})

# Older pandas syntax: current pandas names these keywords index= and columns=.
df2.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

And I get what I need:


variable1   A               B    
variable2   a       b       a   b
variable3   x   y   x   y   x   y
index                            
0           1 NaN NaN NaN NaN NaN
1         NaN NaN   0 NaN NaN NaN
2         NaN NaN NaN NaN   0 NaN
3         NaN NaN NaN NaN NaN   1
4         NaN   1 NaN NaN NaN NaN
5         NaN NaN NaN NaN NaN   0
6         NaN NaN NaN NaN   0 NaN
7         NaN NaN NaN   1 NaN NaN

I know I can map the strings to numerical values and then reverse the operation, but maybe there is a more elegant solution?


Answer by Randall Goodwin

My original reply was based on pandas 0.14.1; since then, many things in the pivot_table function have changed (rows --> index, cols --> columns, ...).


Additionally, it appears that the original lambda trick I posted no longer works in pandas 0.18. You have to provide a reducing function (even if it is just min, max or mean). But even that seemed improper, because we are not reducing the data set, only reshaping it. So I took a closer look at unstack.
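For comparison, here is a minimal sketch of the reducing-function route, assuming a recent pandas that uses the index=/columns= keywords. The choice of aggfunc='first' is an assumption of this sketch, not something from the original answer: because every value in the 'index' column is unique, each cell receives at most one string, so 'first' simply passes it through. 'min' or 'max' would also accept strings here, whereas 'mean' needs numeric data.

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

unstack_cols = ['variable1', 'variable2', 'variable3']

# 'first' works on object (string) columns. Each (index, variable1, variable2,
# variable3) group holds exactly one row, so nothing is actually being reduced.
pivoted = df1.pivot_table(values='result', index='index',
                          columns=unstack_cols, aggfunc='first')
print(pivoted)
```

Cells with no matching row come out as NaN, just like in the numeric pivot shown in the question.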


import pandas as pd

df1 = pd.DataFrame({'index' : range(8),
                    'variable1' : ["A","A","B","B","A","B","B","A"],
                    'variable2' : ["a","b","a","b","a","b","a","b"],
                    'variable3' : ["x","x","x","y","y","y","x","y"],
                    'result': ["on","off","off","on","on","off","off","on"]})

# these are the columns to end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']

First, set an index on the data using the 'index' column plus the columns you want to unstack, then call unstack with the level argument.


df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)

Resulting dataframe is below.

[Image: the resulting DataFrame]
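The same reshape can also be applied to the 'result' Series directly, which gives column labels without the extra top-level 'result' that unstacking the whole DataFrame carries. This is a minimal sketch assuming a recent pandas; selecting the Series before unstacking is a variation of this sketch, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

unstack_cols = ['variable1', 'variable2', 'variable3']

# Move 'index' and the three variables into a MultiIndex, keep only the
# 'result' Series, then pivot the three variable levels out into the columns.
wide = df1.set_index(['index'] + unstack_cols)['result'].unstack(level=unstack_cols)
print(wide)
```

No aggregation happens at all here, so the strings are kept as-is and missing combinations become NaN.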

Answer by Dan Allan

I think the best compromise is to replace on/off with True/False, which will enable pandas to "understand" the data better and act in an intelligent, expected way.


df2 = df1.replace({'on': True, 'off': False})

You essentially conceded this in your question. In short, I don't think there's a better way, and you should replace 'on'/'off' anyway, since it will help with whatever comes next.


As Andy Hayden points out in the comments, you'll get better performance if you replace on/off with 1/0.
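A minimal sketch of the encode/pivot/decode round trip the question describes, using the 1/0 encoding, assuming a recent pandas (index=/columns= keywords). The mapping dictionaries are illustrative. pivot_table's default aggfunc is mean; with one row per cell it just returns the 0 or 1 (as a float once NaN cells appear).

```python
import pandas as pd

df1 = pd.DataFrame({'index': range(8),
                    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
                    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
                    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
                    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]})

unstack_cols = ['variable1', 'variable2', 'variable3']

# Encode the strings as integers so the default numeric aggregation works.
encoded = df1.assign(result=df1['result'].map({'on': 1, 'off': 0}))
pivoted = encoded.pivot_table(values='result', index='index', columns=unstack_cols)

# Decode back to strings; cells with no data stay NaN.
decoded = pivoted.replace({1: 'on', 0: 'off'})
print(decoded)
```

If only the numeric table is needed for further analysis, the decode step can be skipped entirely.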
