Python pivot_table: No numeric types to aggregate
Original question: http://stackoverflow.com/questions/39229005/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
pivot_table No numeric types to aggregate
Asked by DougKruger
I want to make a pivot table from the following dataframe with columns sales, rep. The pivot table shows sales but no rep. When I tried with only rep, I got the error DataError: No numeric types to aggregate. How can I fix this so that I see both the numeric field sales and the string field rep?
import pandas as pd

data = {'year': ['2016', '2016', '2015', '2014', '2013'],
        'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
        'sales': [10, 21, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'kyle', 'kyle']}
# Both rep and sales are requested, but the output below shows only sales
print(pd.DataFrame(data).pivot_table(index='country', columns='year', values=['rep', 'sales']))
        sales
year     2013  2014  2015  2016
country
fr        NaN    10    20   NaN
uk         12   NaN   NaN    10
usa       NaN   NaN   NaN    21
print(pd.DataFrame(data).pivot_table(index='country', columns='year', values=['rep']))
DataError: No numeric types to aggregate
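A quick way to see why the second call fails: pivot_table defaults to aggfunc='mean', which needs numeric input, and rep holds strings. The sketch below lists which columns pandas treats as numeric. The variable name numeric_cols is mine, not from the question.

```python
import pandas as pd

data = {'year': ['2016', '2016', '2015', '2014', '2013'],
        'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
        'sales': [10, 21, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'kyle', 'kyle']}
df = pd.DataFrame(data)

# Keep only the columns with a numeric dtype; year and rep are strings here
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```

Only sales is numeric, so a mean over rep alone has nothing to aggregate.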
Answered by unutbu
You could use set_index and unstack:
df = pd.DataFrame(data)
df.set_index(['year','country']).unstack('year')
yields
            rep                         sales
year       2013    2014    2015    2016  2013  2014  2015  2016
country
fr         None    kyle  claire    None   NaN  10.0  20.0   NaN
uk         kyle    None    None    john  12.0   NaN   NaN  10.0
usa        None    None    None    john   NaN   NaN   NaN  21.0
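One caveat worth checking before relying on set_index and unstack: the reshape needs each (year, country) pair to be unique. The sketch below uses made-up data (the frame dup is hypothetical, not from the question) in which ('2016', 'uk') appears twice, and shows that unstack then raises a ValueError instead of picking a value.

```python
import pandas as pd

# Hypothetical data: the pair ('2016', 'uk') occurs twice
dup = pd.DataFrame({
    'year': ['2016', '2016', '2015'],
    'country': ['uk', 'uk', 'fr'],
    'sales': [10, 99, 20],
    'rep': ['john', 'mary', 'claire'],
})

try:
    dup.set_index(['year', 'country']).unstack('year')
    reshape_error = None
except ValueError as exc:
    # unstack cannot place two rows into the same (country, year) cell
    reshape_error = str(exc)

print(reshape_error)
```

With the question's data every pair is unique, so this does not come up there.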
Or, using pivot_table with aggfunc='first':
df.pivot_table(index='country', columns='year', values=['rep','sales'], aggfunc='first')
yields
            rep                         sales
year       2013    2014    2015    2016  2013  2014  2015  2016
country
fr         None    kyle  claire    None  None    10    20  None
uk         kyle    None    None    john    12  None  None    10
usa        None    None    None    john  None  None  None    21
With aggfunc='first', each (country, year, rep) or (country, year, sales) group is aggregated by taking the first value found. In your case there appear to be no duplicates, so the first value is the same as the only value.
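To make the "first value found" behaviour concrete, here is a minimal sketch with made-up data (the frame dup and its values are hypothetical, not from the question) where ('uk', '2016') appears twice. With aggfunc='first', the earlier row wins for both rep and sales.

```python
import pandas as pd

# Hypothetical data: ('uk', '2016') occurs twice, unlike in the question
dup = pd.DataFrame({
    'year': ['2016', '2016', '2015'],
    'country': ['uk', 'uk', 'fr'],
    'sales': [10, 99, 20],
    'rep': ['john', 'mary', 'claire'],
})

table = dup.pivot_table(index='country', columns='year',
                        values=['rep', 'sales'], aggfunc='first')

# The columns form a MultiIndex of (value name, year)
first_rep = table.loc['uk', ('rep', '2016')]
first_sales = table.loc['uk', ('sales', '2016')]
print(first_rep, first_sales)
```

The second uk/2016 row (mary, 99) is silently dropped, so 'first' is only a safe choice when duplicates are absent or do not matter.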
Answered by Psidom
It seems the problem comes from the columns rep and sales having different types. If you convert sales to str type and specify aggfunc as sum, it works fine:
# Convert sales to strings so both value columns share the same type
df['sales'] = df['sales'].astype(str)
pd.pivot_table(df, index=['country'], columns=['year'], values=['rep', 'sales'], aggfunc='sum')
#             rep                         sales
# year       2013    2014    2015    2016  2013  2014  2015  2016
# country
# fr         None    kyle  claire    None  None    10    20  None
# uk         kyle    None    None    john    12  None  None    10
# usa        None    None    None    john  None  None  None    21
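If you would rather keep sales numeric, one alternative (not part of the answer above, so treat it as a sketch) is to pass a dict to aggfunc so that each column gets its own aggregation. The frame dup below is hypothetical data with a repeated ('uk', '2016') pair, used to show that sales is genuinely added up.

```python
import pandas as pd

# Hypothetical data with a repeated ('uk', '2016') pair
dup = pd.DataFrame({
    'year': ['2016', '2016', '2015'],
    'country': ['uk', 'uk', 'fr'],
    'sales': [10, 5, 20],
    'rep': ['john', 'mary', 'claire'],
})

# Per-column aggregation: first rep seen, numeric sum of sales
table = dup.pivot_table(index='country', columns='year',
                        values=['rep', 'sales'],
                        aggfunc={'rep': 'first', 'sales': 'sum'})

uk_sales = table.loc['uk', ('sales', '2016')]
uk_rep = table.loc['uk', ('rep', '2016')]
print(uk_rep, uk_sales)
```

With sales cast to str instead, a repeated pair would likely be concatenated ('10' and '5' giving '105') rather than added, which is worth keeping in mind when the data has duplicates.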