Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/46775308/

Time: 2020-09-14 04:38:50  Source: igfitidea

pandas cut: how to convert categorical labels to strings (otherwise cannot export to Excel)?

Tags: python, pandas, dataframe, export-to-excel

Asked by Pythonista anonymous

I use pandas.cut() to discretise a continuous variable into ranges (bins), and then group by the result.

After a lot of swearing because I couldn't figure out what was wrong, I have learnt that if I don't supply custom labels to cut() and rely on the defaults instead, the output cannot be exported to Excel. If I try this:

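import numpy as np
import pandas as pd

# add_worksheet() below is an XlsxWriter method, so request that engine explicitly
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
wk = writer.book.add_worksheet('Test')

df = pd.DataFrame(np.random.randint(1, 10, (10000, 5)), columns=['a', 'b', 'c', 'd', 'e'])
# No labels are passed, so every bin is labelled with a pandas Interval object
df['range'] = pd.cut(df['a'], [-np.inf, 3, 8, np.inf])
grouped = df.groupby('range').sum()
grouped.to_excel(writer, sheet_name='Export')
writer.close()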

I get:

raise TypeError("Unsupported type %s in write()" % type(token))
TypeError: Unsupported type <class 'pandas._libs.interval.Interval'> in write()
which took me a while to decipher.
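
For context, the error message points at the bin labels themselves. The following is a minimal sketch, using a tiny made-up series rather than the data above, of what pd.cut() returns when no labels are given: a categorical column whose elements are Interval objects rather than strings.

```python
import numpy as np
import pandas as pd

# Tiny illustrative series (made-up values, not the data from the question)
values = pd.Series([1, 5, 9])
binned = pd.cut(values, [-np.inf, 3, 8, np.inf])

first = binned.iloc[0]
print(binned.dtype)   # the column is categorical
print(type(first))    # each element is an Interval object, not a string
print(first.left, first.right, first.closed)  # the edges and which side is closed
```

An Excel writer that only knows how to store numbers, strings, dates and similar basic types has no obvious way to write such an object into a cell, which is consistent with the TypeError shown above.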

If instead I do assign labels:

df['range'] = pd.cut( df['a'],[-np.inf,3,8,np.inf], labels =['<3','3-8','>8'] )

then it all runs fine. Any suggestions on how to handle this without assigning custom labels? In the initial phase of my work I tend not to assign labels, because I still don't know how many bins I want. It's a trial-and-error approach, and assigning labels at each attempt would be time-consuming.
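
One possible way to keep experimenting with bins without typing labels by hand is to generate the labels from the bin edges. This is only a sketch: auto_labels is a hypothetical helper written for this illustration (it is not part of pandas), the label format is an arbitrary choice, and the small DataFrame is made up.

```python
import numpy as np
import pandas as pd

def auto_labels(bins):
    """Hypothetical helper: build one text label per pair of adjacent bin edges."""
    labels = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        if np.isinf(lo):
            labels.append(f'<={hi}')      # open-ended first bin
        elif np.isinf(hi):
            labels.append(f'>{lo}')       # open-ended last bin
        else:
            labels.append(f'{lo}-{hi}')   # pd.cut default: lo excluded, hi included
    return labels

bins = [-np.inf, 3, 8, np.inf]
df = pd.DataFrame({'a': [1, 3, 4, 8, 9]})  # made-up sample values
df['range'] = pd.cut(df['a'], bins, labels=auto_labels(bins))
print(df)
```

Changing the bins list is then the only edit needed per attempt, since the labels are rebuilt from it each time.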

I am not sure if this can count as a bug, but at the very least it seems like a poorly documented annoyance!

Answered by Scott Boston

Use astype(str):

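import numpy as np
import pandas as pd

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
wk = writer.book.add_worksheet('Test')

df = pd.DataFrame(np.random.randint(1, 10, (10000, 5)), columns=['a', 'b', 'c', 'd', 'e'])
# astype(str) turns each Interval label into plain text such as '(3.0, 8.0]'
df['range'] = pd.cut(df['a'], [-np.inf, 3, 8, np.inf]).astype(str)
grouped = df.groupby('range').sum()
grouped.to_excel(writer, sheet_name='Export')
writer.close()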

Output in Excel:

range             a      b      c      d      e
(-inf, 3.0]    6798  17277  16979  17266  16949
(3.0, 8.0]    33150  28051  27551  27692  27719
(8.0, inf]     9513   5153   5318   5106   5412
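
One caveat with converting to strings before grouping is that groupby then sorts the bins as text. That matches the numeric order for the three bins above, but it would not for an edge such as 10, because '(10.0, ...' sorts before '(3.0, ...'. A possible variation, offered as a sketch rather than as the method from the answer, is to group on the intervals first and convert only the resulting index to strings. The seeded random generator, the smaller row count and observed=False are choices made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(rng.integers(1, 10, (1000, 5)), columns=list('abcde'))
df['range'] = pd.cut(df['a'], [-np.inf, 3, 8, np.inf])

# Group while the bins are still ordered categories, so rows follow bin order
grouped = df.groupby('range', observed=False).sum()

# Convert only the index labels to plain strings before exporting
grouped.index = grouped.index.astype(str)
print(grouped)

# Exporting needs an Excel engine such as xlsxwriter or openpyxl to be installed:
# grouped.to_excel('test.xlsx', sheet_name='Export')
```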
