Original question: http://stackoverflow.com/questions/32552027/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow (not me).
With `pandas.cut()`, how do I get integer bins and avoid getting a negative lowest bound?
Asked by joelostblom
My dataframe has zero as its lowest value. I am trying to use the `precision` and `include_lowest` parameters of `pandas.cut()`, but I can't get the intervals to consist of integers rather than floats with one decimal. I also can't get the leftmost interval to stop at zero.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style='white', font_scale=1.3)
df = pd.DataFrame(range(0,389,8)[:-1], columns=['value'])
df['binned_df_pd'] = pd.cut(df.value, bins=7, precision=0, include_lowest=True)
sns.pointplot(x='binned_df_pd', y='value', data=df)
plt.xticks(rotation=30, ha='right')
I have tried setting `precision` to -1, 0 and 1, but they all output floats with one decimal. The `pandas.cut()` help does mention that the x-min and x-max values are extended by 0.1% of the x-range, but I thought maybe `include_lowest` could suppress this behaviour somehow. My current workaround involves importing numpy:
import numpy as np
bin_counts, edges = np.histogram(df.value, bins=7)
edges = [int(x) for x in edges]
df['binned_df_np'] = pd.cut(df.value, bins=edges, include_lowest=True)
sns.pointplot(x='binned_df_np', y='value', data=df)
plt.xticks(rotation=30, ha='right')
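The range extension mentioned above can be inspected directly. The following is a minimal sketch, assuming the same data as in the question; the variable names `values`, `binned` and `edges` are chosen here for illustration. It uses the `retbins=True` option of `pandas.cut()` to return the computed bin edges alongside the binned data.

```python
import pandas as pd

# Same values as in the question: 0, 8, ..., 376
values = pd.Series(range(0, 384, 8), name='value')

# retbins=True additionally returns the array of computed bin edges
binned, edges = pd.cut(values, bins=7, retbins=True)

# With the default right=True, the lowest edge is pushed slightly below
# the data minimum so that the minimum falls inside the first interval.
print(edges[0])   # a small negative number
print(edges[-1])  # the data maximum
```

With these values the lowest edge should come out slightly below zero, which is where the negative bound in the first interval comes from.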
Is there a way to obtain non-negative integers as the interval boundaries directly with `pandas.cut()`, without using numpy?
Edit: I just noticed that specifying `right=False` makes the lowest interval start at 0 rather than -0.4. It seems to take precedence over `include_lowest`, as changing the latter does not have any visible effect in combination with `right=False`. The remaining intervals are still shown with one decimal place.
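The effect of `right=False` can be checked the same way. This is a small sketch under the same assumptions as the question's data (the names `values`, `binned` and `edges` are illustrative): with left-closed intervals the extension appears to be applied to the upper edge instead of the lower one, so the first interval starts exactly at the data minimum.

```python
import pandas as pd

# Same values as in the question: 0, 8, ..., 376
values = pd.Series(range(0, 384, 8), name='value')

# Left-closed intervals: [a, b) instead of (a, b]
binned, edges = pd.cut(values, bins=7, right=False, retbins=True)

print(edges[0])   # the data minimum, no negative offset
print(edges[-1])  # slightly above the data maximum
```

Because every interval is already closed on the left, the minimum is included without any help from `include_lowest`, which would explain why toggling it shows no visible effect here.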
Answered by PeterLai
You should explicitly set the `labels` argument.
Preparations:
lower, higher = df['value'].min(), df['value'].max()
n_bins = 7
Build up the labels:
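# Integer division: range() rejects a float step in Python 3.
# Note: these integer edges only approximate the equal-width edges pd.cut computes.
step = (higher - lower) // n_bins
edges = range(lower, higher, step) # the number of edges is 8 for this data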
lbs = ['(%d, %d]'%(edges[i], edges[i+1]) for i in range(len(edges)-1)]
Set the labels:
df['binned_df_pd'] = pd.cut(df.value, bins=n_bins, labels=lbs, include_lowest=True)
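As an alternative to relabelling, and not part of the original answer, the integer edges can be passed to `pandas.cut()` as the bins themselves, so that the labels and the actual bin boundaries agree. The sketch below is only one way to do this: the helper names `step` and `edges` are illustrative, the step is rounded up so that the last edge lies above the maximum, and `right=False` is used so that the first interval starts at the minimum without a negative offset.

```python
import pandas as pd

# Same values as in the question: 0, 8, ..., 376
df = pd.DataFrame({'value': range(0, 384, 8)})

n_bins = 7
lower, higher = int(df['value'].min()), int(df['value'].max())

# Ceiling division (plain Python, no numpy) so that the last edge
# ends up at or above the data maximum.
step = -(-(higher - lower) // n_bins)
if lower + n_bins * step == higher:
    step += 1  # keep the maximum strictly inside the last [a, b) interval

# n_bins + 1 integer edges, starting exactly at the minimum
edges = [lower + i * step for i in range(n_bins + 1)]

# Left-closed intervals [a, b): the first bin starts at `lower`
df['binned'] = pd.cut(df['value'], bins=edges, right=False)

print(df['binned'].cat.categories)
```

The trade-off is that the bins are no longer exactly equal fractions of the data range: the width is rounded up to a whole number, so the last bin may extend a little past the maximum.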


