Original question: http://stackoverflow.com/questions/30127427/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
pandas cut with infinite upper/lower bounds
Asked by sparc_spread
The pandas cut() documentation states that: "Out of bounds values will be NA in the resulting Categorical object." This makes things difficult when the upper bound is not necessarily clear or important. For example:
cut(weight, bins=[10, 50, 100, 200])
Will produce the bins:
[(10, 50] < (50, 100] < (100, 200]]
So cut(250, bins=[10, 50, 100, 200]) will produce a NaN, as will cut(5, bins=[10, 50, 100, 200]). What I'm trying to do is produce something like > 200 for the first example and < 10 for the second.
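To make the out-of-bounds behavior concrete, here is a minimal sketch. The `weights` Series is made-up sample data, and the values are wrapped in a Series because the cut() documentation describes its input as array-like.

```python
import pandas as pd

# Hypothetical sample data: 5 and 250 lie outside the outermost bin edges.
weights = pd.Series([5, 30, 75, 150, 250])

binned = pd.cut(weights, bins=[10, 50, 100, 200])

# Values outside [10, 200] have no bin, so they come back as NaN.
print(binned)
print(binned.isna().tolist())
```

The first and last entries come back as missing values, which is the situation the question wants to replace with labels such as < 10 and > 200.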
I realize I could do cut(weight, bins=[float("-inf"), 10, 50, 100, 200, float("inf")]) or the equivalent, but the report style I am following doesn't allow things like (200, inf]. I realize too that I could specify custom labels via the labels parameter of cut(), but that means remembering to adjust them every time I adjust bins, which could be often.
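Both workarounds mentioned here can be sketched as follows. The sample `weights` and the hand-written `labels` list are illustrative assumptions, not part of the original question.

```python
import pandas as pd

weights = pd.Series([5, 30, 75, 150, 250])  # hypothetical sample data
bins = [float("-inf"), 10, 50, 100, 200, float("inf")]

# Workaround 1: infinite outer edges. Nothing is NaN any more, but the
# outer categories are rendered with "inf" in their interval labels.
binned = pd.cut(weights, bins=bins)
print(binned.cat.categories)

# Workaround 2: hand-written labels, one per bin. These must be edited
# by hand whenever `bins` changes, which is the maintenance burden
# the question describes.
labels = ["<= 10", "(10, 50]", "(50, 100]", "(100, 200]", "> 200"]
labelled = pd.cut(weights, bins=bins, labels=labels)
print(labelled.tolist())
```

Note that with the default right=True the lowest bin is closed on the right, so a value of exactly 10 lands in it; that is why the first hand-written label here reads <= 10 rather than < 10.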
Have I exhausted all the possibilities, or is there something in cut() or elsewhere in pandas that would help me do this? I'm thinking about writing a wrapper function for cut() that would automatically generate the labels in the desired format from the bins, but I wanted to check here first.
Accepted answer by sparc_spread
After waiting a few days, there are still no answers posted. I think that's probably because there really is no way around this other than writing the cut() wrapper function. I am posting my version of it here and marking the question as answered. I will change that if new answers come along.
import pandas as pd


def my_cut(x, bins, lower_infinite=True, upper_infinite=True, **kwargs):
    r"""Wrapper around pandas cut() to create infinite lower/upper bounds with proper labeling.

    Takes all the same arguments as pandas cut(), plus two more.

    Args:
        lower_infinite (bool, optional): set whether the lower bound is infinite.
            Default is True. If true, and your first bin element is something like 20, the
            first bin label will be '<= 20' (depending on other cut() parameters).
        upper_infinite (bool, optional): set whether the upper bound is infinite.
            Default is True. If true, and your last bin element is something like 20, the
            last bin label will be '> 20' (depending on other cut() parameters).
        **kwargs: any standard pandas cut() keyword parameters

    Returns:
        out: same as pandas cut() return value
        bins: same as pandas cut() return value (only when retbins=True)
    """
    # Quick passthru if no infinite bounds
    if not lower_infinite and not upper_infinite:
        return pd.cut(x, bins, **kwargs)

    # Setup
    num_labels = len(bins) - 1
    include_lowest = kwargs.get("include_lowest", False)
    right = kwargs.get("right", True)

    # Prepend/append infinities where indicated.
    # list(bins) copies the input and also accepts tuples or arrays.
    bins_final = list(bins)
    if upper_infinite:
        bins_final.append(float("inf"))
        num_labels += 1
    if lower_infinite:
        bins_final.insert(0, float("-inf"))
        num_labels += 1

    # Decide all boundary symbols based on traditional cut() parameters.
    # With right=True the lowest bin is (-inf, b], which contains b itself,
    # so its label is "<= b" whether or not include_lowest is set.
    symbol_lower = "<=" if right else "<"
    left_bracket = "(" if right else "["
    right_bracket = "]" if right else ")"
    symbol_upper = ">" if right else ">="

    # Inner function reused in multiple clauses for labeling
    def make_label(i, lb=left_bracket, rb=right_bracket):
        return "{0}{1}, {2}{3}".format(lb, bins_final[i], bins_final[i + 1], rb)

    # Create custom labels
    labels = []
    for i in range(num_labels):
        if i == 0:
            if lower_infinite:
                new_label = "{0} {1}".format(symbol_lower, bins_final[i + 1])
            elif include_lowest:
                new_label = make_label(i, lb="[")
            else:
                new_label = make_label(i)
        elif upper_infinite and i == (num_labels - 1):
            new_label = "{0} {1}".format(symbol_upper, bins_final[i])
        else:
            new_label = make_label(i)
        labels.append(new_label)

    # Pass thru to pandas cut()
    return pd.cut(x, bins_final, labels=labels, **kwargs)
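As a compact, self-contained illustration of the same idea (deriving the labels from the bins so the two cannot drift apart), here is a reduced sketch. The helper names `open_ended_labels` and `cut_open_ended` are hypothetical, both ends are always open, and only the right parameter is handled, so this is a simplification of the wrapper above rather than a replacement for it.

```python
import pandas as pd


def open_ended_labels(bins, right=True):
    """Build one label per bin for `bins` padded with -inf and +inf.

    Hypothetical helper for illustration only.
    """
    lower_sym, upper_sym = ("<=", ">") if right else ("<", ">=")
    lb, rb = ("(", "]") if right else ("[", ")")
    inner = ["{0}{1}, {2}{3}".format(lb, a, b, rb)
             for a, b in zip(bins[:-1], bins[1:])]
    return (["{0} {1}".format(lower_sym, bins[0])]
            + inner
            + ["{0} {1}".format(upper_sym, bins[-1])])


def cut_open_ended(x, bins, right=True):
    """Call pd.cut with infinite outer edges and labels derived from `bins`."""
    edges = [float("-inf")] + list(bins) + [float("inf")]
    return pd.cut(x, edges, labels=open_ended_labels(bins, right), right=right)


weights = pd.Series([5, 10, 30, 200, 250])  # made-up sample data

result = cut_open_ended(weights, [10, 50, 100, 200])
print(result.tolist())

# Changing the bin edges or the closed side regenerates the labels.
result_left = cut_open_ended(weights, [10, 50, 100, 200], right=False)
print(result_left.tolist())
```

Because the labels are rebuilt on every call, editing the bins list is the only change needed when the report's ranges change.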
Answer by Daniel MM. Kamani
You can use float("inf") as the upper bound and -float("inf") as the lower bound in the bins list. Out-of-range values then fall into the two open-ended bins instead of becoming NaN.

