Original URL: http://stackoverflow.com/questions/37906210/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Applying pandas qcut bins to new data
Asked by GRN
I am using pandas qcut to split some data into 20 bins as part of data prep for training of a binary classification model like so:
data['VAR_BIN'] = pd.qcut(cc_data[var], 20, labels=False)
My question is: how can I apply the same binning logic derived from the qcut statement above to a new set of data, say for model validation purposes? Is there an easy way to do this?
Thanks
Answered by ayhan
You can do it by passing retbins=True.
Consider the following DataFrame:
import pandas as pd
import numpy as np
prng = np.random.RandomState(0)
df = pd.DataFrame(prng.randn(100, 2), columns = ["A", "B"])
pd.qcut(df["A"], 20, retbins=True, labels=False) returns a tuple whose second element is the bins. So you can do:
ser, bins = pd.qcut(df["A"], 20, retbins=True, labels=False)
ser is the binned series (integer bin codes, because labels=False) and bins are the break points. Now you can pass bins to pd.cut to apply the same grouping to the other column:
pd.cut(df["B"], bins=bins, labels=False, include_lowest=True)
Out[38]:
0 13
1 19
2 3
3 9
4 13
5 17
...
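For the validation use case in the question, the same idea might look like the sketch below. The names train, valid, x and open_edges are made up for illustration and are not from the original post. One thing to watch: pd.cut returns NaN for values that fall outside the learned edges, so this sketch assumes you want such values pushed into the first or last bin and widens the outer edges to infinity. If you would rather flag out-of-range values, keep the original edges and handle the NaNs yourself.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)

# Hypothetical training and validation sets (illustration only).
train = pd.DataFrame({"x": rng.randn(1000)})
# Wider spread, so some validation values may fall outside the training range.
valid = pd.DataFrame({"x": rng.randn(200) * 1.5})

# Learn the 20 quantile edges from the training data only.
train_bins, edges = pd.qcut(train["x"], 20, retbins=True, labels=False)
train["x_bin"] = train_bins

# Assumption: out-of-range values should land in the first/last bin,
# so replace the outer edges with -inf/+inf before reusing them.
open_edges = edges.copy()
open_edges[0] = -np.inf
open_edges[-1] = np.inf

# Apply the training edges to the validation data.
valid["x_bin"] = pd.cut(valid["x"], bins=open_edges, labels=False, include_lowest=True)
```

Only the two outermost edges change, so the interior break points are still exactly the ones qcut derived from the training data.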