Applying pandas qcut bins to new data

Question

Asked by GRN

I am using pandas qcut to split some data into 20 bins as part of data prep for training of a binary classification model like so:

data['VAR_BIN'] = pd.qcut(cc_data[var], 20, labels=False)

My question is: how can I apply the same binning logic derived from the qcut statement above to a new set of data, say for model validation purposes? Is there an easy way to do this?

Thanks

Answer 1

Answered by ayhan

You can do it by passing retbins=True.

Consider the following DataFrame:

import pandas as pd
import numpy as np
prng = np.random.RandomState(0)
df = pd.DataFrame(prng.randn(100, 2), columns = ["A", "B"])

pd.qcut(df["A"], 20, retbins=True, labels=False) returns a tuple whose second element is the array of bin edges. So you can do:

ser, bins = pd.qcut(df["A"], 20, retbins=True, labels=False)
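
As a quick check on what comes back, here is a minimal sketch that reuses the same DataFrame (the printed checks are only illustrative). Asking for 20 quantile bins should give 21 edges, with the outer edges at the minimum and maximum of the column, and bin codes running from 0 to 19.

```python
import numpy as np
import pandas as pd

prng = np.random.RandomState(0)
df = pd.DataFrame(prng.randn(100, 2), columns=["A", "B"])

ser, bins = pd.qcut(df["A"], 20, retbins=True, labels=False)

print(len(bins))             # number of edges: one more than the number of bins
print(ser.min(), ser.max())  # smallest and largest bin code
# The outer edges coincide with the column's minimum and maximum.
print(np.isclose(bins[0], df["A"].min()), np.isclose(bins[-1], df["A"].max()))
```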

ser is the series of integer bin codes (integers rather than a categorical, because labels=False) and bins are the break points. Now you can pass bins to pd.cut to apply the same grouping to the other column:

pd.cut(df["B"], bins=bins, labels=False, include_lowest=True)
Out[38]: 
0     13
1     19
2      3
3      9
4     13
5     17
...
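
One caveat when the edges are reused on genuinely new data, such as a validation set: pd.cut returns NaN for any value outside the range the edges were learned on. The sketch below shows one possible way to handle that, by widening the two outer edges to -inf and +inf before calling pd.cut. This is an assumption on my part and not part of the answer above, and the names train, valid, edges and open_edges are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
train = pd.Series(rng.randn(1000))  # hypothetical training feature

# Hypothetical validation values: one below the training range,
# one inside it, one above it.
valid = pd.Series([train.min() - 1.0, 0.0, train.max() + 1.0])

# Learn the 20 quantile edges on the training data only.
train_bin, edges = pd.qcut(train, 20, labels=False, retbins=True)

# Reusing the edges unchanged: out-of-range values come back as NaN.
closed = pd.cut(valid, bins=edges, labels=False, include_lowest=True)

# Widening the outer edges: out-of-range values land in the first or last bin.
open_edges = edges.copy()
open_edges[0] = -np.inf
open_edges[-1] = np.inf
opened = pd.cut(valid, bins=open_edges, labels=False)

print(closed.tolist())
print(opened.tolist())
```

Whether clamping outliers into the end bins is appropriate depends on the model. Leaving them as NaN and handling them explicitly is an equally reasonable choice.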