How to generate many interaction terms in Pandas?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/33257199/

Date: 2020-09-14 00:05:28  Source: igfitidea

Tags: python, pandas, scikit-learn, statsmodels

Asked by pdevar

I would like to estimate an IV regression model using many interactions with year, demographic, and other dummy variables. I can't find an explicit method to do this in Pandas and am curious whether anyone has tips.

I'm thinking of trying scikit-learn and this function:

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

Accepted answer by Marcus V.

I recently faced a similar problem: I needed a flexible way to create specific interactions, so I looked through StackOverflow. I followed the tip in @user333700's comment above and, thanks to him, found patsy (http://patsy.readthedocs.io/en/latest/overview.html) and, after a Google search, patsylearn (https://github.com/amueller/patsylearn), which integrates patsy with scikit-learn.

So, taking @motam79's example, this is possible:

import numpy as np
import pandas as pd
from patsylearn import PatsyTransformer

x = np.array([[ 3, 20, 11],
              [ 6,  2,  7],
              [18,  2, 17],
              [11, 12, 19],
              [ 7, 20,  6]])
df = pd.DataFrame(x, columns=["a", "b", "c"])
# In patsy formula syntax, "a:b" is the interaction (elementwise product) of columns a and b
x_t = PatsyTransformer("a:b + a:c + b:c", return_type="dataframe").fit_transform(df)

This returns the following:

     a:b    a:c    b:c
0   60.0   33.0  220.0
1   12.0   42.0   14.0
2   36.0  306.0   34.0
3  132.0  209.0  228.0
4  140.0   42.0  120.0
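
If you would rather not depend on patsylearn, the same pairwise products can be built with plain pandas. This is a minimal sketch, assuming all columns are numeric; the helper name `pairwise_interactions` is hypothetical, and the `a:b` column naming simply mimics patsy's output rather than being required by either library:

```python
from itertools import combinations

import pandas as pd


def pairwise_interactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one column per pair of columns in df,
    holding their elementwise product, named like patsy's "a:b"."""
    products = {
        f"{left}:{right}": df[left] * df[right]
        for left, right in combinations(df.columns, 2)
    }
    return pd.DataFrame(products, index=df.index)


df = pd.DataFrame(
    [[3, 20, 11], [6, 2, 7], [18, 2, 17], [11, 12, 19], [7, 20, 6]],
    columns=["a", "b", "c"],
)
print(pairwise_interactions(df))
```

The values match the patsylearn output above; the only difference is that the dtype stays integer because nothing converts the inputs to float.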

I answered a similar question, where I give another example with categorical variables: How can an interaction design matrix be created from categorical variables?
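
For the categorical case in the question (year and demographic dummies), one pandas-only approach is to one-hot encode each variable with `pd.get_dummies` and multiply every dummy of one variable by every dummy of the other. This is a sketch, not the answer's patsy method: the column names `year` and `group` and the data are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: a survey year and a demographic group per row
df = pd.DataFrame({
    "year": [2001, 2001, 2002, 2002, 2003],
    "group": ["A", "B", "A", "B", "A"],
})

# One 0/1 indicator column per category, e.g. "year_2001", "group_A"
year_d = pd.get_dummies(df["year"], prefix="year", dtype=int)
group_d = pd.get_dummies(df["group"], prefix="group", dtype=int)

# Interaction of every year dummy with every group dummy:
# the product is 1 only when both indicators are 1 in that row
interactions = pd.DataFrame(
    {
        f"{y}:{g}": year_d[y] * group_d[g]
        for y in year_d.columns
        for g in group_d.columns
    },
    index=df.index,
)
print(interactions)
```

With a full set of dummies the interaction columns sum to 1 in every row, so they are collinear with a regression intercept; passing `drop_first=True` to `pd.get_dummies` to drop a reference category is one common way around that.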

Answer by motam79

You can use scikit-learn's PolynomialFeatures transformer. Here is an example:

Suppose this is your design (i.e., feature) matrix:

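import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[ 3, 20, 11],
              [ 6,  2,  7],
              [18,  2, 17],
              [11, 12, 19],
              [ 7, 20,  6]])

# degree=2 with interaction_only=True keeps pairwise products but no squares;
# include_bias=False drops the constant column of ones
x_t = PolynomialFeatures(2, interaction_only=True, include_bias=False).fit_transform(x)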

Here is the result:

array([[   3.,   20.,   11.,   60.,   33.,  220.],
       [   6.,    2.,    7.,   12.,   42.,   14.],
       [  18.,    2.,   17.,   36.,  306.,   34.],
       [  11.,   12.,   19.,  132.,  209.,  228.],
       [   7.,   20.,    6.,  140.,   42.,  120.]])

The first three columns are the original features, and the next three are their pairwise interactions: the products of columns 1 and 2, 1 and 3, and 2 and 3 (e.g. 3 × 20 = 60 in the first row).
