Source: Stack Overflow, http://stackoverflow.com/questions/28056704/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.

Accessing the columns of a pandas DataFrame via a string variable

Tags: python, pandas, statsmodels

Asked by Thomas Philips

I've set up a little function that takes in a pandas DataFrame and a few parameters, and then attempts to run an OLS regression using statsmodels. It's designed so that I can call it from a loop, running lots of different regressions with some simple code. Unfortunately, it doesn't work, and I'd appreciate some guidance on what I need to do to make it work. Here's the function:

def regressReturns(rawData, predictor, horizon):
    x = rawData.eval(predictor)
    x = sm.add_constant(x)
    y = rawData.eval(str(horizon) + '_Yr_Return')
    results = sm.OLS(y, x).fit()
    return results.params

I get nothing other than a syntax error if I call it from a loop:

for rh in retunHorizons:
    regressReturns(rawData, 'Earnings_Yield', rh)

What am I doing wrong? Also, I'm a pandas newbie, so an example along with an explanation would be greatly appreciated.

Thanks in advance for your assistance.

Thomas Philips

Answered by mikedal

I'm assuming that rawData is your DataFrame, and that what you have in your evals is the name of the column you are trying to access. If that's the case, the following will work:

x = rawData[predictor]
y = rawData[str(horizon) + '_Yr_Return']

Columns can be accessed both as attributes and like a dict. The first way is a bit more concise, but the second is more flexible if you want to use a variable as the column name.
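
To make the difference between the two access styles concrete, here is a minimal sketch. The data values are made up for illustration; only the column names and the variables `predictor` and `horizon` come from the question. It also shows a case where attribute access is not an option at all: a name such as `1_Yr_Return` starts with a digit, so it is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical data; the column names mirror those in the question.
rawData = pd.DataFrame({
    "Earnings_Yield": [0.05, 0.07, 0.04],
    "1_Yr_Return": [0.10, 0.12, 0.03],
})

predictor = "Earnings_Yield"
horizon = 1

# Attribute access only works when the name is typed literally in the code.
by_attribute = rawData.Earnings_Yield

# Bracket access takes any string, including one held in a variable.
x = rawData[predictor]

# It also handles names built at run time, such as "1_Yr_Return", which
# could not be written as an attribute because it starts with a digit.
y = rawData[str(horizon) + "_Yr_Return"]
```

Both `by_attribute` and `x` refer to the same column here; only the bracket form works when the name is held in a string.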

Answered by elyase

You can do OLS directly with pandas. Note that the pandas.stats module was removed in pandas 0.20, so this only works with older versions:

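from pandas.stats.api import ols

def regressReturns(rawData, predictor, horizon):
    # Drop rows with missing values (note: this modifies rawData in place).
    rawData.dropna(inplace=True)
    results = ols(y=rawData[str(horizon) + '_Yr_Return'],
                  x=rawData[predictor])
    # Return the fitted coefficients.
    return results.sm_ols.params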