Original question: http://stackoverflow.com/questions/45393123/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow (not me).
Adding calculated column in Pandas
Asked by JD2775
I have a dataframe with 10 columns. I want to add a new column 'age_bmi', which should be a calculated column equal to 'age' * 'bmi'. age is an INT, bmi is a FLOAT.
That should then give me a new dataframe with 11 columns.
Something I am doing isn't quite right. I think it's a syntax issue. Any ideas?
Thanks
df2['age_bmi'] = df(['age'] * ['bmi'])
print(df2)
Answered by Cory Madden
Try df2['age_bmi'] = df.age * df.bmi.
You're trying to call the dataframe as a function, when what you need is the values of the columns. You can access those by key, like a dictionary, or as an attribute if the column name is a valid Python identifier (no spaces, for example) and doesn't match an existing DataFrame method or attribute.
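To make the point above concrete, here is a minimal sketch. The sample data is made up (the question does not show its dataframe), and it assumes df2 is meant to be a copy of df with one extra column; the `clash` frame is purely illustrative.

```python
import pandas as pd

# Hypothetical sample data; the original question does not show its dataframe.
df = pd.DataFrame({"age": [25, 40, 31], "bmi": [22.5, 27.1, 30.0]})

# The expression from the question fails in two ways: multiplying two plain
# Python lists raises TypeError, and a DataFrame is not callable in any case.
try:
    ["age"] * ["bmi"]
except TypeError:
    list_product_failed = True
else:
    list_product_failed = False
frame_is_callable = callable(df)

# Access the columns by key (always works) or by attribute (works for these names).
by_key = df["age"] * df["bmi"]
by_attr = df.age * df.bmi
same = by_key.equals(by_attr)

# Assumption: df2 should be a copy of df plus the new column, leaving df untouched.
df2 = df.copy()
df2["age_bmi"] = by_key
print(df2)

# Attribute access breaks down when a column name matches a DataFrame method.
clash = pd.DataFrame({"count": [1, 2]})
attr_is_method = callable(clash.count)              # the method, not the column
key_is_column = isinstance(clash["count"], pd.Series)
```

Bracket access is the safer default for that last reason: it never collides with DataFrame methods and works for any column name.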
Someone linked this in a comment the other day and it's pretty awesome. I recommend giving it a watch, even if you don't do the exercises: https://www.youtube.com/watch?v=5JnMutdy6Fw
Answered by Zero
As Cory pointed out, you're calling a dataframe as a function, which won't work as you expect. Here are 4 ways to multiply two columns; in most cases you'd use the first method.
df['age_bmi'] = df.age * df.bmi
or,
df['age_bmi'] = df.eval('age*bmi')
or,
df['age_bmi'] = pd.eval('df.age*df.bmi')
or,
df['age_bmi'] = df.age.mul(df.bmi)
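As a quick check that the four approaches above agree, the sketch below runs each one on a small made-up dataframe and compares the results. The data and the `results` dictionary are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame({"age": [25, 40, 31], "bmi": [22.5, 27.1, 30.0]})

# The four ways of multiplying the two columns, computed before any assignment.
results = {
    "operator": df.age * df.bmi,
    "df.eval": df.eval("age*bmi"),
    "pd.eval": pd.eval("df.age*df.bmi"),  # looks up df in the calling scope
    "mul": df.age.mul(df.bmi),
}

# Every approach should give the same values as the plain * operator.
reference = results["operator"]
all_equal = all((r == reference).all() for r in results.values())
print(all_equal)

# Store the result as the new column.
df["age_bmi"] = reference
print(df)
```

The eval variants take the expression as a string, while mul offers extra parameters such as fill_value for handling missing data.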