How to apply a custom function to each row of a pandas DataFrame
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/40353519/
Asked by zorny
I want to apply a custom function and create a derived column called population2050 that is based on two columns already present in my data frame.
import pandas as pd
import sqlite3
conn = sqlite3.connect('factbook.db')
query = "select * from facts where area_land =0;"
facts = pd.read_sql_query(query,conn)
print(list(facts.columns.values))
def final_pop(initial_pop,growth_rate):
    final = initial_pop*math.e**(growth_rate*35)
    return(final)
facts['pop2050'] = facts['population','population_growth'].apply(final_pop,axis=1)
When I run the above code, I get an error. Am I not using the 'apply' function correctly?
Accepted answer by Boud
With axis=1, apply passes the entire row to your function as a single argument. Adjust the function like this, assuming your two columns are called initial_pop and growth_rate:
import math
def final_pop(row):
    return row.initial_pop*math.e**(row.growth_rate*35)
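As a minimal sketch of this row-wise approach, the snippet below builds a small made-up DataFrame in place of the facts table (the values are invented for illustration) and uses the question's column names, population and population_growth, rather than the initial_pop and growth_rate assumed above:

```python
import math
import pandas as pd

# Hypothetical stand-in for the `facts` table; the numbers are made up.
facts = pd.DataFrame({
    "population": [1000, 2000],
    "population_growth": [0.01, 0.0],
})

def final_pop(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    return row["population"] * math.e ** (row["population_growth"] * 35)

# Call apply on the DataFrame itself so each row is passed to final_pop.
facts["pop2050"] = facts.apply(final_pop, axis=1)
print(facts)
```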
Answer by Karnage
You were almost there:
facts['pop2050'] = facts.apply(lambda row: final_pop(row['population'],row['population_growth']),axis=1)
Using lambda allows you to keep the specific (interesting) parameters listed in your function, rather than bundling them in a 'row'.
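For context on why the question's line fails: facts['population','population_growth'] asks for a single column whose key is the tuple ('population', 'population_growth'), which raises a KeyError on an ordinary DataFrame. Selecting two columns needs a list inside the brackets, and even then apply with axis=1 hands the function one row rather than two separate values. A minimal sketch of the lambda approach, using made-up data in place of the facts table:

```python
import math
import pandas as pd

# Hypothetical stand-in for the `facts` table; the numbers are made up.
facts = pd.DataFrame({
    "population": [1000, 2000],
    "population_growth": [0.01, 0.0],
})

def final_pop(initial_pop, growth_rate):
    # The function keeps its two explicit parameters.
    return initial_pop * math.e ** (growth_rate * 35)

# Double brackets (a list of names) select a two-column DataFrame.
subset = facts[["population", "population_growth"]]

# The lambda unpacks each row into the two arguments final_pop expects.
facts["pop2050"] = subset.apply(
    lambda row: final_pop(row["population"], row["population_growth"]),
    axis=1,
)
print(facts)
```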
Answer by Mr. Duhart
You can achieve the same result without DataFrame.apply(). Pandas Series (that is, DataFrame columns) can be passed directly to NumPy functions and even to built-in Python operators, which are then applied element-wise. In your case, it is as simple as the following:
import numpy as np
facts['pop2050'] = facts['population'] * np.exp(35 * facts['population_growth'])
This multiplies each element in the column population_growth by 35, applies NumPy's exp() function to that new column (35 * population_growth), and then multiplies the result element-wise by population.
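A short sketch, again on made-up data standing in for the facts table, that computes the column both ways and checks that the vectorized expression matches the row-wise apply:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `facts` table; the numbers are made up.
facts = pd.DataFrame({
    "population": [1000, 2000],
    "population_growth": [0.01, 0.0],
})

# Vectorized: operators and np.exp work on whole columns at once.
vectorized = facts["population"] * np.exp(35 * facts["population_growth"])

# Row-wise equivalent, for comparison only.
row_wise = facts.apply(
    lambda row: row["population"] * np.exp(35 * row["population_growth"]),
    axis=1,
)

facts["pop2050"] = vectorized
print(np.allclose(vectorized, row_wise))
```

The vectorized form avoids calling a Python function once per row, which is usually the reason to prefer it on larger tables.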
Answer by syed irfan
Define your function:
def function(x):
    # your operation
    return x
Then apply it to a single column (the function receives one value at a time):
df['column']=df['column'].apply(function)
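As an illustration of this single-column pattern, here is a minimal sketch with a made-up DataFrame and a placeholder operation (doubling each value); the data and the operation are assumptions for the example, not part of the original answer:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"column": [1, 2, 3]})

def function(x):
    # Placeholder operation: x is a single value from the column.
    return x * 2

# Series.apply calls `function` once per element of the column.
df["column"] = df["column"].apply(function)
print(df)
```

Note that this form only sees one column's values, so it does not fit the question's case directly, where two columns are combined per row.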