Python: how to apply a custom function to each row of a pandas data frame

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40353519/

Time: 2020-08-19 23:26:25  Source: igfitidea

How to apply custom function to pandas data frame for each row

python pandas

Asked by zorny

I want to apply a custom function and create a derived column called population2050 that is based on two columns already present in my data frame.

import pandas as pd
import sqlite3
conn = sqlite3.connect('factbook.db')
query = "select * from facts where area_land =0;"
facts = pd.read_sql_query(query,conn)
print(list(facts.columns.values))

def final_pop(initial_pop,growth_rate):
    final = initial_pop*math.e**(growth_rate*35)
    return(final)

facts['pop2050'] = facts['population','population_growth'].apply(final_pop,axis=1)

When I run the above code, I get an error. Am I not using the 'apply' function correctly?

Accepted answer by Boud

With axis=1, apply passes the entire row to your function. Adjust it like this, assuming your two columns are called initial_pop and growth_rate:

import math

def final_pop(row):
    return row.initial_pop * math.e ** (row.growth_rate * 35)
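
The answer stops at the function definition, so here is a minimal sketch of how such a row function could be wired up with apply. It assumes the question's column names (population, population_growth) rather than initial_pop and growth_rate, and the toy values in the data frame are invented for illustration.

```python
import math

import pandas as pd

# Toy stand-in for the factbook table; these numbers are made up.
facts = pd.DataFrame({
    "population": [1000, 2000],
    "population_growth": [0.0, 0.01],
})

def final_pop(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    return row["population"] * math.e ** (row["population_growth"] * 35)

# apply calls final_pop once per row and collects the results into a Series.
facts["pop2050"] = facts.apply(final_pop, axis=1)
print(facts)
```

Note that apply is called on the data frame itself, not on a column selection, so the function sees every column of the row.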

Answer by Karnage

You were almost there:

facts['pop2050'] = facts.apply(lambda row: final_pop(row['population'],row['population_growth']),axis=1)

Using lambda allows you to keep the specific (interesting) parameters listed in your function, rather than bundling them in a 'row'.
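
As a rough, self-contained illustration of that point, the sketch below keeps the question's two-argument final_pop and uses a lambda only to unpack the row. The data frame values are invented, and `single` is just a hypothetical name for a direct call on plain numbers.

```python
import math

import pandas as pd

def final_pop(initial_pop, growth_rate):
    # Plain two-argument function: knows nothing about data frames.
    return initial_pop * math.e ** (growth_rate * 35)

# Toy data; the values are made up for illustration.
facts = pd.DataFrame({
    "population": [1000, 2000],
    "population_growth": [0.0, 0.01],
})

# The lambda picks the two columns out of each row and forwards them.
facts["pop2050"] = facts.apply(
    lambda row: final_pop(row["population"], row["population_growth"]),
    axis=1,
)

# Because the signature is unchanged, the function still works on scalars.
single = final_pop(1000, 0.0)
```

One possible benefit of this layout is that final_pop can be unit-tested with ordinary numbers, independent of pandas.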

Answer by Mr. Duhart

You can achieve the same result without the need for DataFrame.apply(). Pandas series (or dataframe columns) can be used as direct arguments for NumPy functions and even built-in Python operators, which are applied element-wise. In your case, it is as simple as the following:

import numpy as np

facts['pop2050'] = facts['population'] * np.exp(35 * facts['population_growth'])

This multiplies each element in the column population_growth by 35, applies NumPy's exp() function to that new column (35 * population_growth), and then multiplies the result by population.
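
To check that the column-wise expression agrees with a row-by-row apply, a small sketch like the following could be used. The data frame values are invented, and the names `vectorized`, `row_wise`, and `same` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data; the values are made up for illustration.
facts = pd.DataFrame({
    "population": [1000, 2000, 500],
    "population_growth": [0.0, 0.01, 0.02],
})

# Column-wise: operates on whole Series at once, no Python-level loop.
vectorized = facts["population"] * np.exp(35 * facts["population_growth"])

# Row-wise: the same formula evaluated one row at a time via apply.
row_wise = facts.apply(
    lambda row: row["population"] * np.exp(35 * row["population_growth"]),
    axis=1,
)

# Both approaches should agree up to floating-point tolerance.
same = np.allclose(vectorized, row_wise)
print(same)
```

On large tables the column-wise form is generally the faster of the two, since the arithmetic runs inside NumPy instead of calling a Python function for every row.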

Answer by syed irfan

Define your function:

def function(x):
    # your operation
    return x

Then call it on a column:

df['column'] = df['column'].apply(function)
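
A minimal sketch of this pattern, using an invented one-column data frame and a hypothetical `double` function in place of the placeholder operation:

```python
import pandas as pd

# Toy data; the column name and values are made up for illustration.
df = pd.DataFrame({"column": [1, 2, 3]})

def double(x):
    # Series.apply passes one element at a time, so x is a single value.
    return x * 2

df["column"] = df["column"].apply(double)
print(df)
```

Because Series.apply sees only one column's values, this form suits single-column transformations; a calculation that needs two columns, as in the question, would still go through DataFrame.apply with axis=1 or a column-wise expression.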