Apply function to dataframe column element based on value in other column for same row?
Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/41962022/
Asked by Chuck
I have a dataframe:
df = pd.DataFrame(
    {'number': ['10', '20', '30', '40'], 'condition': ['A', 'B', 'A', 'B']})
df =
  number condition
0     10         A
1     20         B
2     30         A
3     40         B
I want to apply a function to each element within the number column, as follows:
df['number'] = df['number'].apply(lambda x: func(x))
BUT, even though I apply the function to the number column, I want the function to also make reference to the condition column, i.e. in pseudo code:
func(n):
    # if the value in corresponding condition column is equal to some set of values:
    #     do some stuff to n using the value in condition
    #     return new value for n
For a single number, and an example function I would write:
number = 10
condition = 'A'
def func(num, condition):
    if condition == 'A':
        return num * 3
    if condition == 'B':
        return num * 4
func(number, condition)  # returns 30
How can I incorporate the same function into my apply statement written above? i.e. making reference to the value within the condition column, while acting on the value within the number column?
Note: I have read through the docs on np.where(), pandas.loc() and pandas.index() but I just cannot figure out how to put it into practice.
I am struggling with the syntax for referencing the other column from within the function, as I need access to both the values in the number and condition columns.
As such, my expected output is:
df =
   number condition
0      30         A
1      80         B
2      90         A
3     160         B
UPDATE: The above was far too vague. Please see the following:
df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})
   Entries  Conflict
0  "man"    "Yes"
1  "guy"    "Yes"
2  "boy"    "Yes"
3  "girl"   "No"
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = np.where(df1['Conflict'] == 'Yes', funcA, funcB)
Output:
{'Conflict': ['Yes', 'Yes', 'Yes', 'No'],
 'Entries': array(<function funcB at 0x7f4acbc5a500>, dtype=object)}
How can I apply the above np.where statement to take a pandas series as mentioned in the comments, and produce the desired output shown below:
Desired Output:
   Entries    Conflict
0  "manaaa"   "Yes"
1  "guyaaa"   "Yes"
2  "boyaaa"   "Yes"
3  "girlbbb"  "No"
Accepted answer by blacksite
I don't know about using pandas.DataFrame.apply, but you could define a condition:multiplier key-value mapping (seen in multiplier below), and pass that into your function. Then you can use a list comprehension to calculate the new number output based on those conditions:
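import pandas as pd
df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
# Multipliers chosen to match the expected output in the question (A -> x3, B -> x4)
multiplier = {'A': 3, 'B': 4}
def func(num, condition, multiplier):
    return num * multiplier[condition]
# range(len(df)) matches the labels only because df has the default 0..n-1 index
df['new_number'] = [func(df.loc[idx, 'number'], df.loc[idx, 'condition'],
                         multiplier) for idx in range(len(df))]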
Here's the result:
df
Out[24]:
  condition  number  new_number
0         A      10          30
1         B      20          80
2         A      30          90
3         B      40         160
There is likely a vectorized, pure-pandas solution that's more "ideal." But this works, too, in a pinch.
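One possible vectorized version, offered as a sketch and not as the answerer's own method: Series.map can translate the condition column into a column of multipliers, which is then multiplied by the number column in a single step. The multipliers of 3 for A and 4 for B are an assumption chosen to match the expected output in the question.

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

# Assumed multipliers (A -> x3, B -> x4), matching the question's expected output
multiplier = {'A': 3, 'B': 4}

# Series.map looks up each condition in the dict, producing a column of
# multipliers aligned row by row with the number column
df['new_number'] = df['number'] * df['condition'].map(multiplier)
print(df)
```

A condition that is missing from the dict maps to NaN, so the corresponding new_number would be NaN too, rather than raising a KeyError as the list comprehension would.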
Answer by Rene B.
As the question was about applying a function to a dataframe column based on another column in the same row, it seems more accurate to use the pandas apply function in combination with lambda:
import pandas as pd
df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
def func(number, condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]
df['new_number'] = df.apply(lambda x: func(x['number'], x['condition']), axis=1)
In this example, apply with axis=1 passes each row of the dataframe df to the lambda, which reads that row's 'number' and 'condition' values and passes them to the function func.
This returns the following result:
df
Out[10]:
  condition  number  new_number
0         A      10          20
1         B      20          80
2         A      30          60
3         B      40         160
For the UPDATE case it's also possible to use the pandas apply function:
df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'], 'Conflict': ['Yes', 'Yes', 'Yes', 'No']})
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = df1.apply(lambda x: funcA(x['Entries']) if x['Conflict'] == 'Yes' else funcB(x['Entries']), axis=1)
In this example, the lambda reads the 'Entries' and 'Conflict' values of each row of the dataframe df1 and passes the entry to either funcA or funcB. The choice between funcA and funcB is made with an if-else expression in the lambda.
This returns the following result:
df1
Out[12]:
  Conflict  Entries
0      Yes   manaaa
1      Yes   guyaaa
2      Yes   boyaaa
3       No  girlbbb
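The np.where attempt in the question appears to fail because it receives the function objects themselves and never calls them. A minimal sketch of one way to make it work, assuming funcA and funcB can operate on a whole Series (true here, since adding a string to a string Series is applied element by element), is to call both functions on the column and let np.where pick between the two results:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Both functions are evaluated on the full column; np.where then takes the
# funcA result where Conflict is 'Yes' and the funcB result everywhere else
df1['Entries'] = np.where(df1['Conflict'] == 'Yes',
                          funcA(df1['Entries']),
                          funcB(df1['Entries']))
print(df1)
```

Because both branches are computed for every row, this only suits functions that are cheap and safe to run on all values; for functions that must see one element at a time, the row-wise apply above is the simpler choice.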