pandas: apply a function to dataframe column elements based on the value in another column of the same row?

Question

Asked by Chuck

I have a dataframe:

df = pd.DataFrame(
    {'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

df = 
    number    condition
0    10         A
1    20         B
2    30         A
3    40         B

I want to apply a function to each element within the number column, as follows:

df['number'] = df['number'].apply(lambda x: func(x))

BUT, even though I apply the function to the number column, I want the function to also make reference to the condition column, i.e. in pseudo code:

func(n):
    #if the value in corresponding condition column is equal to some set of values:
        # do some stuff to n using the value in condition
        # return new value for n

For a single number, and an example function I would write:

number = 10
condition = 'A'
def func(num, condition):
    if condition == 'A':
        return num*3
    if condition == 'B':
        return num*4

func(number, condition)  # returns 30

How can I incorporate the same function into the apply statement written above, i.e. make reference to the value in the condition column while acting on the value in the number column?

Note: I have read through the docs on np.where(), pandas.loc() and pandas.index(), but I just cannot figure out how to put it into practice.

I am struggling with the syntax for referencing the other column from within the function, as I need access to the values in both the number and condition columns.

As such, my expected output is:

df = 
    number    condition
0    30         A
1    80         B
2    90         A
3    160         B

UPDATE: The above was far too vague. Please see the following:

df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})


    Entries    Conflict
0    "man"    "Yes"
1    "guy"    "Yes"
2    "boy"    "Yes"
3    "girl"   "No"

def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d

df1['Entries'] = np.where(df1['Conflict'] == 'Yes', funcA, funcB)

Output:
{'Conflict': ['Yes', 'Yes', 'Yes', 'No'],
 'Entries': array(<function funcB at 0x7f4acbc5a500>, dtype=object)}

How can I adapt the np.where statement above so that it operates on a pandas Series, as mentioned in the comments, and produces the desired output shown below?

Desired Output:

    Entries    Conflict
0    "manaaa"    "Yes"
1    "guyaaa"    "Yes"
2    "boyaaa"    "Yes"
3    "girlbbb"   "No"

Answer 1

Accepted answer by blacksite

I don't know about using pandas.DataFrame.apply, but you could define a condition:multiplier key-value mapping (see multiplier below) and pass it into your function. Then you can use a list comprehension to calculate the new number output based on those conditions:

import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})

multiplier = {'A': 3, 'B': 4}  # as in the question: A multiplies by 3, B by 4

def func(num, condition, multiplier):
    return num * multiplier[condition]

df['new_number'] = [func(df.loc[idx, 'number'], df.loc[idx, 'condition'], 
                     multiplier) for idx in range(len(df))]

Here's the result:

df
Out[24]: 
  condition  number  new_number
0         A      10          30
1         B      20          80
2         A      30          90
3         B      40         160

There is likely a vectorized, pure-pandas solution that's more "ideal." But this works, too, in a pinch.
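
One possible vectorized form, as a minimal sketch rather than the answerer's own method: it assumes the question's multipliers (3 for A, 4 for B) and uses Series.map to turn the condition column into a column of multipliers before multiplying element-wise.

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
multiplier = {'A': 3, 'B': 4}  # assumed from the question's example function

# Series.map looks up each condition in the dict and returns a Series of
# multipliers aligned with df's index; the product is then element-wise.
df['new_number'] = df['number'] * df['condition'].map(multiplier)

print(df['new_number'].tolist())
```

Note that a condition missing from the dict maps to NaN instead of raising a KeyError, so unexpected values fail less loudly than in the list comprehension above.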

Answer 2

Answer by Rene B.

Since the question asks about applying a function to a dataframe column using values from the same row, it seems more appropriate to use the pandas apply function in combination with a lambda:

import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})

def func(number,condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]

df['new_number'] = df.apply(lambda x: func(x['number'], x['condition']), axis=1)

In this example, apply with axis=1 passes each row of the dataframe df to the lambda, which forwards that row's 'number' and 'condition' values to the function func.

This returns the following result:

df
Out[10]: 
  condition  number  new_number
0         A      10          20
1         B      20          80
2         A      30          60
3         B      40         160
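
To make the row-passing explicit, here is a small sketch of the same calculation with a named function in place of the lambda. The name scale_row is hypothetical and chosen only for illustration; the multipliers are the ones used in this answer.

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
multiplier = {'A': 2, 'B': 4}

def scale_row(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name,
    # so both columns are available inside the function.
    return row['number'] * multiplier[row['condition']]

new_number = df.apply(scale_row, axis=1)
print(new_number.tolist())
```

Because apply with axis=1 calls the function once per row in Python, this is convenient but can be slow on large dataframes.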

For the UPDATE case it's also possible to use the pandas apply function:

df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})

def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d

df1['Entries'] = df1.apply(lambda x: funcA(x['Entries']) if x['Conflict'] == 'Yes' else funcB(x['Entries']), axis=1)

In this example, the lambda reads the 'Entries' and 'Conflict' columns of each row of df1 and passes the 'Entries' value to either funcA or funcB. An if-else expression inside the lambda decides which of the two functions is applied.

This returns the following result:

df1
Out[12]:
  Conflict  Entries
0      Yes   manaaa
1      Yes   guyaaa
2      Yes   boyaaa
3       No  girlbbb
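
The np.where attempt in the question returned function objects because funcA and funcB were passed without being called. A possible fix, sketched here as an assumption about what the asker intended rather than as part of this answer, is to call both functions on the whole 'Entries' Series and let np.where choose between the two results:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Both functions run on the full column; np.where then picks, row by row,
# the funcA result where Conflict is 'Yes' and the funcB result otherwise.
df1['Entries'] = np.where(df1['Conflict'] == 'Yes',
                          funcA(df1['Entries']),
                          funcB(df1['Entries']))

print(df1['Entries'].tolist())
```

This only works when the functions accept a whole Series, as string concatenation does here, and both branches are evaluated for every row regardless of the condition.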
