Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35414431/

Date: 2020-09-14 00:42:16  Source: igfitidea

How to define user defined function in pandas

Tags: python, pandas

Asked by Edwin Baby

I have a csv file that contains information like

name    salary  department
a        2500      x
b        5000      y
c        10000      y
d        20000      x 
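To experiment without the original file, the sample rows can be loaded from an in-memory string. This is a minimal sketch; the whitespace separator is an assumption, since the question shows the data as a table rather than as raw CSV text.

```python
import io

import pandas as pd

# The sample rows from the question as whitespace-separated text
# (assumption: the real employeedetails.csv may use commas instead).
raw = """name salary department
a 2500 x
b 5000 y
c 10000 y
d 20000 x
"""

# sep=r"\s+" splits on any run of spaces or tabs.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```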

I need to convert this using Pandas to the form like

dept    name    position
x        a       Normal Employee
x        d       Experienced Employee
y        b       Normal Employee
y        c       Experienced Employee

If salary <= 8000, the position is Normal Employee.

If 8000 < salary <= 25000, the position is Experienced Employee.
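The two rules describe salary bands, which pd.cut can express directly. This is an alternative technique that none of the answers below use; a minimal sketch with the sample data typed in inline.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})

# pd.cut uses right-closed intervals by default, so these bins are
# (-inf, 8000] -> Normal Employee and (8000, 25000] -> Experienced Employee,
# matching the two rules. Salaries above 25000 would become NaN.
df["position"] = pd.cut(
    df["salary"],
    bins=[-np.inf, 8000, 25000],
    labels=["Normal Employee", "Experienced Employee"],
)
print(df)
```

The result is a categorical column; call .astype(str) on it if plain strings are needed.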

My current groupby code:

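import pandas
pandas.set_option('display.max_rows', 999)
data_df = pandas.read_csv('employeedetails.csv')
#print(data_df.columns)
# the column in the file is 'department', so grouping on 'dept' raises a KeyError
t = data_df.groupby(['department'])
# prints only a DataFrameGroupBy object, not the grouped rows
print(t)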

What changes do I need to make to this code to get the output shown above?
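Putting the pieces together: groupby on its own returns a GroupBy object rather than a table, so one way to get the requested layout is to add the position column, rename department to dept, keep the three requested columns, and sort by dept. A minimal sketch with the sample data typed inline in place of reading employeedetails.csv; the 'Unknown' label for salaries above 25000 is an assumption, as the question does not cover that case. With the sample input, rows a and d land under x, and b and c under y.

```python
import numpy as np
import pandas as pd

# Inline copy of the sample data (stands in for read_csv on the real file).
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})

# Label each row from its salary; 'Unknown' above 25000 is an assumption.
df["position"] = np.where(
    df["salary"] <= 8000,
    "Normal Employee",
    np.where(df["salary"] <= 25000, "Experienced Employee", "Unknown"),
)

# Rename to the requested header, keep the requested columns, and sort so
# rows of the same department sit together.
out = (
    df.rename(columns={"department": "dept"})[["dept", "name", "position"]]
    .sort_values(["dept", "name"])
    .reset_index(drop=True)
)
print(out.to_string(index=False))
```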

Accepted answer by EdChum

You could define two boolean masks and pass them to np.where:

In [91]:
import numpy as np
normal = df['salary'] <= 8000
experienced = (df['salary'] > 8000) & (df['salary'] <= 25000)
df['position'] = np.where(normal, 'normal employee', np.where(experienced, 'experienced employee', 'unknown'))
df

Out[91]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee
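With more than two bands, nested np.where calls get hard to read. np.select takes a list of conditions and a matching list of choices instead, using the first condition that is True for each row. A minimal sketch reusing the same two masks on an inline copy of the sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})

normal = df["salary"] <= 8000
experienced = (df["salary"] > 8000) & (df["salary"] <= 25000)

# np.select checks the conditions in order and takes the choice of the
# first one that is True; rows matching none get the default.
df["position"] = np.select(
    [normal, experienced],
    ["normal employee", "experienced employee"],
    default="unknown",
)
print(df)
```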

A slightly more readable option is to use the masks with loc:

In [92]:
df.loc[normal, 'position'] = 'normal employee'
df.loc[experienced, 'position'] = 'experienced employee'
df

Out[92]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee
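One difference from np.where: loc only writes the rows each mask selects, so a row that matches neither mask keeps a missing value. A sketch with one hypothetical out-of-range row (name 'e', salary 30000) added to the sample data purely to show this.

```python
import pandas as pd

# Sample data plus a hypothetical row 'e' whose salary is outside both rules.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "salary": [2500, 5000, 10000, 20000, 30000],
    "department": ["x", "y", "y", "x", "y"],
})

normal = df["salary"] <= 8000
experienced = (df["salary"] > 8000) & (df["salary"] <= 25000)

# The first assignment creates the column; unselected rows are left as NaN.
df.loc[normal, "position"] = "normal employee"
df.loc[experienced, "position"] = "experienced employee"
print(df)

# Fill the gaps explicitly if a label is wanted for them.
df["position"] = df["position"].fillna("unknown")
```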

Answer by Fabio Lamanna

I would use a simple function like:

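def f(x):
    # map one salary value to a position label
    if x <= 8000:
        return 'Normal Employee'
    elif x <= 25000:  # x > 8000 is already implied here
        return 'Experienced Employee'
    return 'Unknown'  # salaries above 25000 fall outside both rules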

and then apply it to the df:

df['position'] = df['salary'].apply(f)
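To check the band edges, the function can be applied to salaries that sit exactly on and just past each boundary. A small sketch; the 'Unknown' return for salaries above 25000 is an assumption, since the rules in the question stop at 25000.

```python
import pandas as pd


def f(x):
    """Map a salary to a position label (same rules as the question)."""
    if x <= 8000:
        return "Normal Employee"
    elif x <= 25000:  # x > 8000 is already implied here
        return "Experienced Employee"
    return "Unknown"  # assumption: label for salaries above 25000


# Values chosen to sit on and just past each boundary.
salaries = pd.Series([8000, 8001, 25000, 25001])
labels = salaries.apply(f)
print(labels.tolist())
```

Because both comparisons use <=, a salary of exactly 8000 is Normal and exactly 25000 is Experienced, as the rules require.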

Answer by IanS

A useful function is apply:

data_df['position'] = data_df['salary'].apply(lambda salary: 'Normal Employee' if salary <= 8000 else 'Experienced Employee')

This applies the lambda function to every element in the salary column. Note that it labels any salary above 8000, including one above 25000, as Experienced Employee.
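axis=1 is not a Series.apply parameter (extra keyword arguments there are forwarded to the function itself); it belongs to DataFrame.apply, where it means "call the function once per row". A sketch contrasting the two on an inline copy of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})

# Series.apply: the lambda receives one salary value at a time.
by_value = df["salary"].apply(
    lambda salary: "Normal Employee" if salary <= 8000 else "Experienced Employee"
)

# DataFrame.apply with axis=1: the lambda receives a whole row (a Series),
# so it has to pick out the salary column itself.
by_row = df.apply(
    lambda row: "Normal Employee" if row["salary"] <= 8000 else "Experienced Employee",
    axis=1,
)

print(by_value.tolist() == by_row.tolist())
```

The element-wise Series version is simpler and faster when only one column is needed; the row-wise form is useful when the label depends on several columns.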
