Create new column in pandas based on value of another column

Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/39564372/

python, pandas

Asked by Piyush

I have a dataset about the genders of various individuals. Say the dataset looks like this:

Male
Female
Male and Female
Male
Male
Female
Trans
Unknown
Male and Female

Some identify as male, some as female, and some as both male and female.

Now, what I want to do is create a new column in pandas that maps:

Males to 1, 
Females to 2,
Others to 3

I wrote some code:

def gender(x):
    if x.str.contains("Male"):
        return 1
    elif x.str.contains("Female"):
        return 2
    else:
        return 3

df["Gender Values"] = df["Gender"].apply(gender)

But I got an error saying there is no attribute named contains. I tried removing str:

x.contains("Male")

and I got the same error.

Is there a better way to do this?

Answered by jezrael

You can use:

def gender(x):
    if "Female" in x and "Male" in x:
        return 3
    elif "Male" in x:
        return 1
    elif "Female" in x:
        return 2
    else: return 4

df["Gender Values"] = df["Gender"].apply(gender)

print (df)
            Gender  Gender Values
0             Male              1
1           Female              2
2  Male and Female              3
3             Male              1
4             Male              1
5           Female              2
6            Trans              4
7          Unknown              4
8  Male and Female              3
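
The original attempt failed because apply() passes each cell to the function as a plain Python string, while .str.contains is a method of the whole Series. As a rough sketch of a vectorized alternative (not part of the answer above), the same rules can be written with Series.str.contains and numpy.select. The sample data is copied from the question; the variable names has_male, has_female, conditions and choices are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Gender": ["Male", "Female", "Male and Female", "Male", "Male",
                              "Female", "Trans", "Unknown", "Male and Female"]})

# .str.contains works on the whole column and returns a boolean Series.
# The match is case-sensitive, so "Female" does not count as containing "Male".
has_female = df["Gender"].str.contains("Female")
has_male = df["Gender"].str.contains("Male")

# np.select takes, for each row, the choice of the first condition that is True,
# so the "both" case has to come first, just like in the if/elif chain.
conditions = [has_female & has_male, has_male, has_female]
choices = [3, 1, 2]
df["Gender Values"] = np.select(conditions, choices, default=4)

print(df)
```

This keeps the answer's numbering (3 for both, 4 for everything else); change choices and default if a different scheme is wanted.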

Answered by Batman

Create a mapping function, and use that to map the values.

def map_identity(identity):
    # Compare case-insensitively; anything other than male/female maps to 3
    if identity.lower() == 'male':
        return 1
    elif identity.lower() == 'female':
        return 2
    else:
        return 3

df["B"] = df["A"].map(map_identity)
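
When the mapping is a simple exact-match lookup like this, a dictionary can stand in for the function. The following is a minimal sketch under that assumption; the codes dictionary and the sample rows are illustrative, and the column names A and B follow the snippet above.

```python
import pandas as pd

df = pd.DataFrame({"A": ["Male", "Female", "Male and Female", "Trans", "Unknown"]})

# Hypothetical lookup table: lower-cased label -> code
codes = {"male": 1, "female": 2}

# Series.map with a dict yields NaN for values missing from the dict,
# so fillna(3) sends every other label to the "Others" code.
df["B"] = df["A"].str.lower().map(codes).fillna(3).astype(int)

print(df)
```

Note that exact matching sends "Male and Female" to 3, which is what the question asked for.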

Answered by Rajarshi Das

If there is no specific requirement to assign 1, 2 and 3 to Males, Females and Others in that order, you can try LabelEncoder from scikit-learn. It assigns each unique category in the column an integer from 0 to n_classes - 1, following the sorted order of the labels, so you do not get to choose which number goes with which category.

from sklearn import preprocessing
encoder = preprocessing.LabelEncoder()
encoder.fit(df["gender"])
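
The snippet above only fits the encoder. A fuller sketch, assuming scikit-learn is installed, shows how the codes end up in a new column and which label each code stands for. The sample data is taken from the question, and the column name gender_code is made up for illustration.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({"gender": ["Male", "Female", "Male and Female", "Male", "Male",
                              "Female", "Trans", "Unknown", "Male and Female"]})

encoder = preprocessing.LabelEncoder()

# fit_transform learns the sorted unique labels and returns one integer per row
df["gender_code"] = encoder.fit_transform(df["gender"])

# classes_ holds the labels in sorted order; the position of a label is its code
print(list(encoder.classes_))
print(df)
```

Here every distinct string gets its own code, so "Trans" and "Unknown" are not grouped into a single "Others" value as the question requested.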

For details, see the LabelEncoder documentation.

Hope this helps!
