Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16575868/

Date: 2020-09-13 20:49:38  Source: igfitidea

Efficiently creating additional columns in a pandas DataFrame using .map()

python pandas dataframe

Asked by Daniel Romero

I am analyzing a data set that is similar in shape to the following example. I have two different types of data (abc data and xyz data):

   abc1  abc2  abc3  xyz1  xyz2  xyz3
0     1     2     2     2     1     2
1     2     1     1     2     1     1
2     2     2     1     2     2     2
3     1     2     1     1     1     1
4     1     1     2     1     2     1

I want to create a function that adds a categorizing column for each abc column that exists in the DataFrame. Using lists of column names and a category mapping dictionary, I was able to get my desired result.

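import pandas as pd

# Sample data from the table above
df3 = pd.DataFrame({'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1], 'abc3': [2, 1, 1, 1, 2],
                    'xyz1': [2, 2, 2, 1, 1], 'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1]})

abc_columns = ['abc1', 'abc2', 'abc3']
xyz_columns = ['xyz1', 'xyz2', 'xyz3']
abc_category_columns = ['abc1_category', 'abc2_category', 'abc3_category']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# Map each abc column through the dictionary into its matching *_category column
for i in range(len(abc_category_columns)):
    df3[abc_category_columns[i]] = df3[abc_columns[i]].map(categories)

print(df3)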

The end result:

   abc1  abc2  abc3  xyz1  xyz2  xyz3 abc1_category abc2_category abc3_category
0     1     2     2     2     1     2          Good           Bad           Bad
1     2     1     1     2     1     1           Bad          Good          Good
2     2     2     1     2     2     2           Bad           Bad          Good
3     1     2     1     1     1     1          Good           Bad          Good
4     1     1     2     1     2     1          Good          Good           Bad

While the for loop at the end works fine, I feel like I should be using a Python lambda function, but I can't seem to figure it out.

Is there a more efficient way to map in a dynamic number of abc-type columns?

Answer by Andy Hayden

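You can use applymap with the dictionary's get method (in pandas 2.1 and later, applymap is deprecated and the same element-wise operation is named DataFrame.map):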

In [11]: df[abc_columns].applymap(categories.get)
Out[11]:
   abc1  abc2  abc3
0  Good   Bad   Bad
1   Bad  Good  Good
2   Bad   Bad  Good
3  Good   Bad  Good
4  Good  Good   Bad
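If you would rather keep the Series.map call from the question (and avoid applymap, which newer pandas versions deprecate), one option is to apply Series.map column by column. This is a minimal sketch that assumes the question's sample data (xyz columns omitted) and its categories dict; with a dict mapper, any code not found in the dict comes back as NaN.

```python
import pandas as pd

# Sample abc data from the question (xyz columns omitted for brevity)
df = pd.DataFrame({'abc1': [1, 2, 2, 1, 1],
                   'abc2': [2, 1, 2, 2, 1],
                   'abc3': [2, 1, 1, 1, 2]})
abc_columns = ['abc1', 'abc2', 'abc3']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# DataFrame.apply hands each column (a Series) to the lambda, which maps
# the whole column through the dict in a single Series.map call.
labels = df[abc_columns].apply(lambda col: col.map(categories))
print(labels)
```

This works the same on old and new pandas versions because it relies only on DataFrame.apply and Series.map.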

Then assign the result to the new category columns:

In [12]: abc_categories = list(map(lambda x: x + '_category', abc_columns))

In [13]: abc_categories
Out[13]: ['abc1_category', 'abc2_category', 'abc3_category']

In [14]: df[abc_categories] = df[abc_columns].applymap(categories.get)
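Alternatively, DataFrame.add_suffix can generate the new column names, so no separate list of category names is needed. A sketch, assuming the same sample data (trimmed to four columns here), that maps the abc columns and joins them back onto the original frame by row index:

```python
import pandas as pd

# Sample data from the question, trimmed to four columns
df = pd.DataFrame({'abc1': [1, 2, 2, 1, 1],
                   'abc2': [2, 1, 2, 2, 1],
                   'abc3': [2, 1, 1, 1, 2],
                   'xyz1': [2, 2, 2, 1, 1]})
abc_columns = ['abc1', 'abc2', 'abc3']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# Map the abc columns, rename abc1 -> abc1_category etc. with add_suffix,
# then join the labelled columns back on the shared row index.
labels = df[abc_columns].apply(lambda col: col.map(categories)).add_suffix('_category')
df = df.join(labels)
print(df.columns.tolist())
```

Because join aligns on the index, this stays correct even if the rows are not in their original order.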

Note: to handle a dynamic number of abc-type columns, you can build abc_columns with a list comprehension instead of hard-coding it:

abc_columns = [col for col in df.columns if str(col).startswith('abc')]
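Combining column discovery with the mapping gives the kind of reusable function the question asks for. add_category_columns below is a hypothetical helper name, not a pandas API. It is a sketch that assumes the columns to categorize share a common name prefix, and it returns a new DataFrame rather than modifying its input.

```python
import pandas as pd


def add_category_columns(df, prefix, mapping, suffix='_category'):
    """Hypothetical helper (not part of pandas): add a mapped column for every
    column whose name starts with `prefix`, returning a new DataFrame."""
    # Discover the matching columns dynamically instead of hard-coding them
    cols = [col for col in df.columns if str(col).startswith(prefix)]
    if not cols:
        return df.copy()  # nothing to map
    # Map each column through the dict (unknown codes become NaN) and rename
    labels = df[cols].apply(lambda col: col.map(mapping)).add_suffix(suffix)
    return df.join(labels)


# Usage with the question's sample data
df3 = pd.DataFrame({'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1], 'abc3': [2, 1, 1, 1, 2],
                    'xyz1': [2, 2, 2, 1, 1], 'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1]})
df3 = add_category_columns(df3, 'abc', {1: 'Good', 2: 'Bad', 3: 'Ugly'})
print(df3)
```

Returning a new frame keeps the function free of side effects; reassign the result (as in the usage line) if you want to keep the new columns.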