Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/30389077/

Date: 2020-09-13 23:23:16. Source: igfitidea.

How do you pass multiple variables to a pandas DataFrame to use them with .map to create a new column?

Tags: python, pandas

Asked by yoshiserry

To pass multiple variables to a normal Python function you can just write something like:

from datetime import timedelta

def a_function(date, string, number):
    # Convert the string to an int, then move the date
    # forward by (number * int) days.
    days = number * int(string)
    return date + timedelta(days=days)

When using pandas DataFrames, I know you can create a new column based on the contents of an existing one like so:

df['new_col'] = df['column_A'].map(a_function)

For example, a_function might return the year from a date column:

return date.year

What I'm wondering is this: in the same way that you can pass multiple pieces of data to a single function (as in the first example above), can you use multiple columns to create a new pandas DataFrame column?

For example, combining three separate parts of a date (Y, M, D) into one field:

df['whole_date'] = df['Year','Month','Day'].map(a_function)

I get a KeyError with the following test:

import pandas as pd

def combine(one, two, three):
    return one + two + three

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

# This is the line that raises the KeyError
df['d'] = df['a', 'b', 'c'].map(combine)

Is there a way of creating a new column in a pandas DataFrame, using .map or something else, that takes three columns as input and returns a single column? For example, the input would be 1, 2, 3 and the output would be 1*2*3.

Likewise, is there a way of having a function take one argument, a date, and return three new pandas DataFrame columns: one each for the year, month and day?

Answered by BrenBarn

Is there a way of creating a new column in a pandas DataFrame, using .map or something else, that takes three columns as input and returns a single column? For example, the input would be 1, 2, 3 and the output would be 1*2*3.

To do that, you can use apply with axis=1. However, instead of being called with three separate arguments (one for each column), your function will be called with a single argument for each row, and that argument will be a Series containing the data for that row. You can either account for this in your function:

def combine(row):
    return row['a'] + row['b'] + row['c']

>>> df.apply(combine, axis=1)
0     7
1    10
2    13

Or you can pass a lambda which unpacks the Series into separate arguments:

def combine(one,two,three):
    return one + two + three

>>> df.apply(lambda x: combine(*x), axis=1)
0     7
1    10
2    13
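Since the original goal was a new column, here is a minimal sketch of assigning the result of the row-wise apply back to the DataFrame. It reuses the example df and combine from above; the column name 'd' is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

def combine(one, two, three):
    return one + two + three

# Row-wise apply returns a Series aligned to df's index,
# so it can be assigned directly as a new column.
df['d'] = df.apply(lambda row: combine(*row), axis=1)
print(df)
```

Note that once 'd' exists, a second df.apply over all columns would pass four values per row, so selecting the input columns explicitly is safer in longer scripts.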

If you want to pass only specific columns, you need to select them by indexing on the DataFrame with a list:

>>> df[['a', 'b', 'c']].apply(lambda x: combine(*x), axis=1)
0     7
1    10
2    13

Note the double brackets. (This doesn't really have anything to do with apply; indexing with a list is the normal way to access multiple columns from a DataFrame.)
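The difference between single and double brackets also seems to explain the KeyError in the question. A small sketch, using the same example DataFrame (the variable names raised and subset are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

# Single brackets with several names are parsed as one tuple key,
# i.e. df[('a', 'b', 'c')], which is not a column label here.
try:
    df['a', 'b', 'c']
    raised = False
except KeyError:
    raised = True

# A list of labels selects a sub-DataFrame with those columns.
subset = df[['a', 'b', 'c']]
print(raised, list(subset.columns))
```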

However, it's important to note that in many cases you don't need to use apply, because you can just use vectorized operations on the columns themselves. The combine function above can simply be called with the DataFrame columns themselves as the arguments:

>>> combine(df.a, df.b, df.c)
0     7
1    10
2    13

This is typically much more efficient when the "combining" operation is vectorizable.
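For the date example in the question (combining year, month and day into one field), a vectorized route may also be available. The sketch below assumes a pandas version in which to_datetime accepts a DataFrame, and it assumes the columns are named year, month and day; the data and the name parts are made up for illustration.

```python
import pandas as pd

# Hypothetical data; the column names year/month/day are an assumption.
parts = pd.DataFrame({'year': [2015, 2015], 'month': [5, 6], 'day': [1, 15]})

# to_datetime can assemble dates from a DataFrame that has
# year, month and day columns, without a per-row Python call.
parts['whole_date'] = pd.to_datetime(parts[['year', 'month', 'day']])
print(parts)
```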

Likewise, is there a way of having a function take one argument, a date, and return three new pandas DataFrame columns: one each for the year, month and day?

As above, there are two basic ways to do this: a general but non-vectorized way using apply, and a faster vectorized way. Suppose you have a DataFrame like this:

>>> df = pandas.DataFrame({'date': pandas.date_range('2015/05/01', '2015/05/03')})
>>> df
        date
0 2015-05-01
1 2015-05-02
2 2015-05-03

You can define a function that returns a Series for each value, and then apply it to the column:

def dateComponents(date):
    return pandas.Series([date.year, date.month, date.day], index=["Year", "Month", "Day"])

>>> df.date.apply(dateComponents)
   Year  Month  Day
0  2015      5    1
1  2015      5    2
2  2015      5    3
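To attach these three columns to the original DataFrame, one option is join, which aligns on the index. The sketch below also shows the same columns built with the .dt accessor, assuming a pandas version that provides it; the names date_components, via_apply and via_dt are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2015-05-01', '2015-05-03')})

def date_components(date):
    return pd.Series([date.year, date.month, date.day],
                     index=['Year', 'Month', 'Day'])

# Applying a Series-returning function to a column yields a DataFrame,
# which join() attaches to the original by index.
via_apply = df.join(df['date'].apply(date_components))

# The same columns built with the vectorized .dt accessor.
via_dt = df.assign(Year=df['date'].dt.year,
                   Month=df['date'].dt.month,
                   Day=df['date'].dt.day)
print(via_apply)
```

The two results should hold the same values, although the integer dtypes of the new columns may differ between the two routes.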

This apply-based approach is the general option and works for any per-value function. (For dates specifically, pandas versions that provide the .dt accessor also allow vectorized access, e.g. df.date.dt.year.) In other cases you can use vectorized operations as well, for example on strings:

>>> df = pandas.DataFrame({'a': ["Hello", "There", "Pal"]})
>>> df
        a
0  Hello
1  There
2    Pal

>>> pandas.DataFrame({'FirstChar': df.a.str[0], 'Length': df.a.str.len()})
   FirstChar  Length
0         H       5
1         T       5
2         P       3

Here again the operation is vectorized by operating directly on the values instead of applying a function elementwise. In this case, we have two vectorized operations (getting first character and getting the string length), and then we wrap the results in another call to DataFrame to create separate columns for each of the two kinds of results.

Answered by maxymoo

I normally use apply for this kind of thing; it's basically the DataFrame version of map (the axis parameter lets you decide whether to apply your function to rows or columns):

df.apply(lambda row: row.a*row.b*row.c, axis=1)

or

df.apply(np.prod, axis=1)

0     8
1    30
2    72
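As a rough comparison, the same row-wise product can also be obtained without apply. The sketch below uses the example DataFrame from the question; the variable names are illustrative, and no timing claims are made here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

by_apply = df.apply(np.prod, axis=1)      # calls np.prod once per row
by_prod = df.prod(axis=1)                 # built-in row-wise product
by_columns = df['a'] * df['b'] * df['c']  # plain column arithmetic
print(by_prod.tolist())
```

All three give the values shown above; the last two avoid calling a Python function for each row.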