Pandas DataFrame.assign arguments

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/42101382/

pandas

Asked by Alexander

QUESTION

How can assign be used to return a copy of the original DataFrame with multiple new columns added?

DESIRED RESULT

>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

ATTEMPTS

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1.

BACKGROUND

The assign function in Pandas returns a copy of the DataFrame with the newly assigned column added, e.g.

df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28

The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame:

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.
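
A minimal sketch of one way around that restriction, assuming a second assign call is acceptable: chain the calls and pass a callable, so the second call receives the intermediate DataFrame that already holds the first new column. The column name E and the lambda parameter d are illustrative only. Newer pandas releases are documented as relaxing this restriction, so check the documentation for the version you use.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# The first call adds C. The second call passes the intermediate frame
# (which already contains C) to the lambda, so E can be built from C.
result = (
    df.assign(C=df.A ** 2)
      .assign(E=lambda d: d.C + d.B)
)
```

Because each assign returns a new DataFrame, df itself is left unchanged by the chain.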

In addition:

Parameters:
kwargs : keyword, value pairs

keywords are the column names.

The source code for the function states that it accepts a dictionary:

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0
    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed 
        on the DataFrame and assigned to the new columns. If the values are not callable, 
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. To make things predictable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
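
To illustrate the two branches in the loop above, here is a small sketch (not part of the pandas source) that passes one callable and one plain value. The names out and d are placeholders chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# C is callable: assign calls it with the DataFrame and stores the result.
# D is not callable: the scalar is assigned as-is and fills every row.
out = df.assign(C=lambda d: d.A ** 2, D=0)
```

Since assign works on a copy, out gains the columns C and D while df keeps only A and B.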

Answered by root

You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2)

I got your example dictionary to work by unpacking it into keyword arguments with **:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})

It seems like assign should be able to take a dictionary, but based on the source code you posted, that does not appear to be supported currently.

The resulting output:

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
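
As a further sketch of the same ** approach, the dictionary can be built ahead of time, and its keys do not have to be valid Python identifiers. The names new_cols, 'A squared' and 'B doubled' are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Build the new columns in a dict first; the keys become column names.
new_cols = {'A squared': df.A ** 2, 'B doubled': df.B * 2}

# ** unpacks the dict so that each key is passed as a keyword argument.
result = df.assign(**new_cols)
```

This is handy when the column names are only known at runtime or contain spaces, which plain keyword syntax cannot express.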