Pandas Apply Function That Returns Two New Columns

Question

Asked by user2242044

I have a pandas DataFrame, and I would like to use an apply function on it to generate two new columns from the existing data. I am getting this error: ValueError: Wrong number of items passed 2, placement implies 1

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df['C', 'D'] = df.apply(myfunc1 ,axis=1)

Starting DF:

   A  B
0  6  1
1  8  4

Desired DF:

   A  B   C   D
0  6  1  16  56
1  8  4  18  58

Answer 1

Answered by oim

Based on your latest error, you can avoid the problem by returning the new values as a Series, so that apply produces one column per value:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

df[['C', 'D']] = df.apply(myfunc1 ,axis=1)
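
For anyone who wants to check this end to end, here is a minimal sketch of the Series-returning approach. It uses the fixed values from the question's "Starting DF" instead of random data (an assumption made only so the result is reproducible), and the name result is introduced here just for illustration.

```python
import pandas as pd

def myfunc1(row):
    # Returning a Series makes apply(axis=1) build a DataFrame,
    # with one column per element of the Series.
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

# Fixed values (taken from the question's starting frame) instead of random ones.
df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

result = df.apply(myfunc1, axis=1)  # DataFrame with columns 0 and 1
df[['C', 'D']] = result             # assigned by position to 'C' and 'D'

print(df)
```

With these inputs, column C holds 16 and 18, and column D holds 56 and 58, which matches the desired frame in the question.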

Answer 2

Answered by Bharath

df['C','D'] is treated as a single column (keyed by the tuple ('C', 'D')) rather than as two columns. To assign two columns, index with a list of names instead: df[['C','D']]

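# result_type='expand' (pandas 0.23+) turns a returned list into separate columns
df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')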

   A  B   C   D
0  4  6  14  54
1  5  1  15  55

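Or you can unpack the result into the two columns with zip, i.e.

# Assumes myfunc1 returns a list, so apply yields a Series of lists (pandas 0.23+);
# zip(*...) transposes those per-row lists into one sequence per new column.
df['C'], df['D'] = zip(*df.apply(myfunc1, axis=1))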
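
The difference between the two indexing forms can be checked directly. The sketch below is only an illustration: the frames one and two are hypothetical names, and the values are fixed rather than random.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# df['C', 'D'] passes the single key ('C', 'D'), a tuple,
# so pandas creates ONE new column labelled with that tuple.
one = df.copy()
one['C', 'D'] = one['A'] + 10
print(len(one.columns))   # three columns: A, B and the tuple-labelled one

# df[['C', 'D']] passes a list of two keys, so TWO columns are created.
two = df.copy()
two[['C', 'D']] = pd.DataFrame({'C': two['A'] + 10, 'D': two['A'] + 50})
print(list(two.columns))
```

This is why assigning a two-column result to df['C', 'D'] fails: pandas has room for one column but receives two.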

Answer 3

Answered by gabe_

Add an extra pair of brackets when selecting or assigning multiple columns.

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

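# result_type='expand' (pandas 0.23+) turns the returned list into separate columns
df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')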
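
One caveat with a function that returns a plain list: in pandas 0.23 and later, apply(axis=1) keeps each list as a single element unless told otherwise. The sketch below, which assumes such a pandas version and uses a three-row frame with made-up values, shows the default result next to the result_type='expand' variant. The names as_lists and expanded are introduced here for illustration.

```python
import pandas as pd

def myfunc1(row):
    # Returns a plain Python list, as in the question.
    return [row['A'] + 10, row['A'] + 50]

# Three rows, so the row count differs from the number of new columns.
df = pd.DataFrame({'A': [6, 8, 3], 'B': [1, 4, 7]})

# Default behaviour: a Series whose elements are the returned lists.
as_lists = df.apply(myfunc1, axis=1)
print(type(as_lists).__name__)

# result_type='expand' turns each list into separate columns.
expanded = df.apply(myfunc1, axis=1, result_type='expand')
df[['C', 'D']] = expanded
print(df)
```

Returning a pd.Series from the function, as in the first answer, has the same expanding effect without the extra argument.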

Answer 4

Answered by Federico Dorato

Please be aware that the accepted answer is slow and uses a lot of memory; see https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/

Following the suggestion presented there, the answer would look like this:

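def myfunc1(a):
    # Takes the scalar value from column A rather than a whole row.
    c = a + 10
    d = a + 50
    return c, d

def run_loopy(df):
    Cs, Ds = [], []
    for _, row in df.iterrows():
        c, d = myfunc1(row['A'])
        Cs.append(c)
        Ds.append(d)
    # Reuse df's index so the assignment below aligns row by row.
    return pd.DataFrame({'C': Cs, 'D': Ds}, index=df.index)

df[['C', 'D']] = run_loopy(df)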
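
As an aside that is not taken from the answer or the linked post: when the per-row logic is simple arithmetic like this, plain column operations avoid both apply and the explicit loop. A minimal sketch, again with fixed values in place of random ones:

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# Column arithmetic operates on the whole column at once,
# so no per-row Python function call is needed.
df['C'] = df['A'] + 10
df['D'] = df['A'] + 50

print(df)
```

This only applies when the computation can be expressed on whole columns; for logic that truly needs a Python function per row, the loop or apply approaches above still apply.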
