Pandas Apply Function That Returns Two New Columns
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/47969756/
Asked by user2242044
I have a pandas dataframe that I would like to use an apply function on to generate two new columns based on the existing data. I am getting this error:
ValueError: Wrong number of items passed 2, placement implies 1
import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0, 10, size=(2, 2)), columns=list('AB'))
df['C', 'D'] = df.apply(myfunc1, axis=1)
Starting DF:
   A  B
0  6  1
1  8  4
Desired DF:
   A  B   C   D
0  6  1  16  56
1  8  4  18  58
Answered by oim
Based on your latest error, you can avoid it by returning the new columns as a Series:
def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

df[['C', 'D']] = df.apply(myfunc1, axis=1)
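A self-contained sketch of this approach, using the fixed values from the question's example tables instead of random data so the result can be checked by eye. Labelling the Series with index=['C', 'D'] is an addition made here, not part of the answer above; it only makes the intermediate result easier to read.

```python
import pandas as pd

def myfunc1(row):
    # Returning a Series makes apply build a DataFrame, one column per element
    return pd.Series([row['A'] + 10, row['A'] + 50], index=['C', 'D'])

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

new_cols = df.apply(myfunc1, axis=1)  # DataFrame with columns C and D
df[['C', 'D']] = new_cols             # list key: assigns two columns
print(df)
```

Because the function returns a Series for every row, this is convenient but can be slow on large frames.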
Answered by Bharath
df['C','D'] is treated as one column rather than two. To assign two columns you need to index with a list of column names, so use df[['C','D']]:
df[['C', 'D']] = df.apply(myfunc1, axis=1)
   A  B   C   D
0  4  6  14  54
1  5  1  15  55
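Or you can unpack the result into two separate column assignments. This assumes myfunc1 returns a list, as in the question, and pandas 0.23 or later, where apply then yields one list per row:
df['C'], df['D'] = zip(*df.apply(myfunc1, axis=1))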
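To see why the two spellings behave differently, here is a minimal sketch, assuming a small frame with the question's sample values. The names one_col and two_cols are made up for this illustration. A comma inside single brackets builds the tuple ('C', 'D'), which pandas uses as the label of one column; a list inside the brackets names two columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# Tuple key: creates ONE new column whose label is the tuple ('C', 'D')
one_col = df.copy()
one_col['C', 'D'] = 0
print(one_col.shape)

# List key: creates TWO new columns, 'C' and 'D'
two_cols = df.copy()
two_cols[['C', 'D']] = pd.DataFrame({'C': [16, 18], 'D': [56, 58]})
print(two_cols.shape)
```

The first frame gains a single column, which is why assigning two values per row to it fails with the "placement implies 1" error in the question.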
Answered by gabe_
Add extra brackets when querying for multiple columns.
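import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0, 10, size=(2, 2)), columns=list('AB'))
# result_type='expand' (pandas >= 0.23) spreads each returned list across columns;
# without it, apply returns a single Series of lists
df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')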
Answered by Federico Dorato
Please be aware of the high memory consumption and low speed of the accepted answer; see https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/
Following the suggestion presented there, a better answer would look like this:
def myfunc1(a):
    c = a + 10
    d = a + 50
    return c, d

def run_loopy(df):
    Cs, Ds = [], []
    for _, row in df.iterrows():
        c, d = myfunc1(row['A'])
        Cs.append(c)
        Ds.append(d)
    # Reuse df's index so the new columns align with the original rows
    return pd.DataFrame({'C': Cs, 'D': Ds}, index=df.index)

df[['C', 'D']] = run_loopy(df)
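For arithmetic as simple as the question's, neither apply nor an explicit Python loop is strictly needed. Column-wise (vectorized) operations give the same result. This is an alternative not given in the answers above, sketched here with the question's sample values.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# Operate on the whole column at once instead of row by row
df['C'] = df['A'] + 10
df['D'] = df['A'] + 50
print(df)
```

This only works when the per-row logic can be expressed as operations on whole columns. For logic that truly needs the row, the loop or apply variants above still apply.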