Pandas DataFrame: How to update multiple columns by applying a function?

Question

Asked by John Smith

I have a Dataframe df like this:

   A   B   C    D
2  1   O   s    h
4  2   P    
7  3   Q
9  4   R   h    m

I have a function f to calculate C and D based on B for a row:

def f(p):  # p is the value of column B for a row
    return p + 'k', p + 'n'

How can I populate the missing values for rows 4 and 7 by applying the function f to the Dataframe?

The expected outcome is like below:

   A   B   C    D
2  1   O   s    h
4  2   P   Pk   Pn
7  3   Q   Qk   Qn
9  4   R   h    m

The function f has to be used because the real function is very complicated. Also, the function only needs to be applied to the rows that are missing C and D.

Answer 1

Answered by Fabio Lamanna

Maybe there is a more elegant way, but I would do it this way:

df['C'] = df['B'].apply(lambda x: f(x)[0])
df['D'] = df['B'].apply(lambda x: f(x)[1])

This applies the function to column B and takes the first and second values of its output. It returns:

   A  B   C   D
0  1  O  Ok  On
1  2  P  Pk  Pn
2  3  Q  Qk  Qn
3  4  R  Rk  Rn

EDIT:

More concisely, thanks to this answer:

df[['C','D']] = df['B'].apply(lambda x: pd.Series(f(x)))  # f is called once per row
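
Note that the snippets in this answer replace C and D in every row, as the output above shows (rows that already had values become Ok/On and Rk/Rn). If f should only run on the rows that are missing C and D, something like the following might work. It is only a sketch: it assumes the empty cells are NaN rather than empty strings, and the names mask and filled are made up for the illustration.

```python
import numpy as np
import pandas as pd

def f(p):
    # Stand-in for the asker's function: returns the values for C and D.
    return p + 'k', p + 'n'

# Assumption: the gaps in the question's frame are NaN.
df = pd.DataFrame(
    {'A': [1, 2, 3, 4],
     'B': ['O', 'P', 'Q', 'R'],
     'C': ['s', np.nan, np.nan, 'h'],
     'D': ['h', np.nan, np.nan, 'm']},
    index=[2, 4, 7, 9],
)

# Rows where both C and D are missing.
mask = df['C'].isna() & df['D'].isna()

# Call f once per masked row and spread the returned tuples over C and D.
filled = pd.DataFrame(
    df.loc[mask, 'B'].map(f).tolist(),
    index=df.index[mask],
    columns=['C', 'D'],
)

# Assignment aligns on index and column labels, so other rows are untouched.
df.loc[mask, ['C', 'D']] = filled
```

Because f is only mapped over df.loc[mask, 'B'], an expensive function is not evaluated for rows that already have values.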

Answer 2

Answered by Zenith

There is an easier way to do it, if the table is not too big.

def f(row):  # row is one row of the DataFrame, passed as a Series
    # assumes missing cells hold empty strings, not NaN
    if row['C'] == '':
        row['C'] = row['B'] + 'k'
    if row['D'] == '':
        row['D'] = row['B'] + 'n'
    return row

df = df.apply(f, axis=1)
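
The function in this answer compares against empty strings and repeats the k/n logic inline. If the gaps are actually NaN and the asker's own f must be called, a row-wise variant could look roughly like this. Treat it as a sketch: fill_row is a hypothetical helper name, and the sample frame assumes NaN for the missing cells.

```python
import numpy as np
import pandas as pd

def f(p):
    # Stand-in for the asker's function: returns the values for C and D.
    return p + 'k', p + 'n'

def fill_row(row):
    # Hypothetical helper: call f only when both C and D are missing.
    row = row.copy()  # work on a copy instead of the object apply passes in
    if pd.isna(row['C']) and pd.isna(row['D']):
        row['C'], row['D'] = f(row['B'])
    return row

# Assumption: the gaps in the question's frame are NaN.
df = pd.DataFrame(
    {'A': [1, 2, 3, 4],
     'B': ['O', 'P', 'Q', 'R'],
     'C': ['s', np.nan, np.nan, 'h'],
     'D': ['h', np.nan, np.nan, 'm']},
    index=[2, 4, 7, 9],
)

df = df.apply(fill_row, axis=1)
```

As with the original, apply with axis=1 visits every row in Python, so this is most suitable when the table is small.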

Answer 3

Answered by Colonel Beauvel

If you want to use your function as is, here is a one-liner:

df.update(df.B.apply(lambda x: pd.Series(dict(zip(['C','D'],f(x))))), overwrite=False)

In [350]: df
Out[350]:
   A  B   C   D
2  1  O   s   h
4  2  P  Pk  Pn
7  3  Q  Qk  Qn
9  4  R   h   m

You can also do:

df1 = df.copy()  # keep the original values
df[['C','D']] = df.apply(lambda x: pd.Series([x['B'] + 'k', x['B'] + 'n']), axis=1)
df1.update(df, overwrite=False)  # the filled result ends up in df1
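
Both variants rely on overwrite=False, which makes update fill only the cells that are NaN in the frame being updated and leave existing values alone. The function is still evaluated for every row; update just discards the results that are not needed. A small sketch of that behaviour on its own, where candidates is a made-up name and the two-row frame is a cut-down version of the question's data:

```python
import numpy as np
import pandas as pd

# Cut-down frame: row 2 already has C, row 4 is missing it (assumed NaN).
df = pd.DataFrame({'B': ['O', 'P'], 'C': ['s', np.nan]}, index=[2, 4])

# Values computed for every row, including rows that do not need them.
candidates = pd.DataFrame({'C': ['Ok', 'Pk']}, index=[2, 4])

# overwrite=False: only NaN cells in df take values from candidates.
df.update(candidates, overwrite=False)
```

After the call, row 2 keeps its original s while row 4 receives Pk.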

Answer 4

Answered by Nader Hisham

Simply do the following:

df.loc[df.C.isnull(), 'C'] = df.B + 'k'  # one loc call on df avoids chained assignment

df.loc[df.D.isnull(), 'D'] = df.B + 'n'

Check the link indexing-view-versus-copy if you want to know why I used loc.
