Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41449555/

Time: 2020-09-14 02:42:47  Source: igfitidea

pandas combine two columns with null values

Tags: python, pandas, dataframe, nonetype

Asked by vagabond

I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new column to also have NaN. Here's the example:

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3           None          dessert
4           None             None

I tried to use fillna to solve this:

df['foodstuff'].fillna('') + df['type'].fillna('')

and I got:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
dtype: object

Row 4 has become a blank value. What I want in this situation is a NaN value, since both of the columns being combined are NaN:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4            None       
dtype: object

Answered by root

Use fillna on one column, with the other column supplying the fill values:

df['foodstuff'].fillna(df['type'])

The resulting output:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
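
A minimal sketch of this approach on the question's sample data, shown next to combine_first, which expresses the same "first non-missing value" idea. The variable names filled and same are illustrative only. Note that this picks one value per row rather than concatenating, so if both columns held a value in the same row, only the foodstuff value would be kept.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Take 'foodstuff' where it is present, otherwise fall back to 'type'.
# Rows where both columns are missing stay missing.
filled = df['foodstuff'].fillna(df['type'])

# combine_first gives the same result: values from the caller,
# with gaps filled from the other Series.
same = df['foodstuff'].combine_first(df['type'])

print(filled)
```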

Answered by sirfz

You can use the combine method with a lambda:

df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)

(a or "") returns "" if a is None; the same logic is then applied to the concatenation (where the result is None if the concatenation is an empty string).
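
The `or` trick relies on missing values being None. If they are stored as NaN instead (a float, which is truthy), `(a or "")` returns NaN and the concatenation fails. The following is a sketch of a variant that checks with pd.isna; the helper name join_or_missing is hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

def join_or_missing(a, b):
    # Treat None/NaN as an empty string before concatenating.
    left = "" if pd.isna(a) else a
    right = "" if pd.isna(b) else b
    # An empty result means both inputs were missing: return None.
    return (left + right) or None

# Series.combine calls the function element by element on both Series.
combined = df['foodstuff'].combine(df['type'], join_or_missing)
print(combined)
```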

Answered by piRSquared

  • fillna both columns together
  • sum(1) to add them
  • replace('', np.nan)

df.fillna('').sum(1).replace('', np.nan)

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN
dtype: object
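
The same three steps can be sketched with an explicit row-wise string join in place of sum(1). This is a swapped-in variant, not the answer's exact code: it assumes every column holds strings or missing values, and it may be easier to read if summing strings looks surprising.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# 1. Replace missing values with empty strings in both columns.
filled = df.fillna('')

# 2. Concatenate the strings of each row (axis=1 means row-wise).
joined = filled.apply(''.join, axis=1)

# 3. Rows that were missing in both columns are now '', so map them to NaN.
combined = joined.replace('', np.nan)
print(combined)
```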

Answered by Vikash Singh

You can always replace the empty strings in the new column with NaN:

import numpy as np

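# Assign the result back: inplace=True on a selected column may not update df in recent pandas versions
df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)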

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')

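# Assign the result back: inplace=True on a selected column may not update df in recent pandas versions
df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)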

df

Output:

       foodstuff             type          new_col
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3           None          dessert          dessert
4           None             None              NaN
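
A small sketch of what the regex adds over a plain replace('', np.nan): the pattern ^\s*$ also matches strings that contain only whitespace. The sample Series here is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one empty string and one whitespace-only string.
s = pd.Series(['apple-pie', '', '   ', 'dessert'])

# ^\s*$ matches strings that are empty or consist only of whitespace.
cleaned = s.replace(r'^\s*$', np.nan, regex=True)
print(cleaned)
```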

Answered by Mastan Basha Shaik

  1. You can replace the nonzero values with the column names, like this:

    df1 = df.replace(1, pd.Series(df.columns, df.columns))

  2. Replace the 0s with empty strings, then concatenate the columns as below:

    f = f.replace(0, '')
    f['new'] = f.First + f.Second + f.Three + f.Four

Refer to the full code below.

import pandas as pd
df = pd.DataFrame({'Second':[0,1,0,0],'First':[1,0,0,0],'Three':[0,0,1,0],'Four':[0,0,0,1], 'cl': ['3D', 'Wireless','Accounting','cisco']})
df2=pd.DataFrame({'pi':['Accounting','cisco','3D','Wireless']})
df1= df.replace(1, pd.Series(df.columns, df.columns))
f = pd.merge(df1,df2,how='right',left_on=['cl'],right_on=['pi'])
f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four

df1:

In [3]: df1                                                                                                                                                                              
Out[3]: 
   Second  First  Three  Four          cl
0       0  First      0     0          3D
1  Second      0      0     0    Wireless
2       0      0  Three     0  Accounting
3       0      0      0  Four       cisco

df2:

In [4]: df2                                                                                                                                                                              
Out[4]: 
           pi
0  Accounting
1       cisco
2          3D
3    Wireless

Final df will be:

In [2]: f                                                                                                                                                                                
Out[2]: 
   Second  First  Three  Four          cl          pi     new
0          First                       3D          3D   First
1  Second                        Wireless    Wireless  Second
2                 Three        Accounting  Accounting   Three
3                        Four       cisco       cisco    Four
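
For 0/1 indicator columns like these, an alternative sketch (not the answer's method) is idxmax(axis=1), which returns the label of the column holding each row's maximum. It assumes at most one flag is set per row; the names flags and new are illustrative.

```python
import pandas as pd

flags = pd.DataFrame({
    'Second': [0, 1, 0, 0],
    'First': [1, 0, 0, 0],
    'Three': [0, 0, 1, 0],
    'Four': [0, 0, 0, 1],
})

# For each row, take the label of the first column holding the row maximum.
new = flags.idxmax(axis=1)

# A row with no flag set would get the first column's label,
# so keep the label only where at least one flag is set.
new = new.where(flags.any(axis=1))
print(new)
```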