Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/41449555/
pandas combine two columns with null values
Asked by vagabond
I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new column to also have NaN. Here's the example:
import pandas as pd
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})
df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3           None          dessert
4           None             None
I tried to solve this with fillna:
df['foodstuff'].fillna('') + df['type'].fillna('')
and I got:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4
dtype: object
Row 4 has become an empty string. What I want in this situation is a NaN value, since both of the columns being combined are NaN:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 None
dtype: object
Answered by root
Answered by sirfz
You can use the combine method with a lambda:
df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)
(a or "") returns "" if a is None. The same logic is then applied to the concatenation, so the result is None if the concatenation is an empty string.
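As a rough sketch of the same idea, the lambda can be written out as a named function. The name join_or_none is made up for this illustration, and it uses pd.isna instead of "or" so that NaN is handled as well as None (an addition that is not part of the answer above). The columns are built with dtype=object, which is also an assumption, so that the missing entries stay as plain None.

```python
import pandas as pd

# dtype=object is an assumption here: it keeps the missing entries as None.
foodstuff = pd.Series(["apple-martini", "apple-pie", None, None, None], dtype=object)
kind = pd.Series([None, None, "strawberry-tart", "dessert", None], dtype=object)

def join_or_none(a, b):
    """Hypothetical helper: concatenate a and b, treating None/NaN as ''."""
    left = "" if pd.isna(a) else a
    right = "" if pd.isna(b) else b
    # An empty result means both inputs were missing, so return None.
    return (left + right) or None

# Series.combine calls join_or_none once per pair of aligned elements.
combined = foodstuff.combine(kind, join_or_none)
print(combined)
```

Only the last row, where both inputs are missing, should come out as a missing value.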
Answered by piRSquared
fillna both columns together, sum(1) to add them, then replace('', np.nan):
df.fillna('').sum(1).replace('', np.nan)
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 NaN
dtype: object
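To see what each step contributes, the chain can be split into intermediate variables. This is only a sketch: the variable names are made up, axis=1 is spelled out as a keyword, and the frame is built with dtype=object (an assumption, so that the row-wise sum concatenates plain Python strings).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "foodstuff": ["apple-martini", "apple-pie", None, None, None],
        "type": [None, None, "strawberry-tart", "dessert", None],
    },
    dtype=object,  # assumption: keep the columns as plain Python objects
)

filled = df.fillna("")               # missing values become empty strings
joined = filled.sum(axis=1)          # row-wise sum concatenates the strings
result = joined.replace("", np.nan)  # rows that were entirely missing become NaN
print(result)
```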
Answered by Vikash Singh
You can always replace the empty string in the new column with NaN:
import numpy as np
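# Assign the result back instead of using inplace=True on a selected column,
# which may not update df in recent pandas versions.
df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)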
Complete code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})
df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')
# Assign the result back rather than relying on inplace=True on a selected column.
df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)
df
Output:
       foodstuff             type          new_col
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3           None          dessert          dessert
4           None             None              NaN
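One detail of the regex r'^\s*$' is worth noting: it matches whitespace-only strings as well as the empty string. A small sketch on a made-up Series (the sample values are illustrative and not from the question):

```python
import numpy as np
import pandas as pd

# Made-up sample: one real value, one empty string, one whitespace-only string.
s = pd.Series(["dessert", "", "   "], dtype=object)

# ^\s*$ matches strings made of zero or more whitespace characters only.
cleaned = s.replace(r"^\s*$", np.nan, regex=True)
print(cleaned)
```

If whitespace-only strings should be kept, a plain replace('', np.nan) without regex is the narrower option.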
Answered by Mastan Basha Shaik
You can replace the nonzero values with the column names:
df1 = df.replace(1, pd.Series(df.columns, df.columns))
Then replace the 0s with empty strings and concatenate the columns:
f = f.replace(0, '')
f['new'] = f.First + f.Second + f.Three + f.Four
Refer to the full code below.
import pandas as pd
df = pd.DataFrame({'Second':[0,1,0,0],'First':[1,0,0,0],'Three':[0,0,1,0],'Four':[0,0,0,1], 'cl': ['3D', 'Wireless','Accounting','cisco']})
df2=pd.DataFrame({'pi':['Accounting','cisco','3D','Wireless']})
df1= df.replace(1, pd.Series(df.columns, df.columns))
f = pd.merge(df1,df2,how='right',left_on=['cl'],right_on=['pi'])
f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four
df1:
In [3]: df1
Out[3]:
   Second  First  Three  Four          cl
0       0  First      0     0          3D
1  Second      0      0     0    Wireless
2       0      0  Three     0  Accounting
3       0      0      0  Four       cisco
df2:
In [4]: df2
Out[4]:
pi
0 Accounting
1 cisco
2 3D
3 Wireless
Final df will be:
In [2]: f
Out[2]:
   Second  First  Three  Four          cl          pi     new
0          First                       3D          3D   First
1  Second                        Wireless    Wireless  Second
2                 Three        Accounting  Accounting   Three
3                        Four       cisco       cisco    Four
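For comparison, here is a more direct sketch of turning 0/1 flag columns into a label column without replacing values first. It is an alternative to the approach above, not the answer's method, and the name flags is made up for the illustration.

```python
import pandas as pd

# Same 0/1 flag columns as in the answer, without the 'cl' column.
flags = pd.DataFrame({
    "First":  [1, 0, 0, 0],
    "Second": [0, 1, 0, 0],
    "Three":  [0, 0, 1, 0],
    "Four":   [0, 0, 0, 1],
})
cols = list(flags.columns)

# For each row, join the names of the columns whose flag equals 1.
# A row with no flags set yields an empty string.
new = flags.apply(lambda row: "".join(c for c in cols if row[c] == 1), axis=1)
print(new)
```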