Original question: http://stackoverflow.com/questions/24188729/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas: adding a series to a dataframe causes NaN values to appear
Asked by DataSwede
I have a dataframe that looks something like this:
import numpy as np
import pandas as pd

d = {'Col_1': pd.Series(['A', 'A', 'A', 'B']),
     'Col_2': pd.Series(['B', 'C', 'B', 'D']),
     'Col_3': pd.Series([np.nan, 'D', 'C', np.nan]),
     'Col_4': pd.Series([np.nan, np.nan, 'D', np.nan]),
     'Col_5': pd.Series([np.nan, np.nan, 'E', np.nan])}
df = pd.DataFrame(d)
Col_1  Col_2  Col_3  Col_4  Col_5
A      B      NaN    NaN    NaN
A      C      D      NaN    NaN
A      B      C      D      E
B      D      NaN    NaN    NaN
My goal is to end up with something along the lines of:
Col_1  Col_2  Col_3  Col_4  Col_5  ConCat
A      B      NaN    NaN    NaN    A:B
A      C      D      NaN    NaN    A:C:D
A      B      C      D      E      A:B:C:D:E
B      D      NaN    NaN    NaN    B:D
I've successfully created a separate dataframe that holds the desired output with:
rows = df.values
df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
0
0 A:B
1 A:C:D
2 A:B:C:D:E
3 B:D
But now when I attempt to place it into the original dataframe, I get:
df['concatenated'] = df_1
Col_1  Col_2  Col_3  Col_4  Col_5  concatenated
A      B      NaN    NaN    NaN    NaN
A      C      D      NaN    NaN    NaN
A      B      C      D      E      NaN
B      D      NaN    NaN    NaN    NaN
What's strange is that when I create a simplified example, it works as expected. Below is the full code of what I'm doing. The original data arrives transposed relative to the dataframe shown above.
# Sort each column independently, then transpose (Series.order() in older pandas is now sort_values())
df_caregiver_type = pd.concat([df_caregiver_type[col].sort_values().reset_index(drop=True) for col in df_caregiver_type], axis=1, ignore_index=False).T
df_caregiver_type.rename(columns=lambda x: 'Col_' + str(x), inplace=True)
rows = df_caregiver_type.values
df_caregiver_type1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
df_caregiver_type['concatenated'] = df_caregiver_type1
df_caregiver_type = df_caregiver_type.T
df_caregiver_type
Update: I suspect the problem comes from the first line of the full code. It's from a separate but related question: pandas: sort each column individually
Answered by CT Zhu
For your full dataset, changing the last step from df['concatenated'] = df_1 to df['concatenated'] = df_1.values will solve the issue. I think it is a bug, and I am very sure I have seen it on SO before.
Or just: df['concatenated'] = [':'.join(word for word in row if word is not np.nan) for row in rows]
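A note that is not part of the original answer: one likely explanation for why .values helps is index alignment rather than a bug. When a Series is assigned as a column, pandas matches rows by index label, and labels with no match become NaN. The sketch below assumes a frame whose index is not 0..n-1 (as would happen after the transpose in the question); the names df, joined, aligned and positional are made up for illustration.

```python
import pandas as pd

# A frame with non-default row labels, similar to what a transpose produces
df = pd.DataFrame({"Col_1": ["A", "B"], "Col_2": ["B", "D"]}, index=["x", "y"])

# A Series built from a plain list gets the default index 0, 1
joined = pd.Series(["A:B", "B:D"])

# Assigning the Series aligns on index labels: "x"/"y" never match 0/1
aligned = df.copy()
aligned["concatenated"] = joined

# Assigning the underlying array skips alignment and fills by position
positional = df.copy()
positional["concatenated"] = joined.values

print(aligned)
print(positional)
```

With matching indexes (as in the simplified example in the question), both assignments would give the same result, which may be why the problem only shows up on the full dataset.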
Answered by Vor
>>> d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
... 'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
... 'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
... 'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
... 'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
>>> df = pd.DataFrame(d)
>>>
>>> rows = df.values
>>> df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
>>>
>>> df['concatenated'] = df_1[0]
>>> df
  Col_1 Col_2 Col_3 Col_4 Col_5 concatenated
0     A     B   NaN   NaN   NaN          A:B
1     A     C     D   NaN   NaN        A:C:D
2     A     B     C     D     E    A:B:C:D:E
3     B     D   NaN   NaN   NaN          B:D
>>>
Answered by gobrewers14
>>> df = df.join(df_1)
>>> df = df.rename(columns = {0:'concatenated'})
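As an alternative sketch that does not come from any of the answers above, the concatenated column can be built directly on the original frame with a row-wise apply, which avoids creating a second dataframe altogether. The custom index labels and the value_columns name are assumptions added for illustration; the result of apply carries the same index as df, so the assignment lines up row for row.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with non-default row labels
df = pd.DataFrame(
    {
        "Col_1": ["A", "A", "A", "B"],
        "Col_2": ["B", "C", "B", "D"],
        "Col_3": [np.nan, "D", "C", np.nan],
        "Col_4": [np.nan, np.nan, "D", np.nan],
        "Col_5": [np.nan, np.nan, "E", np.nan],
    },
    index=["w", "x", "y", "z"],
)

# Columns to join, captured before the new column is added
value_columns = list(df.columns)

# For each row, drop the NaN entries and join what is left with ":"
df["concatenated"] = df[value_columns].apply(
    lambda row: ":".join(row.dropna()), axis=1
)

print(df)
```

This assumes every non-missing value is a string; mixed types would need a str() conversion before joining.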

