pandas - Adding a Series to a DataFrame produces NaN values

Question

Question by DataSwede

I have a dataframe that looks something like this:

import numpy as np
import pandas as pd

d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
     'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
     'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
     'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
     'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
df = pd.DataFrame(d)

Col_1  Col_2  Col_3  Col_4  Col_5
  A      B      NaN    NaN    NaN
  A      C      D      NaN    NaN
  A      B      C      D      E
  B      D      NaN    NaN    NaN

My goal is to end up with something along the lines of:

Col_1  Col_2  Col_3  Col_4  Col_5  ConCat
  A      B      NaN    NaN    NaN    A:B
  A      C      D      NaN    NaN    A:C:D
  A      B      C      D      E      A:B:C:D:E
  B      D      NaN    NaN    NaN    B:D

I've successfully created a DataFrame that looks like the desired output column, using:

rows = df.values
# join the non-NaN entries of each row with ':'
df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])

    0
0  A:B
1  A:C:D
2  A:B:C:D:E
3  B:D
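As an alternative that is not taken from this thread, the same strings can be built directly from `df` with `DataFrame.apply(..., axis=1)`. A minimal sketch, assuming the five-column frame from the question: the result is a Series that already carries `df`'s own index, so assigning it as a new column lines up row for row whatever the index labels are.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col_1': ['A', 'A', 'A', 'B'],
    'Col_2': ['B', 'C', 'B', 'D'],
    'Col_3': [np.nan, 'D', 'C', np.nan],
    'Col_4': [np.nan, np.nan, 'D', np.nan],
    'Col_5': [np.nan, np.nan, 'E', np.nan],
})

# For each row, drop the missing values and join the rest with ':'.
# The returned Series is indexed exactly like df.
concat = df.apply(lambda row: ':'.join(row.dropna()), axis=1)
df['concatenated'] = concat
print(concat.tolist())  # ['A:B', 'A:C:D', 'A:B:C:D:E', 'B:D']
```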

But now when I attempt to place it into the original dataframe, I get:

df['concatenated'] = df_1

Col_1  Col_2  Col_3  Col_4  Col_5  concatenated
  A      B      NaN    NaN    NaN    NaN
  A      C      D      NaN    NaN    NaN
  A      B      C      D      E      NaN
  B      D      NaN    NaN    NaN    NaN

What's strange is that when I create a simplified example, it works as expected. Below is the full code of what I'm doing. The original data arrives transposed relative to the DataFrame shown above.

# sort each column on its own (sort_values() replaces Series.order(), which newer pandas removed)
df_caregiver_type = pd.concat([df_caregiver_type[col].sort_values().reset_index(drop=True) for col in df_caregiver_type], axis=1, ignore_index=False).T
df_caregiver_type.rename(columns=lambda x: 'Col_' + str(x), inplace=True)
rows = df_caregiver_type.values
df_caregiver_type1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
df_caregiver_type['concatenated'] = df_caregiver_type1
df_caregiver_type = df_caregiver_type.T
df_caregiver_type

Update: I suspect the problem comes from the first line of the full code. It is taken from a separate but related question: pandas: sort each column individually
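A likely explanation is index alignment rather than the sorting itself. After the `.T`, the row index holds the original column labels, while the freshly built `df_1` has a default 0..n-1 index; assigning a pandas object to a column aligns on index labels, and none match, so every value becomes NaN. In the simplified example both frames share the index 0..3, which is why it works there. A sketch with made-up data (the labels `'x'`, `'y'`, `'z'` and the frame `raw` are hypothetical), using `sort_values()` in place of `.order()`:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one column per record, with made-up labels.
raw = pd.DataFrame({'x': ['A', 'B', np.nan],
                    'y': ['A', 'C', 'D'],
                    'z': ['B', 'D', np.nan]})

# Same kind of operation as the first line of the full code, then transpose.
wide = pd.concat([raw[c].sort_values().reset_index(drop=True) for c in raw], axis=1).T
print(wide.index.tolist())    # ['x', 'y', 'z'] (labels, not 0..2)

joined = pd.DataFrame([':'.join(v for v in row if pd.notna(v)) for row in wide.values])
print(joined.index.tolist())  # [0, 1, 2]

# Assignment aligns on index labels; none match, so the column is all NaN.
wide['concatenated'] = joined[0]
print(wide['concatenated'].isna().all())  # True
```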

Answer 1

Answer by CT Zhu

For your full dataset, changing the last step from

df['concatenated'] = df_1

to

df['concatenated'] = df_1.values

will solve the issue. I think it's a bug, and I'm quite sure I've seen it on SO before.

Or just: df['concatenated'] = [':'.join(word for word in row if word is not np.nan) for row in rows]
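Both suggestions sidestep index alignment: `df_1.values` (a NumPy array) and a plain list carry no index, so pandas assigns them by position. A minimal sketch, assuming a frame indexed by made-up labels like the transposed frame in the question; option 2, wrapping the list in a Series built with `index=df.index`, is an alternative not given in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is NOT 0..n-1.
df = pd.DataFrame({'Col_0': ['A', 'A', 'B'],
                   'Col_1': ['B', 'C', 'D'],
                   'Col_2': [np.nan, 'D', np.nan]},
                  index=['x', 'y', 'z'])

joined = [':'.join(v for v in row if pd.notna(v)) for row in df.to_numpy()]

# Option 1: a plain list has no index, so values are assigned by position.
df['concat_list'] = joined

# Option 2: keep a Series, but give it df's index so labels match row for row.
df['concat_series'] = pd.Series(joined, index=df.index)
print(df[['concat_list', 'concat_series']])
```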

Answer 2

Answer by Vor

>>> import numpy as np
>>> import pandas as pd
>>> d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
...      'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
...      'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
...      'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
...      'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
>>> df = pd.DataFrame(d)
>>> 
>>> rows = df.values
>>> df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
>>> 
>>> df['concatenated'] = df_1[0]
>>> df
  Col_1 Col_2 Col_3 Col_4 Col_5 concatenated
0     A     B   NaN   NaN   NaN          A:B
1     A     C     D   NaN   NaN        A:C:D
2     A     B     C     D     E    A:B:C:D:E
3     B     D   NaN   NaN   NaN          B:D
>>>

Answer 3

Answer by gobrewers14

>>> df = df.join(df_1)
>>> df = df.rename(columns = {0:'concatenated'})