Combine Pandas data frame column values into a new column

Question

Asked by EMC

I'm working with Pandas and I have a data frame in which one of three columns is populated in each row:

ID_1    ID_2    ID_3
abc     NaN     NaN
NaN     def     NaN
NaN     NaN     ghi
NaN     NaN     jkl
NaN     mno     NaN
pqr     NaN     NaN

My goal is to combine these three columns into a new column in my data frame:

ID_1    ID_2    ID_3  Combined_ID
abc     NaN     NaN    abc
NaN     def     NaN    def
NaN     NaN     ghi    ghi
NaN     NaN     jkl    jkl
NaN     mno     NaN    mno
pqr     NaN     NaN    pqr

Ideally it would just pick whichever non-null value exists in columns 1 through 3, but I could also concatenate, since only one of the three should be populated in each row. Thanks.

import pandas as pd
df_note = pd.read_csv("NoteIds.csv")
# df_note['Combined_ID'] = ...  (wanted: ID_1 + ID_2 + ID_3)

Answer 1

Answered by EdChum

Summing string values concatenates them, so you can call fillna with an empty string and then call sum with axis=1 to sum row-wise:

In [26]:

df['Combined_ID'] = df.fillna('').sum(axis=1)
df
Out[26]:
  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr

If you're only interested in those 3 columns you can just select them:

In [39]:

df['Combined_ID'] = df[['ID_1','ID_2','ID_3']].fillna('').sum(axis=1)
df
Out[39]:
  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr
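
Selecting the three columns also keeps the assignment repeatable: once Combined_ID exists, a sum over the whole frame would include it as well. The following is a minimal, self-contained sketch of that idea, not the answer's exact code. It rebuilds the sample frame from the question and swaps sum for an explicit ''.join via agg; the helper name combine_ids and the id_cols list are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (one populated ID per row).
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan, np.nan, np.nan, "pqr"],
    "ID_2": [np.nan, "def", np.nan, np.nan, "mno", np.nan],
    "ID_3": [np.nan, np.nan, "ghi", "jkl", np.nan, np.nan],
})

id_cols = ["ID_1", "ID_2", "ID_3"]  # column names taken from the question


def combine_ids(frame):
    """Hypothetical helper: replace NaN with '' and join each row's values."""
    return frame[id_cols].fillna("").agg("".join, axis=1)


df["Combined_ID"] = combine_ids(df)
# Running it again gives the same result, because Combined_ID is not in id_cols.
df["Combined_ID"] = combine_ids(df)
print(df)
```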

Answer 2

Answered by Zero

Suppose a row can hold more than one non-NaN value. The following still works:

In [43]: # On Python 3 use str; basestring exists only on Python 2.
         df['Combined_ID'] = df.apply(
             lambda x: ''.join([e for e in x if isinstance(e, str)]),
             axis=1)

For each row, extract string items and join them.

In [44]: df
Out[44]:
  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr

I like @EdChum's answer; it is more readable.

Interestingly, the fillna('').sum(axis=1) method is the slower of the two on this small data set:

In [45]: %timeit df.fillna('').sum(axis=1)
1000 loops, best of 3: 808 μs per loop

In [46]: %timeit df.apply(lambda x : ''.join([e for e in x if isinstance(e, str)]), axis=1)
1000 loops, best of 3: 285 μs per loop

To apply it to the ['ID_1','ID_2','ID_3'] columns only, where lambda_function is the lambda shown above:

df[['ID_1','ID_2','ID_3']].apply(lambda_function, axis=1)
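
To make the multi-value case concrete, here is a small sketch of the apply approach on a hypothetical two-row frame in which the second row has two populated IDs. The data and the function name join_strings are assumptions made for illustration; the technique is the one described in this answer, written for Python 3.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row has two populated IDs.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan],
    "ID_2": [np.nan, "def"],
    "ID_3": [np.nan, "ghi"],
})


def join_strings(row):
    # Keep only str items (NaN is a float, so it is skipped) and join them.
    return "".join(e for e in row if isinstance(e, str))


# Restrict to the ID columns and apply the function row-wise.
df["Combined_ID"] = df[["ID_1", "ID_2", "ID_3"]].apply(join_strings, axis=1)
print(df["Combined_ID"].tolist())  # ['abc', 'defghi']
```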

Answer 3

Answered by Joe

Another way:

df['Combined_ID'] = df.ID_1.fillna('') + df.ID_2.fillna('') + df.ID_3.fillna('')

Output:

  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr
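
Concatenation merges values whenever more than one column is populated. If only the first non-null value is wanted, which is the "ideal" behaviour described in the question, one possible alternative is to chain Series.combine_first. This is a sketch of a different technique from the one in this answer, and the sample data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the last row has values in both ID_2 and ID_3.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan],
    "ID_2": [np.nan, "def", "mno"],
    "ID_3": [np.nan, np.nan, "xyz"],
})

# combine_first keeps the caller's value and fills its NaN from the argument,
# so the chain picks the first non-null value in ID_1, ID_2, ID_3 order.
df["Combined_ID"] = (
    df["ID_1"].combine_first(df["ID_2"]).combine_first(df["ID_3"])
)
print(df)
```

In the last row this yields mno rather than mnoxyz, because ID_2 takes precedence over ID_3.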
