Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow CC BY-SA: cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29782898/

Time: 2020-09-13 23:14:34  Source: igfitidea

Combine Pandas data frame column values into new column

python pandas dataframe

Question by EMC

I'm working with Pandas and I have a data frame in which each row has one of three columns populated:

ID_1    ID_2    ID_3
abc     NaN     NaN
NaN     def     NaN
NaN     NaN     ghi
NaN     NaN     jkl
NaN     mno     NaN
pqr     NaN     NaN

My goal is to combine these three columns into a new column in my data frame:

ID_1    ID_2    ID_3  Combined_ID
abc     NaN     NaN    abc
NaN     def     NaN    def
NaN     NaN     ghi    ghi
NaN     NaN     jkl    jkl
NaN     mno     NaN    mno
pqr     NaN     NaN    pqr

Ideally it would just find whichever non-null value exists in columns 1 through 3, but I could also concatenate them, since only one of the three should be populated in each row. Thanks.

import pandas as pd

df_note = pd.read_csv("NoteIds.csv")
# goal: df_note['Combined_ID'] = ID_1 + ID_2 + ID_3, i.e. the one non-null ID in each row
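
NoteIds.csv itself is not part of the question, so the answers below cannot be rerun as written. A minimal sketch, assuming the file holds exactly the six rows shown in the table above, builds the same frame in memory. It is named df to match the answers, and dtype=object is an assumption that keeps plain Python strings and NaN, the long-standing pandas default for this kind of data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for NoteIds.csv: the six sample rows from the table,
# with NaN marking the empty cells.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan, np.nan, np.nan, "pqr"],
    "ID_2": [np.nan, "def", np.nan, np.nan, "mno", np.nan],
    "ID_3": [np.nan, np.nan, "ghi", "jkl", np.nan, np.nan],
}, dtype=object)

# Each row should have exactly one populated ID column.
print(df.notna().sum(axis=1).tolist())  # -> [1, 1, 1, 1, 1, 1]
```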

Answer by EdChum

You can use the fact that summing concatenates string values: call fillna and pass an empty str so the NaNs become empty strings, then call sum and pass param axis=1 to sum row-wise:

In [26]:

df['Combined_ID'] = df.fillna('').sum(axis=1)
df
Out[26]:
  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr
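
A self-contained sketch of the same trick on a hypothetical three-row frame. It also shows a consequence worth knowing: when a row has more than one populated column (the third row here, with made-up values 'x' and 'y'), the values are concatenated in column order rather than one of them being picked.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; dtype=object keeps plain Python strings and NaN.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, "x"],
    "ID_2": [np.nan, "def", "y"],
    "ID_3": [np.nan, np.nan, np.nan],
}, dtype=object)

# fillna('') turns NaN into empty strings; summing object columns row-wise
# then applies Python's + to the strings, which concatenates them.
df["Combined_ID"] = df.fillna("").sum(axis=1)
print(df["Combined_ID"].tolist())  # -> ['abc', 'def', 'xy']
```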

If you're only interested in those 3 columns you can just select them:

In [39]:

df['Combined_ID'] = df[['ID_1','ID_2','ID_3']].fillna('').sum(axis=1)
df
Out[39]:
  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr
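
If you want the question's "find whatever non-null value exists" behaviour literally, rather than concatenation, one option is DataFrame.bfill(axis=1) followed by taking the first column. This is a different technique from the answer above, sketched here on a hypothetical frame: it returns the leftmost non-null value of each row, so a row with two populated columns yields only the first one instead of both joined.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the last row has two populated columns on purpose.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan],
    "ID_2": [np.nan, "def", "y"],
    "ID_3": [np.nan, np.nan, "z"],
}, dtype=object)

id_cols = ["ID_1", "ID_2", "ID_3"]
# bfill(axis=1) fills each NaN with the next non-null value to its right,
# so the first column ends up holding the leftmost non-null value of the row.
df["Combined_ID"] = df[id_cols].bfill(axis=1).iloc[:, 0]
print(df["Combined_ID"].tolist())  # -> ['abc', 'def', 'y']
```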

Answer by Zero

Let's assume that a row can have more than one non-NaN value. This should still work.

In [43]: df['Combined_ID'] = df.apply(
             lambda x: ''.join([e for e in x if isinstance(e, str)]),  # str replaces Python 2's basestring
             axis=1)

For each row, keep only the string items (NaN is a float, so it is skipped) and join them.

In [44]: df
Out[44]:
  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr
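
The session above is Python 2 (basestring no longer exists in Python 3). A minimal Python 3 sketch of the same row-wise join, on a hypothetical frame, illustrates the claim that rows with more than one non-NaN value still work: their strings are joined, and an all-NaN row gives an empty string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one normal row, one row with two values, one empty row.
df = pd.DataFrame({
    "ID_1": ["abc", "x", np.nan],
    "ID_2": [np.nan, "y", np.nan],
    "ID_3": [np.nan, np.nan, np.nan],
}, dtype=object)

# For each row (axis=1), keep only the str cells and join them;
# NaN is a float, so isinstance(e, str) filters it out.
df["Combined_ID"] = df.apply(
    lambda row: "".join(e for e in row if isinstance(e, str)), axis=1
)
print(df["Combined_ID"].tolist())  # -> ['abc', 'xy', '']
```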

I like @EdChum's answer; it is more readable.

Interestingly, the fillna('').sum(axis=1) method is slower on this small data set:

In [45]: %timeit df.fillna('').sum(axis=1)
1000 loops, best of 3: 808 μs per loop

In [46]: %timeit df.apply(lambda x : ''.join([e for e in x if isinstance(e, basestring)]), axis=1)
1000 loops, best of 3: 285 μs per loop
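
Those timings come from the pandas and Python 2 setup of the original post and may not hold today. A rough sketch for re-measuring both approaches on the question's six sample rows with the standard-library timeit module; it checks that the two methods agree first and only prints per-call times, without assuming which one will win on your machine.

```python
import timeit

import numpy as np
import pandas as pd

# The six sample rows from the question; absolute timings depend on the machine
# and on the installed pandas/Python versions.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan, np.nan, np.nan, "pqr"],
    "ID_2": [np.nan, "def", np.nan, np.nan, "mno", np.nan],
    "ID_3": [np.nan, np.nan, "ghi", "jkl", np.nan, np.nan],
}, dtype=object)


def by_sum():
    # EdChum's approach: NaN -> '' then a row-wise sum (string concatenation).
    return df.fillna("").sum(axis=1)


def by_apply():
    # Zero's approach in Python 3 spelling: join the str cells of each row.
    return df.apply(lambda row: "".join(e for e in row if isinstance(e, str)), axis=1)


# Make sure both approaches produce the same values before timing them.
assert by_sum().tolist() == by_apply().tolist()

for name, fn in [("fillna+sum", by_sum), ("apply+join", by_apply)]:
    # Best of 3 runs of 200 calls each, reported per call.
    per_call = min(timeit.repeat(fn, number=200, repeat=3)) / 200
    print(f"{name}: {per_call * 1e6:.1f} us per call")
```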

To apply it to the ['ID_1','ID_2','ID_3'] columns only:

df['Combined_ID'] = df[['ID_1','ID_2','ID_3']].apply(
    lambda x: ''.join([e for e in x if isinstance(e, str)]), axis=1)
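
Selecting the columns matters as soon as the frame has other string columns. In this sketch the note column and its values are hypothetical: applying the join to the whole row pulls the note text in, while restricting to the three ID columns does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID_1": ["abc", np.nan],
    "ID_2": [np.nan, "def"],
    "ID_3": [np.nan, np.nan],
    "note": ["first", "second"],  # hypothetical non-ID string column
}, dtype=object)


def join_strings(row):
    # Keep only the str cells of the row and concatenate them.
    return "".join(e for e in row if isinstance(e, str))


print(df.apply(join_strings, axis=1).tolist())  # -> ['abcfirst', 'defsecond']
print(df[["ID_1", "ID_2", "ID_3"]].apply(join_strings, axis=1).tolist())  # -> ['abc', 'def']
```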

Answer by Joe

Another way:

df['Combined_ID'] = df.ID_1.fillna('') + df.ID_2.fillna('') + df.ID_3.fillna('')

Output:

  ID_1 ID_2 ID_3 Combined_ID
0  abc  NaN  NaN         abc
1  NaN  def  NaN         def
2  NaN  NaN  ghi         ghi
3  NaN  NaN  jkl         jkl
4  NaN  mno  NaN         mno
5  pqr  NaN  NaN         pqr
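
Chaining + works because adding two object Series concatenates their strings element-wise. A small sketch that folds the same expression over an arbitrary list of column names with functools.reduce, so each column does not have to be written out by hand; id_cols and the sample values are illustrative assumptions.

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical sample with one populated ID per row.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan],
    "ID_2": [np.nan, "def", np.nan],
    "ID_3": [np.nan, np.nan, "ghi"],
}, dtype=object)

id_cols = ["ID_1", "ID_2", "ID_3"]
# Equivalent to df.ID_1.fillna('') + df.ID_2.fillna('') + df.ID_3.fillna(''),
# folded with + over whatever columns are listed in id_cols.
df["Combined_ID"] = reduce(operator.add, (df[c].fillna("") for c in id_cols))
print(df["Combined_ID"].tolist())  # -> ['abc', 'def', 'ghi']
```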