Combine Pandas data frame column values into new column
Note: this page reproduces a popular StackOverFlow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/29782898/
Asked by EMC
I'm working with Pandas and I have a data frame where each row has one of three values populated:
ID_1 ID_2 ID_3
abc NaN NaN
NaN def NaN
NaN NaN ghi
NaN NaN jkl
NaN mno NaN
pqr NaN NaN
And my goal is to combine these three columns into a new column in my data frame:
ID_1 ID_2 ID_3 Combined_ID
abc NaN NaN abc
NaN def NaN def
NaN NaN ghi ghi
NaN NaN jkl jkl
NaN mno NaN mno
pqr NaN NaN pqr
Ideally it would just find whatever non-null value exists in columns 1 through 3, but I could also concatenate, since only one of the three should be populated for each row. Thanks.
import pandas as pd

df_note = pd.read_csv("NoteIds.csv")
# Goal: df_note['Combined_ID'] = ID_1 + ID_2 + ID_3 (whichever is populated)
Answered by EdChum
You can use the property that summing concatenates string values: call fillna and pass an empty str, then call sum with the param axis=1 to sum row-wise:
In [26]:
df['Combined_ID'] = df.fillna('').sum(axis=1)
df
Out[26]:
ID_1 ID_2 ID_3 Combined_ID
0 abc NaN NaN abc
1 NaN def NaN def
2 NaN NaN ghi ghi
3 NaN NaN jkl jkl
4 NaN mno NaN mno
5 pqr NaN NaN pqr
If you're only interested in those 3 columns, you can just select them:
In [39]:
df['Combined_ID'] = df[['ID_1','ID_2','ID_3']].fillna('').sum(axis=1)
df
Out[39]:
ID_1 ID_2 ID_3 Combined_ID
0 abc NaN NaN abc
1 NaN def NaN def
2 NaN NaN ghi ghi
3 NaN NaN jkl jkl
4 NaN mno NaN mno
5 pqr NaN NaN pqr
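If exactly one of the three columns is populated per row, an alternative to concatenating is to take the first non-null value across the selected columns, which is what the question describes as the ideal. The following is only a minimal sketch under that assumption: it rebuilds the sample data inline instead of reading NoteIds.csv, and it swaps in `bfill(axis=1)` in place of `fillna('').sum(axis=1)`.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt inline (no NoteIds.csv needed).
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan, np.nan, np.nan, "pqr"],
    "ID_2": [np.nan, "def", np.nan, np.nan, "mno", np.nan],
    "ID_3": [np.nan, np.nan, "ghi", "jkl", np.nan, np.nan],
})

cols = ["ID_1", "ID_2", "ID_3"]

# Back-fill along each row so the left-most column holds the first
# non-null value found in that row, then keep only that column.
df["Combined_ID"] = df[cols].bfill(axis=1).iloc[:, 0]

print(df)
```

Note that if a row had more than one populated column, this would keep the left-most value rather than concatenating them.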
Answered by Zero
Let's assume that there can be more than one non-NaN value per row. This should still work.
In [43]: df['Combined_ID'] = df.apply(
    # str on Python 3; the original Python 2 code used basestring
    lambda x: ''.join([e for e in x if isinstance(e, str)]),
    axis=1)
For each row, extract the string items and join them.
In [44]: df
Out[44]:
ID_1 ID_2 ID_3 Combined_ID
0 abc NaN NaN abc
1 NaN def NaN def
2 NaN NaN ghi ghi
3 NaN NaN jkl jkl
4 NaN mno NaN mno
5 pqr NaN NaN pqr
I like @EdChum's answer; it looks more readable.
Interestingly, the fillna('').sum(axis=1) method is the more expensive one on this small data set.
In [45]: %timeit df.fillna('').sum(axis=1)
1000 loops, best of 3: 808 μs per loop
In [46]: %timeit df.apply(lambda x : ''.join([e for e in x if isinstance(e, str)]), axis=1)
1000 loops, best of 3: 285 μs per loop
For the ['ID_1','ID_2','ID_3'] columns only:
df[['ID_1','ID_2','ID_3']].apply(lambda_function, axis=1)
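To make the multi-value case concrete, here is a small self-contained sketch of the row-wise join on Python 3. The data is hypothetical (the last row deliberately has two populated columns), and `join_strings` is an illustrative helper name, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has two populated columns.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan],
    "ID_2": [np.nan, "def", "mno"],
    "ID_3": [np.nan, np.nan, "pqr"],
})
cols = ["ID_1", "ID_2", "ID_3"]


def join_strings(row):
    """Concatenate the string entries of a row, skipping NaN (a float)."""
    return "".join(e for e in row if isinstance(e, str))


# axis=1 passes each row (not each column) to the helper.
df["Combined_ID"] = df[cols].apply(join_strings, axis=1)

print(df)
```

The `isinstance` check is what lets this tolerate any number of populated columns per row: NaN entries are simply skipped, and whatever strings remain are joined left to right.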
Answered by Joe
Another way:
df['Combined_ID'] = df.ID_1.fillna('') + df.ID_2.fillna('') + df.ID_3.fillna('')
Output:
ID_1 ID_2 ID_3 Combined_ID
0 abc NaN NaN abc
1 NaN def NaN def
2 NaN NaN ghi ghi
3 NaN NaN jkl jkl
4 NaN mno NaN mno
5 pqr NaN NaN pqr
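The same element-wise concatenation can be written for an arbitrary list of columns. This is only a sketch, assuming the sample data from the question rebuilt inline; the use of `functools.reduce` and the `cols` list are additions for illustration.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

# Sample data from the question, rebuilt inline.
df = pd.DataFrame({
    "ID_1": ["abc", np.nan, np.nan, np.nan, np.nan, "pqr"],
    "ID_2": [np.nan, "def", np.nan, np.nan, "mno", np.nan],
    "ID_3": [np.nan, np.nan, "ghi", "jkl", np.nan, np.nan],
})
cols = ["ID_1", "ID_2", "ID_3"]

# Replace NaN with '' in each column first; without this step,
# adding a string to NaN would leave NaN in the result.
filled = [df[c].fillna("") for c in cols]

# Add the columns together element-wise: ID_1 + ID_2 + ID_3.
df["Combined_ID"] = reduce(operator.add, filled)

print(df)
```

Because the addition is done column by column rather than row by row, this avoids calling a Python function once per row.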

