Merge multiple column values into one column in python pandas
Source: http://stackoverflow.com/questions/33098383/
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the source URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by sequence_hard
I have a pandas data frame like this:
  Column1  Column2  Column3  Column4  Column5
0       a        1        2        3        4
1       a        3        4        5
2       b        6        7        8
3       c        7        7
What I want now is a new dataframe containing Column1 and a new column, ColumnA. ColumnA should contain all values from Column2 through the last filled column of each row (however many columns that is), like this:
  Column1  ColumnA
0       a  1,2,3,4
1       a    3,4,5
2       b    6,7,8
3       c      7,7
How could I best approach this issue? Any advice would be helpful. Thanks in advance!
Accepted answer by EdChum
You can call apply and pass axis=1 to apply it row-wise, then convert the dtype to str and join:
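In [153]:
# Join every column after Column1, row by row.
# dropna() removes the missing cells; astype(int) avoids strings like '1.0'.
df['ColumnA'] = df[df.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(int).astype(str)),
    axis=1
)
df
Out[153]:
  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7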
Here I call dropna to get rid of the NaN values; however, we then need to cast to int so that we don't end up with floats as str.
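To make the effect of the int cast concrete, here is a minimal, self-contained sketch. It rebuilds the example frame from the question (the construction with np.nan for the empty cells is an assumption about how the data was loaded) and compares the join with and without astype(int). The names value_cols, as_float and as_int are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

value_cols = df.columns[1:]  # every column after Column1

# Without the int cast: each row is float64, so the strings keep a ".0".
as_float = df[value_cols].apply(
    lambda row: ",".join(row.dropna().astype(str)), axis=1
)

# With the int cast: the values are written as whole numbers.
as_int = df[value_cols].apply(
    lambda row: ",".join(row.dropna().astype(int).astype(str)), axis=1
)

print(as_float.tolist())
print(as_int.tolist())  # ['1,2,3,4', '3,4,5', '6,7,8', '7,7']
```

The cast is only safe when the non-missing values really are whole numbers; with genuine decimals, astype(int) would truncate them.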
Answer by Amin Salgado
I propose to use .assign
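# Concatenate the columns as strings, separated by ', '.
# Note: NaN cells become the string 'nan', and float columns keep their '.0'.
df2 = df.assign(ColumnA=df.Column2.astype(str) + ', ' +
                df.Column3.astype(str) + ', ' +
                df.Column4.astype(str) + ', ' +
                df.Column5.astype(str))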
It's simple, if a bit long, but it worked for me.
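One thing to watch with plain string concatenation is how missing cells are rendered. The sketch below, on a deliberately tiny two-row frame, shows what .assign with astype(str) produces when a cell is NaN, and one possible way to skip such cells. The helper cell_to_text is hypothetical (it is not part of the answer) and assumes the non-missing values are whole numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column1": ["a", "c"],
    "Column2": [1, 7],
    "Column3": [2, np.nan],
})

# Plain concatenation: NaN is stringified as 'nan' and floats keep '.0'.
naive = df.assign(ColumnA=df.Column2.astype(str) + ", " + df.Column3.astype(str))
print(naive["ColumnA"].tolist())    # ['1, 2.0', '7, nan']


# Hypothetical helper: format one cell, returning '' for a missing value.
def cell_to_text(value):
    return "" if pd.isna(value) else str(int(value))


# Format each column, then join the non-empty pieces of every row.
parts = [df[col].map(cell_to_text) for col in ["Column2", "Column3"]]
cleaned = df.assign(
    ColumnA=[", ".join(p for p in row if p) for row in zip(*parts)]
)
print(cleaned["ColumnA"].tolist())  # ['1, 2', '7']
```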
Answer by Om Prakash
Suppose you have a lot of columns, say 1000 columns in the dataframe, and you want to merge a few of them, starting from a particular column name (e.g. Column2 in the question) and continuing for an arbitrary number of columns after it (here, Column2 plus the 3 columns after it, as the OP asked).
We can get the position of a column using .get_loc():
source_col_loc = df.columns.get_loc('Column2')  # column position starts from 0
# Take Column2 and the 3 columns after it, drop NaN, and join row by row.
df['ColumnA'] = df.iloc[:, source_col_loc:source_col_loc + 4].apply(
    lambda x: ",".join(x.dropna().astype(int).astype(str)), axis=1)
df
  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7
NaN values are removed here with .dropna(); .fillna() is an alternative if you would rather substitute a placeholder value.
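The position-based selection can be wrapped in a small reusable function. This is only a sketch: merge_columns_from, its parameters and the rebuilt example frame are illustrative names and data, not part of the answer, and the int cast assumes the non-missing values are whole numbers.

```python
import numpy as np
import pandas as pd


def merge_columns_from(df, start_col, n_cols, sep=","):
    """Join n_cols columns, starting at start_col (inclusive), row by row."""
    start = df.columns.get_loc(start_col)      # 0-based position of start_col
    block = df.iloc[:, start:start + n_cols]   # start_col and the columns after it
    return block.apply(
        lambda row: sep.join(row.dropna().astype(int).astype(str)), axis=1
    )


# Example frame from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

df["ColumnA"] = merge_columns_from(df, "Column2", 4)
print(df[["Column1", "ColumnA"]])
```

Because the slice is computed from the column name, the same call keeps working if more columns are later added in front of Column2.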
Hope it helps!