Merge multiple column values into one column in python pandas

Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/33098383/

Date: 2020-08-19 12:47:40  Source: igfitidea

Tags: python, list, pandas, row, dataframe

Asked by sequence_hard

I have a pandas data frame like this:

   Column1  Column2  Column3  Column4  Column5
 0    a        1        2        3        4
 1    a        3        4        5
 2    b        6        7        8
 3    c        7        7        

What I want now is a new dataframe containing Column1 and a new column, ColumnA. ColumnA should contain all values from Column2 through the last column that has a value in that row, joined with commas, like this:

  Column1  ColumnA
0   a      1,2,3,4
1   a      3,4,5
2   b      6,7,8
3   c      7,7

How could I best approach this issue? Any advice would be helpful. Thanks in advance!

Accepted answer by EdChum

You can call apply, passing axis=1 to apply row-wise, then convert the dtype to str and join:

In [153]:
df['ColumnA'] = df[df.columns[1:]].apply(
    # drop NaN, cast back to int so values don't print as floats, then join
    lambda x: ','.join(x.dropna().astype(int).astype(str)),
    axis=1
)
df

Out[153]:
  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

Here I call dropna to get rid of the NaN values; however, we need to cast back to int so we don't end up with floats as str.
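
A self-contained sketch of the row-wise approach described above. It assumes the empty cells in the question's frame are NaN (so some columns are floats); the helper name join_row is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame modeled on the question; missing cells are assumed to be NaN,
# which makes Column4 and Column5 float columns.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

def join_row(row):
    # Drop missing values, cast back to int, then join the values as text.
    return ",".join(row.dropna().astype(int).astype(str))

# Apply row-wise to every column except the first.
df["ColumnA"] = df[df.columns[1:]].apply(join_row, axis=1)

result = df[["Column1", "ColumnA"]]
print(result)
```

Selecting only Column1 and ColumnA at the end gives the two-column frame the question asks for.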

Answer by Amin Salgado

I suggest using .assign:

# concatenate the columns as strings, separated by ', '
df2 = df.assign(ColumnA = df.Column2.astype(str) + ', ' +
  df.Column3.astype(str) + ', ' + df.Column4.astype(str) + ', ' +
  df.Column5.astype(str))

It's simple, if a bit long, but it worked for me.
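
The same string concatenation can be built in a loop instead of typing every column by hand. This is only a sketch: the cols list is a hypothetical selection, restricted here to columns without missing values, since columns containing NaN would need separate handling (for example fillna, or the dropna approach from the accepted answer).

```python
import pandas as pd

# Reduced sample frame: only the columns that have no missing values.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
})

# Hypothetical selection of columns to merge.
cols = ["Column2", "Column3"]

# Start from the first column as text, then append ', ' + next column.
joined = df[cols[0]].astype(str)
for col in cols[1:]:
    joined = joined + ", " + df[col].astype(str)

# assign returns a new frame and leaves df unchanged.
df2 = df.assign(ColumnA=joined)
print(df2)
```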

Answer by Om Prakash

Suppose you have a lot of columns, say 1000 columns in the dataframe, and you want to merge a few of them based on a particular column name (e.g. Column2 in the question) plus an arbitrary number of columns after that column (e.g. here Column2 plus the 3 columns after it, as the OP asked).

We can get the position of the column using .get_loc(), as answered elsewhere:

source_col_loc = df.columns.get_loc('Column2')  # column positions start from 0

# start the slice at Column2 itself so it is included, then span 4 columns
df['ColumnA'] = df.iloc[:, source_col_loc:source_col_loc+4].apply(
    lambda x: ",".join(x.astype(str)), axis=1)

df

  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

To remove NaN, use .dropna() or .fillna() before joining.
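
A minimal sketch of both options, combined with the .get_loc() lookup. It assumes the empty cells are NaN; the names start, width, block and ColumnA_filled, and the placeholder value 0, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame modeled on the question; missing cells are assumed to be NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

start = df.columns.get_loc("Column2")  # zero-based position of Column2
width = 4                              # hypothetical: number of columns to merge

# Positional slice that begins at Column2 and covers `width` columns.
block = df.iloc[:, start:start + width]

# Option 1: drop missing values per row before joining.
df["ColumnA"] = block.apply(
    lambda row: ",".join(row.dropna().astype(int).astype(str)), axis=1
)

# Option 2: replace missing values with a placeholder (0 here) and keep them.
df["ColumnA_filled"] = block.fillna(0).astype(int).astype(str).apply(
    lambda row: ",".join(row), axis=1
)

print(df[["Column1", "ColumnA", "ColumnA_filled"]])
```

Dropping gives rows of different lengths, as in the question's expected output, while filling keeps every row at the same number of values.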

Hope it helps!
