Merge multiple column values into one column in Python pandas

Question

Asked by sequence_hard

I have a pandas data frame like this:

   Column1  Column2  Column3  Column4  Column5
 0    a        1        2        3        4
 1    a        3        4        5
 2    b        6        7        8
 3    c        7        7

What I want now is a new dataframe containing Column1 and a new column, ColumnA. ColumnA should contain all values from Column2 through column n (where n is the number of columns from Column2 to the end of the row), like this:

  Column1  ColumnA
0   a      1,2,3,4
1   a      3,4,5
2   b      6,7,8
3   c      7,7

How could I best approach this issue? Any advice would be helpful. Thanks in advance!

Answer 1

Accepted answer by EdChum

You can call apply, passing axis=1 to apply row-wise, then convert the dtype to str and join:

In [153]:
df['ColumnA'] = df[df.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(int).astype(str)),
    axis=1
)
df

Out[153]:
  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

Here I call dropna to get rid of the NaN values; however, we then need to cast back to int so we don't end up with floats as str.
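
To make the effect of the int cast concrete, here is a minimal sketch. It assumes a reconstruction of the question's frame in which the empty cells are NaN; the variable names value_cols, as_float and as_int are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; empty cells are NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

value_cols = df.columns[1:]

# Without the int cast: each row is upcast to float because of the NaN
# columns, so the joined strings carry a decimal point.
as_float = df[value_cols].apply(
    lambda row: ",".join(row.dropna().astype(str)), axis=1
)

# With the int cast after dropna(): integer formatting is restored.
as_int = df[value_cols].apply(
    lambda row: ",".join(row.dropna().astype(int).astype(str)), axis=1
)

print(as_float.tolist())
print(as_int.tolist())
```

The cast only works after dropna(), since NaN cannot be converted to int.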

Answer 2

Answer by Amin Salgado

I suggest using .assign:

# Concatenate each column as a string; note that missing values become 'nan'
df2 = df.assign(ColumnA=df.Column2.astype(str) + ', ' +
  df.Column3.astype(str) + ', ' + df.Column4.astype(str) + ', ' +
  df.Column5.astype(str))

It's simple, if a bit long, but it worked for me.
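
As a rough sketch of how this concatenation behaves on the question's data (assuming the empty cells are NaN), the version below builds the same string with a loop over column names instead of a hand-written chain. The names cols and joined are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; empty cells are NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

cols = ["Column2", "Column3", "Column4", "Column5"]

# Same idea as chaining "+" by hand, driven by a list of column names.
joined = df[cols[0]].astype(str)
for name in cols[1:]:
    joined = joined + ", " + df[name].astype(str)

# assign returns a new frame and leaves df unchanged.
df2 = df.assign(ColumnA=joined)
print(df2["ColumnA"].tolist())
```

With this data the float columns keep their decimal point and missing values show up as the text nan, so this approach fits best when every column is fully populated.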

Answer 3

Answer by Om Prakash

Suppose you have a lot of columns, say 1000 columns in the dataframe, and you want to merge a few of them based on a particular column name (e.g. Column2 in the question) plus an arbitrary number of columns after it (e.g. here the 3 columns after Column2, inclusive of Column2, as the OP asked).

We can get the position of the column using .get_loc():

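source_col_loc = df.columns.get_loc('Column2')  # column positions start from 0

# Slice from Column2 itself (inclusive) through the 3 columns after it;
# drop NaN and cast to int so the joined values match the output below
df['ColumnA'] = df.iloc[:, source_col_loc:source_col_loc + 4].apply(
    lambda x: ",".join(x.dropna().astype(int).astype(str)), axis=1)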

df

  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

NaN values can be removed with .dropna() or replaced with .fillna().
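
The difference between the two options can be sketched as follows, assuming the question's frame with NaN in the empty cells. The fill value 0 and the names start, block, dropped and filled are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; empty cells are NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

start = df.columns.get_loc("Column2")   # position 1, counting from 0
block = df.iloc[:, start:start + 4]     # Column2 through Column5

# dropna(): missing cells are skipped, so rows yield strings of different lengths.
dropped = block.apply(
    lambda row: ",".join(row.dropna().astype(int).astype(str)), axis=1
)

# fillna(): missing cells are replaced by a placeholder (0 here), so every
# row keeps four entries.
filled = block.fillna(0).astype(int).astype(str).apply(",".join, axis=1)

print(dropped.tolist())
print(filled.tolist())
```

Which one fits depends on whether the position of each value within the joined string matters downstream.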

Hope it helps!
