Pandas, how to combine multiple columns into an array column

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original address. Original StackOverflow question: http://stackoverflow.com/questions/48011404/

Date: 2020-09-14 04:58:34  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Gnefihz Deng

I need to add a combined column that holds all the values of each row as a single list.

Source:

import pandas as pd

test = pd.DataFrame(data={
    'a' : [1,2,3],
    'b' : [2,3,4]
})

Target:

pd.DataFrame(data={
    'a' : [1,2,3],
    'b' : [2,3,4],
    'combine' : [[1,2],[2,3],[3,4]]
})

Current solution:

test['combine'] = test[['a','b']].apply(lambda x: pd.Series([x.values]), axis=1)

Issue: I actually have many columns, and this seems to take too long to run. Is there a better way?

Answered by cs95

df

   a  b
0  1  2
1  2  3
2  3  4

If you want to add a column of lists as a single column, you'll need to access the .values attribute, convert the result to a nested list, and assign it back:

df['combine'] = df.values.tolist()

Or,

df['combine'] = df[['a', 'b']].values.tolist()
df
   a  b combine
0  1  2  [1, 2]
1  2  3  [2, 3]
2  3  4  [3, 4]
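
As a self-contained sketch of the same approach: the variable cols is a name introduced here for illustration, and to_numpy() is swapped in for .values because the pandas documentation recommends it; the result should be the same for this data.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]})

# Illustrative name (not from the original answer): list the source columns
# explicitly so that re-running the assignment does not include 'combine'.
cols = ['a', 'b']

# to_numpy() returns a 2D array; tolist() turns it into one Python list per row.
df['combine'] = df[cols].to_numpy().tolist()

print(df)
```

Selecting the columns by name keeps the assignment safe to repeat, which df.values.tolist() on the whole frame is not once the combine column exists.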

Note that assigning the .values result directly does not work, because pandas special-cases NumPy arrays, which leads to an undesirable outcome:

df['combine'] = df[['a', 'b']].values

ValueError: Wrong number of items passed 2, placement implies 1
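
A minimal sketch of that failure and the workaround, assuming a reasonably recent pandas: the exact error text varies between versions, so the example only relies on a ValueError being raised. The names raised and rows are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]})

# Assigning a 2D array to a single new column is rejected by pandas.
try:
    df['combine'] = df[['a', 'b']].values
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Converting to a nested list first gives pandas one list object per row.
rows = df[['a', 'b']].values.tolist()
df['combine'] = rows
print(df)
```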


A couple of notes -

  • Avoid apply/transform as much as possible. They are only convenience functions meant to hide a loop, and they are slow, offering no performance or vectorization benefits whatsoever.

  • Keeping columns of objects offers no performance gains as far as pandas is concerned, so unless the goal is to display data, try to avoid it.
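
To check the point about apply on your own data, a rough timing sketch follows. The frame size, column names, and the helper functions with_apply and with_tolist are assumptions made for illustration, and the timings will depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 2,000 rows and 5 integer columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(2_000, 5)), columns=list("abcde"))


def with_apply(frame):
    # Row-wise apply: calls the lambda once per row and returns a Series of lists.
    return frame.apply(lambda row: row.tolist(), axis=1)


def with_tolist(frame):
    # Converts the whole frame to a 2D array, then to a nested list.
    return frame.to_numpy().tolist()


apply_time = timeit.timeit(lambda: with_apply(df), number=3)
tolist_time = timeit.timeit(lambda: with_tolist(df), number=3)

print(f"apply:  {apply_time:.4f} s for 3 runs")
print(f"tolist: {tolist_time:.4f} s for 3 runs")
```

Both helpers produce the same nested lists, so the comparison measures only how the rows are collected.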