Original question: http://stackoverflow.com/questions/48011404/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas, how to combine multiple columns into an array column
Asked by Gnefihz Deng
I need to add a column that combines all the values of each row into a single array.
Source:
import pandas as pd

# Named `test` to match the current solution further down.
test = pd.DataFrame(data={
    'a' : [1,2,3],
    'b' : [2,3,4]
})
Target:
pd.DataFrame(data={
'a' : [1,2,3],
'b' : [2,3,4],
'combine' : [[1,2],[2,3],[3,4]]
})
Current solution:
test['combine'] = test[['a','b']].apply(lambda x: pd.Series([x.values]), axis=1)
Issue: I actually have many columns, and this seems to take too long to run. Is there a better way?
Answered by cs95
df
   a  b
0  1  2
1  2  3
2  3  4
If you want to add a column of lists as a single column, you'll need to access the .values attribute, convert it to a nested list, and assign it back:
df['combine'] = df.values.tolist()
Or,
df['combine'] = df[['a', 'b']].values.tolist()
df
   a  b combine
0  1  2  [1, 2]
1  2  3  [2, 3]
2  3  4  [3, 4]
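As a self-contained sketch of the approach above (the frame is the question's example; the `cols` variable is only illustrative), the following builds the frame, adds the list column, and inspects its dtype:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})

# Select the columns to combine, drop down to a NumPy array, then convert
# it to a nested Python list: one inner list per row.
cols = ["a", "b"]
df["combine"] = df[cols].values.tolist()

print(df)
print(df["combine"].dtype)  # object dtype: each cell holds a Python list
```

Selecting the columns explicitly keeps the result stable if the frame later gains other columns. On recent pandas versions, `.to_numpy()` should be usable in place of `.values` here.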
Note that assigning the .values result directly does not work, because pandas special-cases NumPy arrays (a 2-D array is treated as several columns' worth of data rather than as one column of lists), leading to undesirable outcomes:
df['combine'] = df[['a', 'b']].values
ValueError: Wrong number of items passed 2, placement implies 1
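The failure can be reproduced with a short sketch like the one below. It assumes the question's example frame; the exact wording of the error message may differ between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
arr = df[["a", "b"]].values  # 2-D NumPy array of shape (3, 2)

try:
    # pandas interprets the 2-D array as two columns, not one column of rows.
    df["combine"] = arr
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Converting to a nested list first gives pandas one Python object per row.
df["combine"] = arr.tolist()
print(df)
```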
A couple of notes:
- Avoid apply/transform as much as possible. They are only convenience functions meant to hide the application of a loop, and they are slow, offering no performance or vectorization benefits whatsoever.
- Keeping columns of objects (such as lists) offers no performance gains as far as pandas is concerned, so unless the goal is to display data, try to avoid it.
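To get a rough feel for the difference between the two approaches, a timing sketch along these lines may help. The frame size, the random data, and the variable names are assumptions chosen for illustration; absolute timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical larger frame: 10,000 rows and 20 integer columns.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 100, size=(10_000, 20)))

# Row-wise apply: the lambda is called once per row in Python.
start = time.perf_counter()
slow = big.apply(lambda row: row.tolist(), axis=1)
apply_seconds = time.perf_counter() - start

# Single conversion of the underlying NumPy array to nested lists.
start = time.perf_counter()
fast = big.values.tolist()
tolist_seconds = time.perf_counter() - start

print(f"apply:  {apply_seconds:.4f} s")
print(f"tolist: {tolist_seconds:.4f} s")
print("same result:", slow.tolist() == fast)
```

Both paths produce the same nested lists; the comparison only illustrates the per-row overhead that apply adds.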