How to add numpy matrix as new columns for pandas dataframe?
Source: http://stackoverflow.com/questions/51848161/ (StackOverflow, licensed under CC BY-SA 4.0; attribute to the original authors)
Asked by Booley
I have an NxM dataframe and an NxL numpy matrix. I'd like to add the matrix to the dataframe to create L new columns, simply appending the columns and rows in the same order they appear. I tried merge() and join(), but I end up with errors:
assign() keywords must be strings
and
columns overlap but no suffix specified
respectively.
Is there a way I can add a numpy matrix as dataframe columns?
Answered by sacuL
You can turn the matrix into a dataframe and use concat with axis=1:
For example, given a dataframe df and a numpy array mat:
>>> df
   a  b
0  5  5
1  0  7
2  1  0
3  0  4
4  6  4
>>> mat
array([[0.44926098, 0.29567859, 0.60728561],
       [0.32180566, 0.32499134, 0.94950085],
       [0.64958125, 0.00566706, 0.56473627],
       [0.17357589, 0.71053224, 0.17854188],
       [0.38348102, 0.12440952, 0.90359566]])
You can do:
>>> pd.concat([df, pd.DataFrame(mat)], axis=1)
   a  b         0         1         2
0  5  5  0.449261  0.295679  0.607286
1  0  7  0.321806  0.324991  0.949501
2  1  0  0.649581  0.005667  0.564736
3  0  4  0.173576  0.710532  0.178542
4  6  4  0.383481  0.124410  0.903596
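One caveat worth sketching: concat with axis=1 aligns rows by index label, not by position. If df does not have the default 0..N-1 index, the plain pd.DataFrame(mat) gets its own 0..N-1 index and the result can fill with NaN. The sketch below is an illustration under assumptions (the index values 10..14 and the column names x, y, z are made up, not from the original answer): it reuses df's index when wrapping the matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a dataframe with a non-default index and a 5x3 array.
df = pd.DataFrame({'a': [5, 0, 1, 0, 6], 'b': [5, 7, 0, 4, 4]},
                  index=[10, 11, 12, 13, 14])
mat = np.arange(15, dtype=float).reshape(5, 3)

# Reuse df's index so the rows pair up by position,
# and give the new columns explicit (hypothetical) names.
mat_df = pd.DataFrame(mat, index=df.index, columns=['x', 'y', 'z'])
result = pd.concat([df, mat_df], axis=1)

print(result)
```

With the shared index, the result keeps the five original rows and gains three columns with no missing values.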
Answered by user3483203
Setup
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [5,0,1,0,6], 'b': [5,7,0,4,4]})
mat = np.random.rand(5,3)
Using join:
df.join(pd.DataFrame(mat))
   a  b         0         1         2
0  5  5  0.884061  0.803747  0.727161
1  0  7  0.464009  0.447346  0.171881
2  1  0  0.353604  0.912781  0.199477
3  0  4  0.466095  0.136218  0.405766
4  6  4  0.764678  0.874614  0.310778
If there is the chance of overlapping column names, simply supply a suffix:
df = pd.DataFrame({0: [5,0,1,0,6], 1: [5,7,0,4,4]})
mat = np.random.rand(5,3)
df.join(pd.DataFrame(mat), rsuffix='_')
   0  1        0_        1_         2
0  5  5  0.783722  0.976951  0.563798
1  0  7  0.946070  0.391593  0.273339
2  1  0  0.710195  0.827352  0.839212
3  0  4  0.528824  0.625430  0.465386
4  6  4  0.848423  0.467256  0.962953
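As a possible alternative to a suffix, the matrix columns can be renamed before joining so that no overlap occurs in the first place. The sketch below is not part of the original answer: it first reproduces the overlap error from the question, then uses DataFrame.add_prefix with an arbitrary prefix 'm' (an assumption chosen for illustration).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [5, 0, 1, 0, 6], 1: [5, 7, 0, 4, 4]})
mat = np.arange(15, dtype=float).reshape(5, 3)

# Without a suffix, the overlapping labels 0 and 1 make join raise ValueError.
try:
    df.join(pd.DataFrame(mat))
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)
print(overlap_error)

# Prefixing the matrix columns ('m' is an arbitrary choice) removes the overlap.
mat_df = pd.DataFrame(mat, index=df.index).add_prefix('m')
joined = df.join(mat_df)
print(joined.columns.tolist())
```

Compared with rsuffix, this renames all of the new columns consistently rather than only the ones that happen to collide.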