How to convert a NumPy array to a pandas DataFrame in Python
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/53816008/
Asked by Yannick
I have a NumPy array that looks like this:
[400.31865662]
[401.18514808]
[404.84015554]
[405.14682194]
[405.67735105]
[273.90969447]
[274.0894528]
When I try to convert it to a pandas DataFrame with the following code:
y = pd.DataFrame(data)
print(y)
I get the following output when printing it. Why do I get all those zeros?
0
0 400.318657
0
0 401.185148
0
0 404.840156
0
0 405.146822
0
0 405.677351
0
0 273.909694
0
0 274.089453
I would like to get a single-column DataFrame that looks like this:
400.31865662
401.18514808
404.84015554
405.14682194
405.67735105
273.90969447
274.0894528
Answered by Dani Mesejo
You could flatten the NumPy array:
import numpy as np
import pandas as pd
data = [[400.31865662],
        [401.18514808],
        [404.84015554],
        [405.14682194],
        [405.67735105],
        [273.90969447],
        [274.0894528]]
arr = np.array(data)
df = pd.DataFrame(data=arr.flatten())
print(df)
Output:
0
0 400.318657
1 401.185148
2 404.840156
3 405.146822
4 405.677351
5 273.909694
6 274.089453
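As a minimal sketch of the flattening step above: `flatten()` turns a column vector of shape `(n, 1)` into a 1-D array of shape `(n,)`, and pandas builds a single column from it. The values are shortened from the question, and the column name `value` is a hypothetical label added for illustration; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Column vector of shape (3, 1), like the array in the question (shortened).
arr = np.array([[400.31865662], [401.18514808], [404.84015554]])

flat = arr.flatten()                 # 1-D copy with shape (3,)
df = pd.DataFrame({"value": flat})   # "value" is an illustrative column name

print(arr.shape, flat.shape)
print(df)
```

`arr.ravel()` or `arr.reshape(-1)` should give the same 1-D result; `flatten()` differs mainly in that it always returns a copy.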
Answered by Yannick
I just figured out my mistake: data was a list of arrays:
[array([400.0290173]), array([400.02253235]), array([404.00252113]), array([403.99466754]), array([403.98681395]), array([271.97896036]), array([271.97110677])]
So I used np.vstack(data) to concatenate it:
conc = np.vstack(data)
[[400.0290173 ]
[400.02253235]
[404.00252113]
[403.99466754]
[403.98681395]
[271.97896036]
[271.97110677]]
Then I converted the concatenated array into a pandas DataFrame:
newdf = pd.DataFrame(conc)
0
0 400.029017
1 400.022532
2 404.002521
3 403.994668
4 403.986814
5 271.978960
6 271.971107
Et voilà!
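The two steps in this answer can be put together as a small runnable sketch. It assumes `data` is a list of one-element arrays as described above, uses shortened values, and keeps the names `conc` and `newdf` from the answer.

```python
import numpy as np
import pandas as pd

# A list of one-element arrays, as described in the answer (shortened).
data = [np.array([400.0290173]),
        np.array([400.02253235]),
        np.array([404.00252113])]

conc = np.vstack(data)       # stacks the arrays row-wise into shape (3, 1)
newdf = pd.DataFrame(conc)   # one column labelled 0, rows indexed 0..2

print(conc.shape)
print(newdf)
```

`np.concatenate(data)` would instead give a 1-D array of shape `(3,)`, which pandas also turns into a single column.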
Answered by akshayk07
There is another way that the other answers don't mention. If you have a NumPy array that is essentially a row vector (or column vector), i.e. a 1-D array with shape (n,), you could do the following:
import numpy as np
import pandas as pd
# sample array
x = np.zeros(20)
# empty dataframe
df = pd.DataFrame()
# add the array to df as a column
df['column_name'] = x
This way you can add multiple arrays as separate columns, provided they all have the same length.
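A short sketch of adding several arrays as columns, under the assumption that all arrays are 1-D and equally long. The column names `zeros` and `counts` are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

# Two 1-D arrays of equal length; the names are illustrative only.
x = np.zeros(4)
y = np.arange(4, dtype=float)

df = pd.DataFrame()
df["zeros"] = x     # first column
df["counts"] = y    # second column, matched to the first by position

# The same table built in a single call from a dict of arrays.
df_alt = pd.DataFrame({"zeros": x, "counts": y})

print(df)
```

Building the frame from a dict in one call is often more convenient when all the arrays are available up front; column-by-column assignment fits better when they arrive one at a time.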
Answered by Nicolas Gervais
Since I assume many visitors to this post aren't here for the OP's specific and non-reproducible issue, here's a general answer:
df = pd.DataFrame(array)
Here's an example. The strength of pandas is that it is easy on the eye (like Excel), so it's important to use column names.
import numpy as np
import pandas as pd
array = np.random.rand(5, 5)
array([[0.723, 0.177, 0.659, 0.573, 0.476],
[0.77 , 0.311, 0.533, 0.415, 0.552],
[0.349, 0.768, 0.859, 0.273, 0.425],
[0.367, 0.601, 0.875, 0.109, 0.398],
[0.452, 0.836, 0.31 , 0.727, 0.303]])
columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]
Here's where the magic happens:
df = pd.DataFrame(array, columns=columns, index=index)
col_0 col_1 col_2 col_3 col_4
index_0 0.722791 0.177427 0.659204 0.572826 0.476485
index_1 0.770118 0.311444 0.532899 0.415371 0.551828
index_2 0.348923 0.768362 0.858841 0.273221 0.424684
index_3 0.366940 0.600784 0.875214 0.108818 0.397671
index_4 0.451682 0.836315 0.310480 0.727409 0.302597
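The example above can be condensed into one self-contained script. This is only a sketch: it swaps in a seeded generator (`np.random.default_rng(0)`, with an arbitrary seed) in place of `np.random.rand` so repeated runs give the same values, and it uses a smaller 3x2 array. It also shows one benefit of labels, selecting a cell by row and column name.

```python
import numpy as np
import pandas as pd

# Seeded generator so the values are repeatable; the seed is arbitrary.
rng = np.random.default_rng(0)
array = rng.random((3, 2))

# Derive the labels from the array shape so they always match.
columns = [f"col_{num}" for num in range(array.shape[1])]
index = [f"index_{num}" for num in range(array.shape[0])]

df = pd.DataFrame(array, columns=columns, index=index)

# Labels make selection readable: one cell by row and column name.
cell = df.loc["index_1", "col_0"]

print(df)
print(cell == array[1, 0])
```

Deriving `columns` and `index` from `array.shape` avoids the length-mismatch error pandas raises when the number of labels differs from the array dimensions.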

