How to convert a NumPy array to a pandas DataFrame in Python
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Original URL: http://stackoverflow.com/questions/53816008/
Asked by Yannick
I have a NumPy array that looks like this:
[400.31865662]
[401.18514808]
[404.84015554]
[405.14682194]
[405.67735105]
[273.90969447]
[274.0894528]
When I try to convert it to a pandas DataFrame with the following code:
y = pd.DataFrame(data)
print(y)
I get the following output when printing it. Why do I get all those zeros?
0
0 400.318657
0
0 401.185148
0
0 404.840156
0
0 405.146822
0
0 405.677351
0
0 273.909694
0
0 274.089453
I would like to get a single-column DataFrame that looks like this:
400.31865662
401.18514808
404.84015554
405.14682194
405.67735105
273.90969447
274.0894528
Answered by Dani Mesejo
You could flatten the NumPy array:
import numpy as np
import pandas as pd
data = [[400.31865662],
[401.18514808],
[404.84015554],
[405.14682194],
[405.67735105],
[273.90969447],
[274.0894528]]
arr = np.array(data)
df = pd.DataFrame(data=arr.flatten())
print(df)
Output
0
0 400.318657
1 401.185148
2 404.840156
3 405.146822
4 405.677351
5 273.909694
6 274.089453
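As a variation on the answer above, here is a minimal sketch that flattens with ravel() instead of flatten() and gives the column a name. The column name "value" is an assumption made for this example and does not come from the original question.

```python
import numpy as np
import pandas as pd

# Same values as above, stored as a 2-D array of shape (7, 1)
arr = np.array([[400.31865662], [401.18514808], [404.84015554],
                [405.14682194], [405.67735105], [273.90969447],
                [274.0894528]])

# ravel() returns a 1-D view when it can; flatten() always makes a copy
flat = arr.ravel()

# "value" is a hypothetical column name chosen for this sketch
df = pd.DataFrame({"value": flat})
print(df)
```

arr.reshape(-1) would give the same 1-D result; which one to use is mostly a matter of taste.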
Answered by Yannick
I just figured out my mistake: data was a list of arrays:
[array([400.0290173]), array([400.02253235]), array([404.00252113]), array([403.99466754]), array([403.98681395]), array([271.97896036]), array([271.97110677])]
So I used np.vstack(data) to concatenate it:
conc = np.vstack(data)
[[400.0290173 ]
[400.02253235]
[404.00252113]
[403.99466754]
[403.98681395]
[271.97896036]
[271.97110677]]
Then I converted the concatenated array into a pandas DataFrame:
newdf = pd.DataFrame(conc)
0
0 400.029017
1 400.022532
2 404.002521
3 403.994668
4 403.986814
5 271.978960
6 271.971107
Et voilà!
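The steps in this answer can be put together as one runnable sketch. It assumes data is a list of one-element arrays, as shown above. The np.concatenate line is an addition for comparison and was not part of the original answer.

```python
import numpy as np
import pandas as pd

# A list of one-element arrays, as in the answer above
data = [np.array([400.0290173]), np.array([400.02253235]),
        np.array([404.00252113]), np.array([403.99466754]),
        np.array([403.98681395]), np.array([271.97896036]),
        np.array([271.97110677])]

# vstack stacks the arrays as rows, giving a 2-D array of shape (7, 1)
conc = np.vstack(data)

# For comparison (not in the original answer): concatenate joins them
# end to end, giving a 1-D array of shape (7,)
flat = np.concatenate(data)

newdf = pd.DataFrame(conc)
print(conc.shape, flat.shape, newdf.shape)
```

Either array produces a single-column DataFrame when passed to pd.DataFrame.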
Answered by akshayk07
There is another way, which isn't mentioned in the other answers. If you have a NumPy array that is essentially a row vector (or column vector), i.e. with a shape like (n,), then you could do the following:
import numpy as np
import pandas as pd
# sample array of shape (20,)
x = np.zeros(20)
# empty dataframe
df = pd.DataFrame()
# add the array to df as a column
df['column_name'] = x
This way you can add multiple arrays as separate columns.
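If all the arrays are available up front, passing a dict to pd.DataFrame builds the same kind of frame in one step. This is a minimal sketch; the second array and both column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two 1-D arrays of the same length; y and the column names below are
# hypothetical and only serve this example
x = np.zeros(20)
y = np.arange(20)

# Each dict key becomes a column name, each array becomes a column
df = pd.DataFrame({"zeros": x, "counts": y})
print(df.head())
```

The arrays need to have the same length, otherwise pandas raises a ValueError.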
Answered by Nicolas Gervais
Since I assume many visitors to this post aren't here for OP's specific and non-reproducible issue, here's a general answer:
df = pd.DataFrame(array)
Here's an example. One of the strengths of pandas is that its output is easy on the eye (like Excel), so it's important to use column names.
import numpy as np
import pandas as pd
array = np.random.rand(5, 5)
array([[0.723, 0.177, 0.659, 0.573, 0.476],
[0.77 , 0.311, 0.533, 0.415, 0.552],
[0.349, 0.768, 0.859, 0.273, 0.425],
[0.367, 0.601, 0.875, 0.109, 0.398],
[0.452, 0.836, 0.31 , 0.727, 0.303]])
columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]
Here's where the magic happens:
df = pd.DataFrame(array, columns=columns, index=index)
col_0 col_1 col_2 col_3 col_4
index_0 0.722791 0.177427 0.659204 0.572826 0.476485
index_1 0.770118 0.311444 0.532899 0.415371 0.551828
index_2 0.348923 0.768362 0.858841 0.273221 0.424684
index_3 0.366940 0.600784 0.875214 0.108818 0.397671
index_4 0.451682 0.836315 0.310480 0.727409 0.302597
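To show what the labels are useful for, here is a short sketch that rebuilds the same kind of DataFrame, looks up one cell by label, and converts the frame back to NumPy. The seeded generator is an assumption added so that reruns give the same numbers; the values will differ from the output printed above.

```python
import numpy as np
import pandas as pd

# Seeded generator (an addition for reproducibility, not in the answer)
rng = np.random.default_rng(0)
array = rng.random((5, 5))

columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]
df = pd.DataFrame(array, columns=columns, index=index)

# Label-based lookup: the cell at row 'index_2', column 'col_3'
value = df.loc['index_2', 'col_3']

# Convert back to a plain NumPy array; the labels are dropped
back = df.to_numpy()
print(value, back.shape)
```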