Pandas: create a dataframe from 2D numpy arrays preserving their sequential order
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original address and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/41873198/
Asked by FaCoffee
Say that you have 3 numpy arrays, lat, lon, and val:
import numpy as np

lat = np.array([[10, 20, 30],
                [20, 11, 33],
                [21, 20, 10]])
lon = np.array([[100, 102, 103],
                [105, 101, 102],
                [100, 102, 103]])
val = np.array([[17, 2, 11],
                [86, 84, 1],
                [9, 5, 10]])
And say that you want to create a pandas dataframe where df.columns = ['lat', 'lon', 'val'], but since each value in lat is associated with both a lon and a val quantity, you want them to appear in the same row.
Also, you want the row-wise order of each column to follow the positions in each array, so as to obtain the following dataframe:
   lat  lon  val
0   10  100   17
1   20  102    2
2   30  103   11
3   20  105   86
.. ...  ...  ...
So basically the first row in the dataframe stores the "first" quantities of each array, and so forth. How can I do this?
I couldn't find a pythonic way of doing this, so any help will be much appreciated.
Answered by jezrael
I think the simplest approach is to flatten the arrays with ravel:
import pandas as pd

# The question names the longitude array lon, so lon is used here as well
df = pd.DataFrame({'lat': lat.ravel(), 'lon': lon.ravel(), 'val': val.ravel()})
print(df)
   lat  lon  val
0   10  100   17
1   20  102    2
2   30  103   11
3   20  105   86
4   11  101   84
5   33  102    1
6   21  100    9
7   20  102    5
8   10  103   10
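The reason ravel gives the requested row sequence is that it flattens in row-major (C) order by default. Here is a minimal sketch of that, using the question's lat, lon, and val arrays; the order="F" call and the index k are only there for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

lat = np.array([[10, 20, 30], [20, 11, 33], [21, 20, 10]])
lon = np.array([[100, 102, 103], [105, 101, 102], [100, 102, 103]])
val = np.array([[17, 2, 11], [86, 84, 1], [9, 5, 10]])

# Default order="C": walk each row left to right, then move to the next row.
row_major = lat.ravel()
# order="F": walk down each column instead, which changes the sequence.
col_major = lat.ravel(order="F")

print(row_major.tolist())  # [10, 20, 30, 20, 11, 33, 21, 20, 10]
print(col_major.tolist())  # [10, 20, 21, 20, 11, 20, 30, 33, 10]

df = pd.DataFrame({"lat": lat.ravel(), "lon": lon.ravel(), "val": val.ravel()})

# Row k of df corresponds to position (k // ncols, k % ncols) in the 2D arrays.
k = 4  # illustrative index
i, j = divmod(k, lat.shape[1])
print(df.iloc[k].tolist())  # [11, 101, 84], i.e. the values at position (1, 1)
```

Since all three arrays are flattened the same way, values that shared a position in the 2D arrays end up in the same row.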
Answered by Divakar
Something like this -
# Assumes numpy (np) and pandas (pd) are imported and lat, lon, val are the question's arrays
# Create stacked array
In [100]: arr = np.column_stack((lat.ravel(), lon.ravel(), val.ravel()))
# Create dataframe from it and assign column names
In [101]: pd.DataFrame(arr, columns=('lat', 'lon', 'val'))
Out[101]:
   lat  lon  val
0   10  100   17
1   20  102    2
2   30  103   11
3   20  105   86
4   11  101   84
5   33  102    1
6   21  100    9
7   20  102    5
8   10  103   10
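One thing to keep in mind with the stacked-array approach: column_stack builds a single 2D array, so every column ends up with the same dtype. This doesn't matter for the all-integer arrays in the question, but it could with mixed types. A small sketch of the difference, with made-up float values for val (an assumption for illustration only):

```python
import numpy as np
import pandas as pd

lat = np.array([[10, 20], [30, 40]])        # integers
lon = np.array([[100, 102], [103, 105]])    # integers
val = np.array([[1.5, 2.5], [3.5, 4.5]])    # floats (hypothetical values)

# column_stack produces one 2D array, so all columns share a single dtype;
# the integer columns are upcast to float here.
stacked = pd.DataFrame(
    np.column_stack((lat.ravel(), lon.ravel(), val.ravel())),
    columns=["lat", "lon", "val"],
)

# A dict of 1D arrays keeps each column's own dtype.
per_column = pd.DataFrame(
    {"lat": lat.ravel(), "lon": lon.ravel(), "val": val.ravel()}
)

print(stacked.dtypes)
print(per_column.dtypes)
```

So if the integer columns need to stay integers, the dict-of-columns form is probably the safer choice.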
Runtime test -
In [103]: lat = np.random.rand(30,30)
In [104]: lon = np.random.rand(30,30)
In [105]: val = np.random.rand(30,30)
In [106]: %timeit pd.DataFrame({'lat': lat.ravel(), 'lon': lon.ravel(), 'val': val.ravel()})
1000 loops, best of 3: 452 μs per loop
In [107]: arr = np.column_stack((lat.ravel(), lon.ravel(), val.ravel()))
In [108]: %timeit np.column_stack((lat.ravel(), lon.ravel(), val.ravel()))
100000 loops, best of 3: 12.4 μs per loop
In [109]: %timeit pd.DataFrame(arr, columns=('lat', 'lon', 'val'))
1000 loops, best of 3: 217 μs per loop
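The %timeit lines above only work inside IPython. To repeat the comparison from a plain script, something along these lines should do, using the standard timeit module. The helper names from_dict and from_stack, the seed, and the repeat count n are my own choices, and the timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
lat, lon, val = (rng.random((30, 30)) for _ in range(3))


def from_dict():
    # Dict-of-columns approach
    return pd.DataFrame({"lat": lat.ravel(), "lon": lon.ravel(), "val": val.ravel()})


def from_stack():
    # Stack first, then wrap the 2D array in a dataframe
    arr = np.column_stack((lat.ravel(), lon.ravel(), val.ravel()))
    return pd.DataFrame(arr, columns=["lat", "lon", "val"])


n = 200  # number of calls per measurement
t_dict = timeit.timeit(from_dict, number=n) / n
t_stack = timeit.timeit(from_stack, number=n) / n
print(f"dict of columns: {t_dict * 1e6:.1f} us per call")
print(f"column_stack:    {t_stack * 1e6:.1f} us per call")
```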
Answered by Divakar
No need to ravel first if the arrays are already 1D. You can just stack and go.
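import numpy as np
import pandas as pd

# Three 1D example arrays
lat, lon, val = np.arange(5), np.arange(5), np.arange(5)
# axis=1 places each array in its own column
arr = np.stack((lat, lon, val), axis=1)
cols = ['lat', 'lon', 'val']
df = pd.DataFrame(arr, columns=cols)
print(df)
   lat  lon  val
0    0    0    0
1    1    1    1
2    2    2    2
3    3    3    3
4    4    4    4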
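The example above uses 1D arrays. With the 2D arrays from the question, stacking gives a 3D array, which the DataFrame constructor won't accept directly. One possible way around that (a sketch, not part of the original answer) is to stack along a new last axis and then reshape to three columns; the name cube is just for illustration.

```python
import numpy as np
import pandas as pd

lat = np.array([[10, 20, 30], [20, 11, 33], [21, 20, 10]])
lon = np.array([[100, 102, 103], [105, 101, 102], [100, 102, 103]])
val = np.array([[17, 2, 11], [86, 84, 1], [9, 5, 10]])

# Stack along a new last axis: shape (3, 3, 3), where cube[i, j] holds
# (lat[i, j], lon[i, j], val[i, j]).
cube = np.stack((lat, lon, val), axis=-1)

# Collapse the two grid axes into one; the default row-major reshape keeps
# the original left-to-right, top-to-bottom sequence.
arr = cube.reshape(-1, 3)

df = pd.DataFrame(arr, columns=["lat", "lon", "val"])
print(df.head(4))
```

This gives the same rows as raveling each array separately, it just does the flattening once, after stacking.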