Warning: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/48823400/

Time: 2020-09-14 05:11:49  Source: igfitidea

Pandas series to 2d array

python pandas

Asked by crayxt

So, I used the answer from "Put a 2d Array into a Pandas Series" to put a 2D numpy array into a pandas Series. In short, it is:

import numpy as np
import pandas as pd
a = np.zeros((5,2))
s = pd.Series(list(a))

Now, what is the cheapest way to convert that pandas Series back to a 2D array? If I try s.values, I get an array of arrays with object dtype.

So far I tried np.vstack(s.values), but it copies the data, of course.
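
The behaviour described in the question can be checked directly. This is a minimal sketch, assuming NumPy and pandas are installed; the variable name stacked is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
s = pd.Series(list(a))

# s.values is a 1-D array whose elements are themselves arrays
print(s.values.dtype, s.values.shape)

# np.vstack builds a new 2-D float array from those rows
stacked = np.vstack(s.values)
print(stacked.dtype, stacked.shape)

# False here means the result does not share a buffer with the original array
print(np.shares_memory(stacked, a))
```

np.shares_memory is used only to confirm that the stacked result is a separate copy rather than a view of a.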

Answered by jezrael

I believe you need:

a = np.array(s.values.tolist())
print(a)
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
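
As a quick sanity check of the conversion, the round trip can be verified on non-zero data. This is only a sketch under the assumption that every element of the Series is an array of the same length; the name restored is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Non-zero values make it easier to see that nothing is reordered
a = np.arange(10, dtype=float).reshape(5, 2)
s = pd.Series(list(a))

# Convert the Series of row arrays back to a single 2-D array
restored = np.array(s.tolist())

print(restored.shape, restored.dtype)
print(np.array_equal(restored, a))
```

If the rows had different lengths, NumPy could not build a regular 2-D array from them, so this approach relies on the rows being uniform.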


a = np.zeros((50000,2))
s = pd.Series(list(a))

In [131]: %timeit (np.vstack(s.values))
10 loops, best of 3: 107 ms per loop

In [132]: %timeit (np.array(s.values.tolist()))
10 loops, best of 3: 19.7 ms per loop

In [133]: %timeit (np.array(s.tolist()))
100 loops, best of 3: 19.6 ms per loop

But if the array is transposed (2 rows of 50000 values each), the difference is small (though the timings may be affected by caching):

a = np.zeros((2,50000))
s = pd.Series(list(a))
#print (s)

In [159]: %timeit (np.vstack(s.values))
The slowest run took 23.31 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 55.7 μs per loop

In [160]: %timeit (np.array(s.values.tolist()))
The slowest run took 7.20 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 49.8 μs per loop

In [161]: %timeit (np.array(s.tolist()))
The slowest run took 7.31 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 62.6 μs per loop
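
The %timeit lines above only work inside IPython. A rough equivalent with the standard timeit module might look like the sketch below; the helper time_conversions and its number parameter are hypothetical names introduced for illustration, and the measured times will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd


def time_conversions(shape, number=5):
    """Return average seconds per call for each conversion (hypothetical helper)."""
    s = pd.Series(list(np.zeros(shape)))
    candidates = {
        "np.vstack(s.values)": lambda: np.vstack(s.values),
        "np.array(s.values.tolist())": lambda: np.array(s.values.tolist()),
        "np.array(s.tolist())": lambda: np.array(s.tolist()),
    }
    # timeit.timeit returns total seconds for `number` calls
    return {
        label: timeit.timeit(func, number=number) / number
        for label, func in candidates.items()
    }


# Same two shapes as in the answer: many short rows, then two long rows
for shape in [(50000, 2), (2, 50000)]:
    print(shape)
    for label, seconds in time_conversions(shape).items():
        print(f"  {label}: {seconds * 1e3:.3f} ms per call")
```

Averaging over a handful of calls is cruder than %timeit's best-of-N reporting, so the output should be read as a relative comparison only.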