Original source: http://stackoverflow.com/questions/38840319/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Put a 2d Array into a Pandas Series
Asked by zemekeneng
I have a 2D Numpy array that I would like to put in a pandas Series (not a DataFrame):
>>> import pandas as pd
>>> import numpy as np
>>> a = np.zeros((5, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
But this throws an error:
>>> s = pd.Series(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 227, in __init__
    raise_cast_failure=True)
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 2920, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional
It is possible with a hack:
>>> s = pd.Series(map(lambda x:[x], a)).apply(lambda x:x[0])
>>> s
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
Is there a better way?
Accepted answer by bpachev
Well, you can use the numpy.ndarray.tolist function, like so:
>>> a = np.zeros((5,2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> a.tolist()
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
>>> pd.Series(a.tolist())
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
dtype: object
EDIT:
A faster way to accomplish a similar result is to simply do pd.Series(list(a)). This makes a Series of numpy arrays instead of Python lists, so it should be faster than a.tolist(), which returns a list of Python lists.
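To make the difference between the two approaches concrete, here is a minimal sketch that builds both Series and inspects what each element actually is. The variable names s_lists and s_arrays are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))

# a.tolist() converts every row to a plain Python list of floats.
s_lists = pd.Series(a.tolist())

# list(a) keeps every row as a 1-D NumPy array.
s_arrays = pd.Series(list(a))

print(type(s_lists.iloc[0]))    # element type for the tolist() variant
print(type(s_arrays.iloc[0]))   # element type for the list(a) variant
print(s_lists.dtype, s_arrays.dtype)
```

Both Series have dtype object and one entry per row of a; they differ only in the type of each entry, which matters if later code expects list methods or NumPy operations on the elements.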
Answer by Merlin
pd.Series(list(a)) is consistently slower than pd.Series(a.tolist()).
Tested on arrays ranging from 20,000,000 down to 500,000 rows:
a = np.ones((500000,2))
showing only 1,000,000 rows:
%timeit pd.Series(list(a))
1 loop, best of 3: 301 ms per loop
%timeit pd.Series(a.tolist())
1 loop, best of 3: 261 ms per loop
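Since %timeit only works inside IPython, the comparison can also be reproduced in a plain script. The following is a rough sketch using the standard timeit module; the helper name time_conversions and the row counts are assumptions chosen for illustration, and the results will likely vary with the NumPy and pandas versions and the machine.

```python
import timeit

import numpy as np
import pandas as pd


def time_conversions(n_rows, repeat=3):
    """Return best-of-`repeat` timings in seconds for both conversions."""
    a = np.ones((n_rows, 2))
    # Series of 1-D NumPy arrays, one per row.
    t_list = min(timeit.repeat(lambda: pd.Series(list(a)),
                               number=1, repeat=repeat))
    # Series of plain Python lists, one per row.
    t_tolist = min(timeit.repeat(lambda: pd.Series(a.tolist()),
                                 number=1, repeat=repeat))
    return t_list, t_tolist


for n in (10_000, 100_000):
    t_list, t_tolist = time_conversions(n)
    print(f"{n:>8} rows: list(a) {t_list * 1e3:.1f} ms, "
          f"a.tolist() {t_tolist * 1e3:.1f} ms")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports above.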