Put a 2d Array into a Pandas Series

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the CC BY-SA license as well, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38840319/

Time: 2020-09-14 01:46:06  Source: igfitidea

python, pandas, numpy

Asked by zemekeneng

I have a 2D Numpy array that I would like to put in a pandas Series (not a DataFrame):

>>> import pandas as pd
>>> import numpy as np
>>> a = np.zeros((5, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

But this throws an error:

>>> s = pd.Series(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 227, in __init__
    raise_cast_failure=True)
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 2920, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional
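
The error comes from the fact that a Series is one-dimensional, while a has a.ndim == 2. A minimal sketch that reproduces the check, assuming a reasonably recent pandas: the 2016 traceback above shows a bare Exception, while newer releases raise a ValueError with a similar message, so the sketch catches Exception to cover both.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
print(a.ndim, a.shape)  # 2 (5, 2)

# pd.Series only accepts 1-D data, so a 2-D array is rejected.
# Catch Exception: old pandas raised a bare Exception, newer versions raise ValueError.
try:
    pd.Series(a)
except Exception as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```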

It is possible with a hack:

>>> s = pd.Series(map(lambda x:[x], a)).apply(lambda x:x[0])
>>> s
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
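
The hack works because wrapping each row in a one-element list hides the second dimension from pandas, and .apply(lambda x: x[0]) then unwraps every cell back to its row. What it ends up building is a 1-D object-dtype container whose cells are the row arrays. A minimal sketch of that structure built explicitly, assuming only NumPy and pandas; the helper rows_to_object_array is hypothetical and exists only for this illustration. The cells are filled one at a time so NumPy stores each row as a single object instead of trying to broadcast the rows into a second dimension.

```python
import numpy as np
import pandas as pd


def rows_to_object_array(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: 1-D object array whose i-th cell is arr[i]."""
    out = np.empty(len(arr), dtype=object)
    for i, row in enumerate(arr):
        out[i] = row  # scalar assignment stores the row array itself
    return out


a = np.zeros((5, 2))
s = pd.Series(rows_to_object_array(a))

print(s.dtype)          # object
print(s.iloc[0].shape)  # (2,)
```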

Is there a better way?

Accepted answer by bpachev

Well, you can use the numpy.ndarray.tolist function, like so:

>>> a = np.zeros((5,2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> a.tolist()
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
>>> pd.Series(a.tolist())
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
dtype: object
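
To see what this approach produces, a short sketch (assuming only NumPy and pandas) that inspects the cells and rebuilds the original array. With a.tolist(), every cell is an independent Python list of floats, and a list of equal-length lists converts straight back to a 2-D array.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
s = pd.Series(a.tolist())

# One cell per row; each cell holds a plain Python list of floats
print(s.dtype)          # object
print(type(s.iloc[0]))  # <class 'list'>

# Going back: np.array on a list of equal-length lists rebuilds the 2-D array
back = np.array(s.tolist())
print(back.shape)       # (5, 2)
```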

EDIT:

A faster way to accomplish a similar result is to simply do pd.Series(list(a)). This will make a Series of numpy arrays instead of Python lists, so it should be faster than a.tolist(), which returns a list of Python lists.
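
The difference between the two variants can be checked directly. A minimal sketch, assuming only NumPy and pandas: list(a) iterates over the first axis, so each cell holds a 1-D row array of shape (2,) rather than a Python list, and np.stack puts the rows back together.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
s = pd.Series(list(a))  # list(a) is a list of the 5 row arrays

print(s.dtype)          # object
print(type(s.iloc[0]))  # <class 'numpy.ndarray'>
print(s.iloc[0].shape)  # (2,)

# Stack the row arrays back into a single 2-D array
back = np.stack(s.to_list())
print(back.shape)       # (5, 2)
```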

Answer by Merlin

pd.Series(list(a))

is consistently slower than

pd.Series(a.tolist())

Tested with sizes from 500,000 to 20,000,000 rows.

a = np.ones((500000,2))

Showing only the 1,000,000-element case (500,000 rows × 2 columns):

%timeit pd.Series(list(a))
1 loop, best of 3: 301 ms per loop

%timeit pd.Series(a.tolist())
1 loop, best of 3: 261 ms per loop
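
The %timeit lines above are IPython magics. A plain-Python sketch of the same comparison using the standard timeit module, for running outside IPython. The function name compare and its parameters are hypothetical, and the absolute numbers (and possibly the ranking) depend on the machine and on the NumPy and pandas versions, so no expected output is given.

```python
import timeit

import numpy as np
import pandas as pd


def compare(rows: int = 500_000, repeat: int = 3) -> dict:
    """Hypothetical helper: best-of-`repeat` wall time in seconds per construction."""
    a = np.ones((rows, 2))
    candidates = {
        "pd.Series(list(a))": lambda: pd.Series(list(a)),
        "pd.Series(a.tolist())": lambda: pd.Series(a.tolist()),
    }
    # number=1: each measurement builds the Series once; keep the fastest run
    return {
        label: min(timeit.repeat(fn, number=1, repeat=repeat))
        for label, fn in candidates.items()
    }


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label:<24} {seconds * 1000:.0f} ms")
```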