Putting a 2D array into a Pandas Series

Question

Asked by zemekeneng

I have a 2D NumPy array that I would like to put in a pandas Series (not a DataFrame):

>>> import pandas as pd
>>> import numpy as np
>>> a = np.zeros((5, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

But this throws an error:

>>> s = pd.Series(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 227, in __init__
    raise_cast_failure=True)
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 2920, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional

It is possible with a hack:

>>> s = pd.Series(map(lambda x:[x], a)).apply(lambda x:x[0])
>>> s
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]

Is there a better way?

Answer 1

Accepted answer by bpachev

Well, you can use the numpy.ndarray.tolist function, like so:

>>> a = np.zeros((5,2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> a.tolist()
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
>>> pd.Series(a.tolist())
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
dtype: object

EDIT:

A faster way to accomplish a similar result is to simply do pd.Series(list(a)). This makes a Series of NumPy arrays instead of Python lists, so it should be faster than a.tolist, which returns a list of Python lists.
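To make the difference between the two forms concrete, here is a minimal sketch comparing what each one stores in the Series. The variable names (s_lists, s_arrays, back_from_lists, back_from_arrays) are only illustrative and not part of the original answer, and the round trip back to a 2D array at the end is an extra step that assumes every row has the same length.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))

# tolist() converts every row into a plain Python list of floats
s_lists = pd.Series(a.tolist())

# list(a) keeps every row as a 1-D NumPy array
s_arrays = pd.Series(list(a))

print(type(s_lists.iloc[0]))
print(type(s_arrays.iloc[0]))

# Both Series hold generic Python objects, one per original row
print(s_lists.dtype, s_arrays.dtype)

# Rebuilding the 2D array (works here because all rows have equal length)
back_from_lists = np.array(s_lists.tolist())
back_from_arrays = np.stack(s_arrays.tolist())
print(back_from_lists.shape, back_from_arrays.shape)
```

Which form is preferable probably depends on what you do with the elements afterwards: array elements keep NumPy operations available per row, while list elements are plain Python objects.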

Answer 2

Answer by Merlin

pd.Series(list(a))

is consistently slower than

pd.Series(a.tolist())

Tested with 500,000 to 20,000,000 rows:

a = np.ones((500000,2))

Showing only the timings for 1,000,000 rows:

%timeit pd.Series(list(a))
1 loop, best of 3: 301 ms per loop

%timeit pd.Series(a.tolist())
1 loop, best of 3: 261 ms per loop
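The %timeit lines above need IPython. As a rough sketch, the same comparison can be run as a plain script with the standard-library timeit module. The helper best_of is a hypothetical name introduced here, not something from the original answer, and the numbers will vary with the machine and with the NumPy and pandas versions, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

a = np.ones((500000, 2))


def best_of(func, repeat=3):
    """Return the fastest of `repeat` single runs of func, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=1))


# Series whose elements are NumPy row arrays
t_list = best_of(lambda: pd.Series(list(a)))

# Series whose elements are Python lists
t_tolist = best_of(lambda: pd.Series(a.tolist()))

print("pd.Series(list(a)):    %.1f ms" % (t_list * 1000))
print("pd.Series(a.tolist()): %.1f ms" % (t_tolist * 1000))
```

Changing the shape passed to np.ones lets you repeat the measurement at other sizes.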
