Python: How do I convert a pandas Series or Index to a NumPy array?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/17241004/

Date: 2020-08-19 00:50:17  Source: igfitidea

How do I convert a pandas Series or Index to a NumPy array?

python pandas

Asked by ericmjl

Do you know how to get the index or column of a DataFrame as a NumPy array or python list?

Accepted answer by Andy Hayden

To get a NumPy array, you should use the values attribute:

In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c']); df
   A  B
a  1  4
b  2  5
c  3  6

In [2]: df.index.values
Out[2]: array(['a', 'b', 'c'], dtype=object)

This accesses the data as it is already stored, so no conversion is needed.
Note: This attribute is also available on many other pandas objects.

In [3]: df['A'].values
Out[3]: array([1, 2, 3])


To get the index as a list, call tolist:

In [4]: df.index.tolist()
Out[4]: ['a', 'b', 'c']

The same applies to columns.

Answer by bdiamante

You can use df.index to access the index object and then get the values in a list using df.index.tolist(). Similarly, you can use df['col'].tolist() for a Series.
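
As a minimal sketch of the two calls above (the DataFrame contents and the column name 'col' are made-up illustration data, not from the original answer), tolist() gives a plain Python list for both the index and a column:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"col": [1.5, 2.5]}, index=["a", "b"])

index_list = df.index.tolist()   # list of index labels
col_list = df["col"].tolist()    # list of Python floats, not NumPy scalars

print(index_list)
print(col_list)
```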

Answer by yemu

Since pandas v0.13 you can also use get_values (note that it was deprecated in pandas 0.25 and removed in 1.0):

df.index.get_values()

Answer by gg349

If you are dealing with a multi-index DataFrame, you may be interested in extracting only the values of one named level of the multi-index. You can do this with:

df.index.get_level_values('name_sub_index')

and of course name_sub_index must be an element of the FrozenList df.index.names.
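
A small sketch of this, assuming a hypothetical two-level index whose level names "outer" and "inner" are invented for the example. get_level_values returns an Index, so a further to_numpy() call (pandas >= 0.24) is needed if an actual array is wanted:

```python
import pandas as pd

# Hypothetical MultiIndex; the level names are assumptions for illustration.
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["outer", "inner"]
)
df = pd.DataFrame({"A": [10, 20, 30]}, index=index)

level_names = list(df.index.names)                   # valid arguments for get_level_values
inner_values = df.index.get_level_values("inner")    # still a pandas Index
inner_array = inner_values.to_numpy()                # NumPy array of that level

print(level_names)
print(inner_array)
```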

Answer by Sarvagya Gupta

I converted the pandas DataFrame column to a list and then used the basic list.index(). Something like this:

dd = list(zone[0]) #Where zone[0] is some specific column of the table
idx = dd.index(filename[i])

You now have your index value as idx.
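
A runnable version of that fragment might look as follows. The contents of zone and filename are not given in the answer, so the values here are assumptions. Note that list.index() returns a position, which coincides with the index label only when the DataFrame uses the default integer index:

```python
import pandas as pd

# Hypothetical stand-ins for the answer's `zone` table and `filename` list.
zone = pd.DataFrame({0: ["a.txt", "b.txt", "c.txt"]})
filename = ["c.txt", "a.txt"]

dd = list(zone[0])            # column 0 as a plain Python list
idx = dd.index(filename[0])   # position of the first match in the list

print(idx)
```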

Answer by cs95

pandas >= 0.24

Deprecate your usage of .values in favour of these methods!

From v0.24.0 onwards, we have two brand-new, preferred methods for obtaining NumPy arrays from Index, Series, and DataFrame objects: to_numpy() and .array. Regarding usage, the docs mention:

We haven't removed or deprecated Series.values or DataFrame.values, but we highly recommend using .array or .to_numpy() instead.

See this section of the v0.24.0 release notes for more information.



to_numpy() Method

df.index.to_numpy()
# array(['a', 'b'], dtype=object)

df['A'].to_numpy()
#  array([1, 4])

By default, a view is returned. Any modifications made will affect the original.

v = df.index.to_numpy()
v[0] = -1

df
    A  B
-1  1  2
b   4  5

If you need a copy instead, use to_numpy(copy=True):

v = df.index.to_numpy(copy=True)
v[-1] = -123

df
   A  B
a  1  2
b  4  5
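
One way to check the view-versus-copy behaviour without mutating anything is np.shares_memory. This is only a sketch for a NumPy-backed integer Series (made-up data); other dtypes or extension arrays may need a conversion and therefore a copy:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4], name="A")

view = s.to_numpy()            # default: no copy is forced
copy = s.to_numpy(copy=True)   # always a fresh array

shares_with_view = np.shares_memory(view, s.to_numpy())
shares_with_copy = np.shares_memory(copy, s.to_numpy())

print(shares_with_view, shares_with_copy)
```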

Note that this function also works for DataFrames (while .array does not).
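
For instance, calling to_numpy() on a whole DataFrame yields a 2-D array with one row per DataFrame row. The sketch below reuses the same small two-by-two frame as this answer's setup:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5]], columns=["A", "B"], index=["a", "b"])

arr = df.to_numpy()   # 2-D NumPy array; index and column labels are dropped

print(arr.shape)
print(arr)
```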



array Attribute
This attribute returns an ExtensionArray object that backs the Index/Series.

pd.__version__
# '0.24.0rc1'

# Setup.
df = pd.DataFrame([[1, 2], [4, 5]], columns=['A', 'B'], index=['a', 'b'])
df

   A  B
a  1  2
b  4  5

df.index.array    
# <PandasArray>
# ['a', 'b']
# Length: 2, dtype: object

df['A'].array
# <PandasArray>
# [1, 4]
# Length: 2, dtype: int64

From here, it is possible to get a list using list:

list(df.index.array)
# ['a', 'b']

list(df['A'].array)
# [1, 4]

or, just directly call .tolist():

df.index.tolist()
# ['a', 'b']

df['A'].tolist()
# [1, 4]

Regarding what is returned, the docs mention:

For Series and Indexes backed by normal NumPy arrays, Series.array will return a new arrays.PandasArray, which is a thin (no-copy) wrapper around a numpy.ndarray. arrays.PandasArray isn't especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library.

So, to summarise, .array will return either

  1. The existing ExtensionArray backing the Index/Series, or
  2. If there is a NumPy array backing the series, a new ExtensionArray object created as a thin wrapper over the underlying array.
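
The two cases can be seen side by side in a short sketch. The Series below are made-up examples, and the class name printed for the NumPy-backed case depends on the pandas version, so no output is asserted for it here:

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray

plain = pd.Series([1, 4])                                # backed by a NumPy array
cat = pd.Series(["x", "y", "x"], dtype="category")       # backed by a Categorical

plain_arr = plain.array   # thin wrapper created around the NumPy data
cat_arr = cat.array       # the existing extension array itself

print(type(plain_arr).__name__)
print(type(cat_arr).__name__)
```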


Rationale for adding TWO new methods
These functions were added as a result of discussions under two GitHub issues, GH19954 and GH23623.

Specifically, the docs mention the rationale:

[...] with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time. [...]
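
To make the PeriodIndex case concrete, here is a small sketch with a made-up monthly range. It only illustrates the difference in what the two accessors return: an object-dtype ndarray of Period objects versus the backing extension array.

```python
import pandas as pd

# Hypothetical three-month PeriodIndex, for illustration only.
idx = pd.period_range("2020-01", periods=3, freq="M")

values = idx.values        # ndarray of Period objects (object dtype)
period_array = idx.array   # the PeriodArray that actually backs the index

print(values.dtype)
print(type(period_array).__name__)
```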

These two functions aim to improve the consistency of the API, which is a major step in the right direction.

Lastly, .values will not be deprecated in the current version, but I expect this may happen at some point in the future, so I would urge users to migrate towards the newer API as soon as they can.

Answer by Kumar Shubham

Below is a simple way to convert a DataFrame column into a NumPy array.

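import numpy as np
import pandas as pd

df = pd.DataFrame(somedict)  # somedict must contain a 'label' key
ytrain = df['label']         # ytrain is already the 'label' Series
ytrain_numpy = np.array([x for x in ytrain])  # iterate the Series itself, not ytrain['label']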

ytrain_numpy is a numpy array.

I tried to_numpy(), but it gave me the error below while doing Binary Relevance classification using Linear SVC: TypeError: no supported conversion for types: (dtype('O'),). to_numpy() was converting the DataFrame into a NumPy array, but the inner elements' data type was list, which caused the above error.
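
The difference can be reproduced with a tiny made-up column of equal-length lists (the data below is an assumption, not the answer's actual dataset). to_numpy() keeps the lists as objects, while building the array from the elements gives a regular 2-D numeric array:

```python
import numpy as np
import pandas as pd

# Hypothetical multi-label column where every cell holds a list.
df = pd.DataFrame({"label": [[1, 0], [0, 1], [1, 1]]})

as_object = df["label"].to_numpy()           # 1-D object array whose items are lists
as_2d = np.array([x for x in df["label"]])   # 2-D integer array built from the lists

print(as_object.dtype, as_object.shape)
print(as_2d.dtype.kind, as_2d.shape)
```

This only works as shown when all the inner lists have the same length.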

Answer by Jon R

A more recent way to do this is to use the .to_numpy() function.

If I have a dataframe with a column 'price', I can convert it as follows:

priceArray = df['price'].to_numpy()

You can also pass the data type, such as float or object, as the dtype argument of the function.
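
A short sketch of the dtype argument, using a made-up integer 'price' column:

```python
import pandas as pd

# Hypothetical price data, for illustration only.
df = pd.DataFrame({"price": [1, 2, 3]})

price_float = df["price"].to_numpy(dtype=float)     # converted to float64
price_object = df["price"].to_numpy(dtype=object)   # generic Python objects

print(price_float.dtype)
print(price_object.dtype)
```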
