Python: Convert Select Columns in Pandas Dataframe to Numpy Array

Question

Asked by Adam_G

I would like to convert everything but the first column of a pandas dataframe into a numpy array. For some reason, using the columns= parameter of DataFrame.as_matrix() is not working.

df:

  viz  a1_count  a1_mean     a1_std
0   n         3        2   0.816497
1   n         0      NaN        NaN 
2   n         2       51  50.000000

I tried X = df.as_matrix(columns=[df[1:]]), but this yields an array of all NaNs.

Answer 1

Accepted answer by DSM

The columns parameter accepts a collection of column names. You're passing a list containing a dataframe with two rows:

>>> [df[1:]]
[  viz  a1_count  a1_mean  a1_std
1   n         0      NaN     NaN
2   n         2       51      50]
>>> df.as_matrix(columns=[df[1:]])
array([[ nan,  nan],
       [ nan,  nan],
       [ nan,  nan]])

Instead, pass the column names you want:

>>> df.columns[1:]
Index(['a1_count', 'a1_mean', 'a1_std'], dtype='object')
>>> df.as_matrix(columns=df.columns[1:])
array([[  3.      ,   2.      ,   0.816497],
       [  0.      ,        nan,        nan],
       [  2.      ,  51.      ,  50.      ]])
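
`as_matrix()` was deprecated in pandas 0.23.0 and removed in pandas 1.0, so on a current install the same selection can be sketched with `to_numpy()`. The frame below is rebuilt from the question's data (assuming the missing values are real `np.nan` values), and the name `X` follows the question:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; np.nan marks the missing values.
df = pd.DataFrame({
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2, np.nan, 51],
    "a1_std": [0.816497, np.nan, 50.0],
})

# Select every column except the first by name, then convert.
X = df[df.columns[1:]].to_numpy()

print(X.dtype)   # float64
print(X.shape)   # (3, 3)
```

Selecting the columns first and converting afterwards takes the place of the old `columns=` argument.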

Answer 2

Answered by 176coding

The easy way is the "values" property: df.iloc[:,1:].values

a=df.iloc[:,1:]
b=df.iloc[:,1:].values

print(type(df))
print(type(a))
print(type(b))

This prints the following types:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>
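
As a small illustration of why dropping the text column matters, the sketch below compares the array dtype with and without the first column. The frame is a cut-down, assumed version of the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_std": [0.816497, np.nan, 50.0],
})

with_text = df.values                  # includes the string column "viz"
numeric_only = df.iloc[:, 1:].values   # drops the first column by position

print(with_text.dtype)      # object
print(numeric_only.dtype)   # float64
```

With the string column included, NumPy has to fall back to a generic object array. The numeric columns alone share a float dtype.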

Answer 3

Answered by amc

A short way is to use `.as_matrix()`. Note that it has been deprecated since pandas 0.23.0 and was removed in pandas 1.0, where `.to_numpy()` replaces it. One short line:

df.iloc[:,[1,2,3]].as_matrix()

Gives:

array([[3, 2, 0.816497],
       [0, 'NaN', 'NaN'],
       [2, 51, 50.0]], dtype=object)

By using indices of the columns, you can use this code for any dataframe with different column names.

Here are the steps for your example:

import pandas as pd
columns = ['viz', 'a1_count', 'a1_mean', 'a1_std']
index = [0,1,2]
vals = {'viz': ['n','n','n'], 'a1_count': [3,0,2], 'a1_mean': [2,'NaN', 51], 'a1_std': [0.816497, 'NaN', 50.000000]}
df = pd.DataFrame(vals, columns=columns, index=index)

Gives:

   viz  a1_count a1_mean    a1_std
0   n         3       2  0.816497
1   n         0     NaN       NaN
2   n         2      51        50

Then:

x1 = df.iloc[:,[1,2,3]].as_matrix()

Gives:

array([[3, 2, 0.816497],
       [0, 'NaN', 'NaN'],
       [2, 51, 50.0]], dtype=object)

Here x1 is a numpy.ndarray.
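
The example above stores missing values as the string 'NaN', which is why the result has dtype=object. A possible repair, sketched here with `pd.to_numeric` and `.to_numpy()` (neither appears in the original answer), coerces the columns to numbers before converting:

```python
import numpy as np
import pandas as pd

# Same data as the example above, with 'NaN' stored as a string.
vals = {
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2, "NaN", 51],
    "a1_std": [0.816497, "NaN", 50.0],
}
df = pd.DataFrame(vals)

raw = df.iloc[:, [1, 2, 3]].to_numpy()

# Coerce each selected column to numeric; unparseable entries become NaN.
fixed = df.iloc[:, [1, 2, 3]].apply(pd.to_numeric, errors="coerce").to_numpy()

print(raw.dtype)               # object
print(fixed.dtype)             # float64
print(np.isnan(fixed).sum())   # 2
```

Building the frame with `np.nan` in the first place avoids the coercion step entirely.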

Answer 4

Answered by amir

The best way to convert to a Numpy array is '.to_numpy(dtype=None, copy=False)'. It is new in version 0.24.0.

You can also use '.array', which is available on a Series or Index and returns a pandas ExtensionArray rather than a Numpy array.

Pandas .as_matrix has been deprecated since version 0.23.0.
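
A minimal sketch of the two `to_numpy()` parameters named above, using a small made-up frame (the column names and values are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# dtype= converts the values while building the array.
as_float = df[["B", "C"]].to_numpy(dtype="float32")

# copy=True guarantees a new array, so writing to it leaves df untouched.
independent = df[["B", "C"]].to_numpy(copy=True)
independent[0, 0] = -1

print(as_float.dtype)   # float32
print(df.loc[0, "B"])   # 3
```

With the default `copy=False`, pandas may or may not return a view of the underlying data, so `copy=True` is the safer choice when the array will be modified.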

Answer 5

Answered by Suvo

Please use the Pandas to_numpy() method. Below is an example:

>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1, 2], "B":[3, 4], "C":[5, 6]})
>>> df
   A  B  C
0  1  3  5
1  2  4  6
>>> s_array = df[["A", "B", "C"]].to_numpy()
>>> s_array
array([[1, 3, 5],
       [2, 4, 6]])
>>> t_array = df[["B", "C"]].to_numpy()
>>> print(t_array)
[[3 5]
 [4 6]]

You can select any number of columns using:

columns = ['col1', 'col2', 'col3']
df1 = df[columns]

Then apply the to_numpy() method.
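
Putting the two steps together, here is a short sketch that reuses the A/B/C frame from the example above. The column list is arbitrary, and the array's column order follows the list:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

columns = ["C", "A"]   # any subset of column names, in any order
df1 = df[columns]
arr = df1.to_numpy()

print(arr.tolist())   # [[5, 1], [6, 2]]
```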
