Convert Select Columns in Pandas Dataframe to Numpy Array
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/31789160/
Asked by Adam_G
I would like to convert everything but the first column of a pandas dataframe into a numpy array. For some reason, using the columns= parameter of DataFrame.as_matrix() is not working.
df:
  viz  a1_count  a1_mean     a1_std
0   n         3        2   0.816497
1   n         0      NaN        NaN
2   n         2       51  50.000000
I tried X=df.as_matrix(columns=[df[1:]]) but this yields an array of all NaNs.
Accepted answer by DSM
The columns parameter accepts a collection of column names. You're passing a list containing a dataframe with two rows:
>>> [df[1:]]
[ viz a1_count a1_mean a1_std
1 n 0 NaN NaN
2 n 2 51 50]
>>> df.as_matrix(columns=[df[1:]])
array([[ nan, nan],
[ nan, nan],
[ nan, nan]])
Instead, pass the column names you want:
>>> df.columns[1:]
Index(['a1_count', 'a1_mean', 'a1_std'], dtype='object')
>>> df.as_matrix(columns=df.columns[1:])
array([[ 3. , 2. , 0.816497],
[ 0. , nan, nan],
[ 2. , 51. , 50. ]])
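On pandas versions where as_matrix() is no longer available, the same column selection can be combined with to_numpy(). The following is a minimal sketch, assuming pandas 0.24 or later; the frame is rebuilt from the question's sample data, with np.nan standing in for the missing values.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (np.nan marks missing values).
df = pd.DataFrame({
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2, np.nan, 51],
    "a1_std": [0.816497, np.nan, 50.0],
})

# Select every column except the first by name, then convert to an ndarray.
X = df[df.columns[1:]].to_numpy()

print(X.shape)  # (3, 3)
print(X.dtype)  # float64
```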
Answer by 176coding
The easy way is to use the "values" property: df.iloc[:,1:].values
a=df.iloc[:,1:]
b=df.iloc[:,1:].values
print(type(df))
print(type(a))
print(type(b))
This prints the following types:
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>
Answer by amc
The fastest and easiest way is to use .as_matrix(). One short line:
df.iloc[:,[1,2,3]].as_matrix()
Gives:
array([[3, 2, 0.816497],
[0, 'NaN', 'NaN'],
[2, 51, 50.0]], dtype=object)
By using indices of the columns, you can use this code for any dataframe with different column names.
Here are the steps for your example:
import pandas as pd
columns = ['viz', 'a1_count', 'a1_mean', 'a1_std']
index = [0,1,2]
vals = {'viz': ['n','n','n'], 'a1_count': [3,0,2], 'a1_mean': [2,'NaN', 51], 'a1_std': [0.816497, 'NaN', 50.000000]}
df = pd.DataFrame(vals, columns=columns, index=index)
Gives:
viz a1_count a1_mean a1_std
0 n 3 2 0.816497
1 n 0 NaN NaN
2 n 2 51 50
Then:
x1 = df.iloc[:,[1,2,3]].as_matrix()
Gives:
array([[3, 2, 0.816497],
[0, 'NaN', 'NaN'],
[2, 51, 50.0]], dtype=object)
Where x1 is a numpy.ndarray.
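The dtype=object in the output above comes from the sample data storing missing values as the string 'NaN' rather than as real NaN values. If a float array is wanted instead, one possible approach is to coerce the columns to numeric before converting. This is only a sketch, assuming the 'NaN' strings are meant as missing values and that pandas 0.24 or later is available for to_numpy():

```python
import numpy as np
import pandas as pd

# Same sample data as above, with missing values stored as the string 'NaN'.
vals = {'viz': ['n', 'n', 'n'], 'a1_count': [3, 0, 2],
        'a1_mean': [2, 'NaN', 51], 'a1_std': [0.816497, 'NaN', 50.0]}
df = pd.DataFrame(vals)

# Coerce every selected column to a numeric dtype; anything that cannot be
# parsed as a number becomes a real NaN.
numeric = df.iloc[:, 1:].apply(pd.to_numeric, errors='coerce')
x1 = numeric.to_numpy()

print(x1.dtype)  # float64
```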
Answer by amir
The best way to convert to a NumPy array is to use '.to_numpy(dtype=None, copy=False)'. It is new in version 0.24.0.
You can also use '.array'.
Pandas .as_matrix has been deprecated since version 0.23.0.
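As a rough illustration of the two to_numpy() parameters mentioned above, the sketch below converts everything but the first column of a small made-up frame (the column names A, B and C are placeholders, not from the question). It assumes pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the column names are placeholders.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# dtype= requests a specific dtype for the resulting array.
arr = df.iloc[:, 1:].to_numpy(dtype="float64")

# copy=True ensures the result does not share memory with the frame,
# so writing to it leaves df unchanged.
independent = df.iloc[:, 1:].to_numpy(copy=True)
independent[0, 0] = 99

print(arr.dtype)       # float64
print(df.loc[0, "B"])  # 3
```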
Answer by Suvo
Please use the Pandas to_numpy() method. Below is an example:
>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1, 2], "B":[3, 4], "C":[5, 6]})
>>> df
A B C
0 1 3 5
1 2 4 6
>>> s_array = df[["A", "B", "C"]].to_numpy()
>>> s_array
array([[1, 3, 5],
[2, 4, 6]])
>>> t_array = df[["B", "C"]].to_numpy()
>>> print (t_array)
[[3 5]
[4 6]]
You can select any number of columns using:
columns = ['col1', 'col2', 'col3']
df1 = df[columns]
Then apply the to_numpy() method to df1.