Creating a NumPy array from a pandas DataFrame
Original question: http://stackoverflow.com/questions/44792308/
Note: this page is derived from a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Asked by nav
import pandas as pd
import numpy as np
df = pd.read_csv('~/test.txt')
list(df.columns.values)
I get the following output:
['time', 'Res_fs1', 'angle1', 'Res_fs2', 'angle2', 'Res_ps1', 'Force1',
'Res_ps2', 'Force2', 'object']
When I try to create a NumPy array from the columns Res_fs1, Res_fs2, Res_ps1 and Res_ps2:
X=np.array(df['Res_fs1','Res_fs2','Res_ps1','Res_ps2'])
I get a KeyError, even though the keys exist:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
  File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: ('Res_fs1', 'Res_fs2', 'Res_ps1', 'Res_ps2')
Answered by Allen
You can just do:
X = df[['Res_fs1','Res_fs2','Res_ps1','Res_ps2']].values
To select a subset of columns, you need to use double square brackets, '[[' and ']]': the outer pair is the indexing operator and the inner pair is a list of column names.
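As a minimal sketch of the double-bracket selection, the example below uses a small made-up DataFrame (the values are illustrative, not the asker's data). It calls `.to_numpy()`, which is available in pandas 0.24 and later and can be used in place of `.values`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's file; the values are made up.
df = pd.DataFrame({
    'time': [0.0, 0.1, 0.2],
    'Res_fs1': [1.0, 2.0, 3.0],
    'Res_fs2': [4.0, 5.0, 6.0],
    'Res_ps1': [7.0, 8.0, 9.0],
    'Res_ps2': [10.0, 11.0, 12.0],
})

cols = ['Res_fs1', 'Res_fs2', 'Res_ps1', 'Res_ps2']

# A list inside the indexing brackets selects several columns at once.
X = df[cols].to_numpy()

print(type(X))   # <class 'numpy.ndarray'>
print(X.shape)   # (3, 4)
```

Each row of `X` corresponds to a row of the DataFrame, and the columns follow the order given in `cols`.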
Answered by Ted Petrou
To really understand what is happening, you need to know how Python handles the indexing operator (the square brackets). Internally, the square brackets are special syntax for calling an object's __getitem__ special method. If the object does not implement that method, you get an error saying that the object does not support indexing.
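The dispatch described above can be seen without pandas. The sketch below assumes Python 3 and uses two toy classes, `Echo` and `NoIndexing`, which are hypothetical names invented for illustration.

```python
class Echo:
    """Toy class (hypothetical) that returns whatever key it is indexed with."""

    def __getitem__(self, key):
        return key


class NoIndexing:
    """Toy class (hypothetical) that does not define __getitem__."""


e = Echo()
single = e['a']          # key is the string 'a'
several = e['a', 'b']    # key is the tuple ('a', 'b')
as_list = e[['a', 'b']]  # key is the list ['a', 'b']

# Indexing an object without __getitem__ raises TypeError in Python 3.
try:
    NoIndexing()['a']
    error = None
except TypeError as exc:
    error = exc

print(single, several, as_list)
print(type(error).__name__)
```

The point of the sketch is that the brackets themselves do no work; whatever sits between them is handed to `__getitem__` as a single object.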
When you call df['Res_fs1','Res_fs2','Res_ps1','Res_ps2'], Python interprets the comma-separated column names as a single tuple and passes that tuple to the __getitem__ special method of the DataFrame.
Internally, this is what gets called.
df.__getitem__(('Res_fs1','Res_fs2','Res_ps1','Res_ps2'))
Tuples are immutable and, when their elements are hashable, can themselves be hashed, so a tuple is a valid candidate for a member of an index. pandas therefore tries to find a single column whose name is exactly the tuple ('Res_fs1','Res_fs2','Res_ps1','Res_ps2'). Since your DataFrame has no such column, a KeyError is raised.
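A minimal sketch of the tuple case, assuming a two-column DataFrame with made-up values. The dictionary `lookup` is a hypothetical helper that only demonstrates that a tuple of strings is hashable.

```python
import pandas as pd

# Made-up data with two of the asker's column names.
df = pd.DataFrame({'Res_fs1': [1, 2], 'Res_fs2': [3, 4]})

# A tuple of strings is hashable, so it can serve as a dictionary key.
key = ('Res_fs1', 'Res_fs2')
lookup = {key: 'found'}

# Comma-separated labels inside one pair of brackets form that same tuple,
# and pandas looks for one column with exactly that name.
try:
    df['Res_fs1', 'Res_fs2']
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(lookup[key])
print(raised_key_error)
```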
When you call df[['Res_fs1','Res_fs2','Res_ps1','Res_ps2']], the __getitem__ special method is passed a list. Lists cannot be hashed and so can never be members of the index. pandas therefore takes a completely different path and retrieves every column whose name appears in the passed list. It raises a KeyError if any item in the list is not a column name.
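The list case can be sketched the same way. The data is made up, and `'no_such_column'` is a deliberately nonexistent label chosen for illustration.

```python
import pandas as pd

# Made-up data with a few of the asker's column names.
df = pd.DataFrame({'Res_fs1': [1, 2], 'Res_fs2': [3, 4], 'angle1': [5, 6]})

# A list is unhashable, so it can never be a single column label.
try:
    hash(['Res_fs1', 'Res_fs2'])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# pandas treats the list as a collection of labels and returns a DataFrame.
subset = df[['Res_fs1', 'Res_fs2']]

# A label in the list that is not a column raises KeyError.
try:
    df[['Res_fs1', 'no_such_column']]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(list_is_hashable)
print(list(subset.columns))
print(raised_key_error)
```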
Answered by Pratiksha
pandas has a built-in function for this purpose: pandas.DataFrame.as_matrix
DataFrame.as_matrix(columns=None)
Convert the frame to its NumPy-array representation.
This method was deprecated in pandas 0.23 and removed in pandas 1.0; on newer versions, select the columns and call DataFrame.to_numpy() instead.
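A minimal sketch of the same conversion using `DataFrame.to_numpy()` (available since pandas 0.24), on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data; 'angle1' is included only to show that it is left out.
df = pd.DataFrame({
    'Res_fs1': [1.0, 2.0],
    'angle1': [0.5, 0.6],
    'Res_fs2': [3.0, 4.0],
})

# Select the wanted columns first, then convert the result to an ndarray.
X = df[['Res_fs1', 'Res_fs2']].to_numpy()

print(X.shape)
```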