Creating a NumPy array from a pandas DataFrame

Question

Asked by nav

import pandas as pd
import numpy as np
df = pd.read_csv('~/test.txt')
list(df.columns.values)

I get the following output:

['time', 'Res_fs1', 'angle1', 'Res_fs2', 'angle2', 'Res_ps1', 'Force1', 
'Res_ps2', 'Force2', 'object']

When I try to create a NumPy array from the columns Res_fs1, Res_fs2, Res_ps1, and Res_ps2:

X=np.array(df['Res_fs1','Res_fs2','Res_ps1','Res_ps2'])

I get a KeyError, even though the keys exist:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
  File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: ('Res_fs1', 'Res_fs2', 'Res_ps1', 'Res_ps2')

Answer 1

Answered by Allen

You can just do:

X = df[['Res_fs1','Res_fs2','Res_ps1','Res_ps2']].values

When you select a subset of columns, you need to use double square brackets, '[[' and ']]': the inner pair builds a list of column names and the outer pair indexes the DataFrame with that list.
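As a minimal sketch of the double-bracket selection, the snippet below builds a small stand-in DataFrame (the values are made up, since the contents of '~/test.txt' are not shown) and converts the four columns to an array. It also shows that the original np.array(...) call works once a list of names is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from '~/test.txt'.
df = pd.DataFrame({
    "time": [0.0, 0.1, 0.2],
    "Res_fs1": [1.0, 2.0, 3.0],
    "Res_fs2": [4.0, 5.0, 6.0],
    "Res_ps1": [7.0, 8.0, 9.0],
    "Res_ps2": [10.0, 11.0, 12.0],
})

# The list of column names is what the inner brackets create.
cols = ["Res_fs1", "Res_fs2", "Res_ps1", "Res_ps2"]

# Select the columns, then take the underlying NumPy array.
X = df[cols].values

# The call from the question also works when given a list of names.
X_from_question = np.array(df[cols])
```

Both results are two-dimensional arrays with one row per DataFrame row and one column per selected name, in the order given in the list.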

Answer 2

Answered by Ted Petrou

To really understand what is happening, you need to know how Python handles the indexing operator (the square brackets). Internally, the square brackets are special syntax for calling an object's __getitem__ special method. If the object does not implement this method, you will get an error saying that the object does not support indexing.
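The mechanism can be seen without pandas at all. The sketch below uses two toy classes, Echo and NoIndexing, which are invented here for illustration: one implements __getitem__ and simply returns the key it receives, the other does not implement it.

```python
class Echo:
    """Toy class (not part of pandas) that returns the key it is indexed with."""

    def __getitem__(self, key):
        return key


class NoIndexing:
    """Toy class that does not define __getitem__."""


e = Echo()

single = e["a"]          # same as e.__getitem__("a")
several = e["a", "b"]    # commas inside the brackets form a tuple
as_list = e[["a", "b"]]  # an explicit list is passed through as a list

# Indexing an object without __getitem__ raises TypeError.
try:
    NoIndexing()["a"]
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__
```

Here several is the tuple ("a", "b") while as_list is the list ["a", "b"], which is exactly the difference between the single-bracket and double-bracket forms.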

When you call df['Res_fs1','Res_fs2','Res_ps1','Res_ps2'], Python interprets the comma-separated column names as a tuple and passes that tuple to the DataFrame's __getitem__ special method.

Internally, this is the call that gets made:

df.__getitem__(('Res_fs1','Res_fs2','Res_ps1','Res_ps2'))

Tuples are immutable and can be hashed, so they are valid candidates for members of an index. pandas therefore tries to find a single column whose name is exactly the tuple ('Res_fs1','Res_fs2','Res_ps1','Res_ps2'). Since your DataFrame does not have such a column, a KeyError is raised.
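A short sketch reproducing the failure on a made-up two-column DataFrame (the data is hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
df = pd.DataFrame({"Res_fs1": [1, 2], "Res_fs2": [3, 4]})

key = ("Res_fs1", "Res_fs2")

# Tuples of strings are hashable, so this call succeeds.
key_hash = hash(key)

# Single brackets with commas pass the whole tuple as one column label.
try:
    df["Res_fs1", "Res_fs2"]  # equivalent to df.__getitem__(key)
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

The lookup fails because no single column is labelled with that tuple, even though each string on its own is a valid column name.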

When you call df[['Res_fs1','Res_fs2','Res_ps1','Res_ps2']], the __getitem__ special method is passed a list. Lists cannot be hashed and therefore cannot be members of an index. pandas therefore takes a completely different path and retrieves every column whose name is in the passed list. It will raise a KeyError if one of the items in the list is not a column name.
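The list path can be sketched the same way, again on made-up data. The name "no_such_column" is a deliberately missing label chosen for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
df = pd.DataFrame({"Res_fs1": [1, 2], "Res_fs2": [3, 4], "angle1": [5, 6]})

cols = ["Res_fs1", "Res_fs2"]

# Lists are mutable and therefore unhashable.
try:
    hash(cols)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A list key selects every named column and returns a DataFrame.
subset = df[cols]

# A name in the list that is not a column raises KeyError.
try:
    df[["Res_fs1", "no_such_column"]]
    missing_raised = False
except KeyError:
    missing_raised = True
```

The result subset is itself a DataFrame containing only the requested columns, in the order they appear in the list.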

Answer 3

Answered by Pratiksha

pandas has a built-in method for this purpose: pandas.DataFrame.as_matrix

DataFrame.as_matrix(columns=None)
Convert the frame to its Numpy-array representation.
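Note that as_matrix was deprecated in pandas 0.23 and removed in pandas 1.0, so it is only available on older installations such as the one in the question. On current versions, a rough equivalent of as_matrix(columns=...) is to select the columns first and then call DataFrame.to_numpy() (available since pandas 0.24). A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
df = pd.DataFrame({
    "time": [0.0, 0.1],
    "Res_fs1": [1.0, 2.0],
    "Res_fs2": [3.0, 4.0],
    "Res_ps1": [5.0, 6.0],
    "Res_ps2": [7.0, 8.0],
})

cols = ["Res_fs1", "Res_fs2", "Res_ps1", "Res_ps2"]

# Select the columns, then convert the selection to a NumPy array.
X = df[cols].to_numpy()
```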
