pandas “ValueError: labels ['timestamp'] not contained in axis” error

Question

Question by avaj

I have this code. I want to remove the column 'timestamp' from the file u.data, but I can't. It shows the error
"ValueError: labels ['timestamp'] not contained in axis". How can I correct it?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rc("font", size=14)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
# sklearn.cross_validation was removed in scikit-learn 0.20; these now live in model_selection
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split

data = pd.read_table('u.data')
data.columns = ['userID', 'itemID', 'rating', 'timestamp']
data.drop('timestamp', axis=1)  # returns a new DataFrame; `data` itself is left unchanged

N = len(data)
print(data.shape)
print(list(data.columns))
print(data.head(10))

Answer 1

Accepted answer by Aditya

One of the biggest problems people run into, and one that often goes unnoticed, is that when you insert headers into the u.data file, the separator between the header names must be exactly the same as the separator between the values in a data row. For example, if a tab separates the fields of a row, you should not use spaces in the header.

In your u.data file, add the headers and separate them with exactly the same whitespace as is used between the items of a row. PS: Use Sublime Text; Notepad/Notepad++ sometimes does not work.
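A minimal sketch of the mismatch described above, assuming u.data is tab-separated (two sample rows from the file are inlined with io.StringIO, and the header names are the ones from the question). With a space-separated header, pandas sees a single header field, so there is no 'timestamp' column to drop:

```python
import io

import pandas as pd

# Two MovieLens-style rows; the data fields are separated by tabs.
rows = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Header typed with spaces: it contains no tab, so it is read as ONE field
# and no separate 'timestamp' column exists.
bad = pd.read_csv(io.StringIO("userID itemID rating timestamp\n" + rows), sep="\t")

# Header typed with tabs, matching the separator used in the data rows.
good = pd.read_csv(io.StringIO("userID\titemID\trating\ttimestamp\n" + rows), sep="\t")

print("timestamp" in bad.columns)   # False
print("timestamp" in good.columns)  # True

# drop() works once the header is tab-separated; it returns a new frame, so reassign.
good = good.drop("timestamp", axis=1)
print(list(good.columns))
```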

Answer 2

Answer by bakkal

"ValueError: labels ['timestamp'] not contained in axis"

You don't have headers in the file, so the way you loaded it, you got a df where the column names are the first row of the data. You tried to access the column timestamp, which doesn't exist.

Your u.data doesn't have headers in it:

$ head u.data
196 242 3   881250949
186 302 3   891717742
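Using the two rows shown above as an in-memory stand-in for u.data (an assumption so the sketch runs without the file), this shows what the default header=0 does to a headerless file and what header=None changes:

```python
import io

import pandas as pd

# Sample rows from the `head u.data` output: tab-separated, no header line.
raw = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Default header=0: the first data row is consumed as the column names.
as_header = pd.read_csv(io.StringIO(raw), sep="\t")
print(list(as_header.columns))  # ['196', '242', '3', '881250949']
print(len(as_header))           # 1 (the other row became the header)

# header=None: every line is data, and columns get integer labels 0..3.
no_header = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
print(list(no_header.columns))  # [0, 1, 2, 3]
print(len(no_header))           # 2
```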

So working with column names isn't going to be possible unless you add the headers. You can add the headers to the file u.data, e.g. I opened it in a text editor and added the line a b c timestamp at the top of it (this seems to be a tab-separated file, so be careful not to use spaces when adding the header, otherwise it breaks the format).

$ head u.data
a   b   c   timestamp
196 242 3   881250949
186 302 3   891717742

Now your code works and data.columns returns:

Index([u'a', u'b', u'c', u'timestamp'], dtype='object')

The rest of the output of your working code is now:

(100000, 4) # the shape
['a', 'b', 'c', 'timestamp'] # the columns
     a    b  c  timestamp # the df
0  196  242  3  881250949
1  186  302  3  891717742
2   22  377  1  878887116
3  244   51  2  880606923
4  166  346  1  886397596
5  298  474  4  884182806
6  115  265  2  881171488
7  253  465  5  891628467
8  305  451  3  886324817
9    6   86  3  883603013
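If you would rather add the header from code than in an editor, a sketch like the following writes it with tab separators. prepend_header is a hypothetical helper written for this example, and a temporary file stands in for the real u.data:

```python
import os
import tempfile

import pandas as pd


def prepend_header(path, names):
    """Hypothetical helper: rewrite `path` with a tab-separated header line on top."""
    with open(path) as f:
        body = f.read()
    with open(path, "w") as f:
        f.write("\t".join(names) + "\n" + body)


# A temporary file with two rows stands in for the real u.data.
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False) as tmp:
    tmp.write("196\t242\t3\t881250949\n186\t302\t3\t891717742\n")
    path = tmp.name

prepend_header(path, ["a", "b", "c", "timestamp"])

data = pd.read_table(path)             # read_table splits on tabs by default
print(list(data.columns))              # ['a', 'b', 'c', 'timestamp']
data = data.drop("timestamp", axis=1)  # drop() returns a new frame, so reassign
os.remove(path)
```

Run a helper like this only once per file; running it again would add a second header line.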

If you don't want to add headers

Or you can drop the column 'timestamp' using its position (presumably 3). We can do this with positional indexing via df.iloc (the older df.ix indexer was removed in pandas 1.0): below, it selects all rows and the columns at positions 0 to 2 (the slice end, 3, is exclusive), thus dropping the column at position 3. The result is a new DataFrame, so assign it back:

data = data.iloc[:, 0:3]
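A small sketch of positional dropping, assuming the file was read with header=None so the columns are labelled 0 to 3 (two sample rows are inlined in place of u.data). Neither iloc nor drop modifies the DataFrame in place, so each result is assigned:

```python
import io

import pandas as pd

raw = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
data = pd.read_csv(io.StringIO(raw), sep="\t", header=None)

# Option 1: keep positions 0, 1 and 2 (the slice end is exclusive).
first_three = data.iloc[:, 0:3]

# Option 2: drop whatever label sits at position 3, without knowing its name.
data = data.drop(columns=data.columns[3])

print(list(first_three.columns))  # [0, 1, 2]
print(list(data.columns))         # [0, 1, 2]
```

Option 2 is handy when the column names are unknown or messy, because data.columns[3] looks the label up by position first.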

Answer 3

Answer by MaxU

I would do it this way:

data = pd.read_table('u.data', header=None,
                     names=['userID', 'itemID','rating', 'timestamp'],
                     usecols=['userID', 'itemID','rating']
)

Check:

In [589]: data.head()
Out[589]:
   userID  itemID  rating
0     196     242       3
1     186     302       3
2      22     377       1
3     244      51       2
4     166     346       1
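To check this approach without the MovieLens file, the same call can be run on an in-memory copy of the first three rows (io.StringIO stands in for 'u.data' here). Because usecols leaves 'timestamp' out, that column is never loaded, so there is nothing to drop afterwards:

```python
import io

import pandas as pd

raw = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n"

# header=None: the file has no header row; names labels the four fields,
# and usecols keeps only three of them.
data = pd.read_table(io.StringIO(raw), header=None,
                     names=['userID', 'itemID', 'rating', 'timestamp'],
                     usecols=['userID', 'itemID', 'rating'])
print(data.head())
```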