pandas "ValueError: labels ['timestamp'] not contained in axis" error

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37766030/

Time: 2020-09-14 01:22:14  Source: igfitidea

"ValueError: labels ['timestamp'] not contained in axis" error

python pandas machine-learning recommendation-engine data-science

Asked by avaj

I have this code. I want to remove the column 'timestamp' from the file u.data, but I can't. It shows the error
"ValueError: labels ['timestamp'] not contained in axis". How can I correct it?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
plt.rc("font", size=14)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.cross_validation import KFold
from sklearn.cross_validation import train_test_split



data = pd.read_table('u.data')
data.columns=['userID', 'itemID','rating', 'timestamp']
data.drop('timestamp', axis=1)


N = len(data)
print data.shape
print list(data.columns)
print data.head(10)

Accepted answer by Aditya

A common problem that often goes unnoticed: when you insert headers into the u.data file, the separator between the header names must be exactly the same as the separator between the values in a data row. For example, if a tab separates the values in a row, you should not use spaces in the header.

In your u.data file, add headers and separate them with exactly the same whitespace as is used between the items of a row. PS: Use Sublime Text; Notepad/Notepad++ sometimes does not work.
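
The separator point can be checked without editing the real file. The following is a minimal sketch that assumes a tab-separated layout and uses a two-row in-memory sample (values taken from this thread) in place of u.data; the names `rows`, `bad`, and `good` are illustrative only.

```python
import io
import pandas as pd

# Two rows in the same tab-separated layout as u.data (in-memory stand-in).
rows = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Header typed with spaces: the whole header line is read as a single field,
# so no column ends up being called "timestamp".
bad = pd.read_csv(io.StringIO("userID itemID rating timestamp\n" + rows), sep="\t")
print("timestamp" in bad.columns)

# Header typed with tabs: four separate column names, as intended.
good = pd.read_csv(io.StringIO("userID\titemID\trating\ttimestamp\n" + rows), sep="\t")
print(list(good.columns))
```

With the tab-separated header, `good.drop("timestamp", axis=1)` finds the label; with the space-separated header it cannot.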

Answer by bakkal

"ValueError: labels ['timestamp'] not contained in axis"

You don't have headers in the file, so the way you loaded it gave you a df whose column names are the first row of the data. You tried to access the column timestamp, which doesn't exist.
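
A small reproduction may make this concrete. It is only a sketch: it assumes the file is tab-separated, replaces u.data with a two-row in-memory sample, and does not rename the columns after loading. The exception type differs between pandas versions (older releases raise ValueError, newer ones KeyError), so both are caught.

```python
import io
import pandas as pd

# In-memory stand-in for a header-less, tab-separated u.data.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Default header handling: the first data row is consumed as column names.
data = pd.read_csv(io.StringIO(sample), sep="\t")
print(list(data.columns))  # labels taken from the first row, not real names

try:
    data.drop("timestamp", axis=1)
    error = None
except (KeyError, ValueError) as exc:  # type depends on the pandas version
    error = exc

print(type(error).__name__, error)
```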

Your u.data doesn't have headers in it

$ head u.data
196 242 3   881250949
186 302 3   891717742

So working with column names isn't going to be possible unless you add the headers. You can add the headers to the file u.data, e.g. I opened it in a text editor and added the line a b c timestamp at the top of it (this seems to be a tab-separated file, so be careful when adding the header not to use spaces, else it breaks the format)

$ head u.data
a   b   c   timestamp
196 242 3   881250949
186 302 3   891717742

Now your code works and data.columns returns

Index([u'a', u'b', u'c', u'timestamp'], dtype='object')

And the rest of the output of your working code is now

(100000, 4) # the shape
['a', 'b', 'c', 'timestamp'] # the columns
     a    b  c  timestamp # the df
0  196  242  3  881250949
1  186  302  3  891717742
2   22  377  1  878887116
3  244   51  2  880606923
4  166  346  1  886397596
5  298  474  4  884182806
6  115  265  2  881171488
7  253  465  5  891628467
8  305  451  3  886324817
9    6   86  3  883603013
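
The output above still lists timestamp because DataFrame.drop returns a new frame by default and leaves data untouched unless the result is assigned back. A minimal sketch, assuming the tab-separated header a, b, c, timestamp shown above and a two-row in-memory sample in place of the file; `unchanged` is an illustrative name.

```python
import io
import pandas as pd

# In-memory stand-in for u.data with the header line added.
sample = "a\tb\tc\ttimestamp\n196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
data = pd.read_csv(io.StringIO(sample), sep="\t")

data.drop("timestamp", axis=1)         # result discarded, data is unchanged
unchanged = list(data.columns)

data = data.drop("timestamp", axis=1)  # keep the returned frame instead
print(unchanged)
print(list(data.columns))
```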

If you don't want to add headers

Or you can drop the column 'timestamp' using its index (presumably 3). We can do this with df.iloc (df.ix, used in older pandas, was removed in pandas 1.0). The line below selects all rows and the columns at positions 0 through 2, thus dropping the column at position 3. The end of an iloc slice is exclusive, hence 0:3.

data.iloc[:, 0:3]
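
As a rough sketch of positional dropping with .iloc, the positional indexer in current pandas: it assumes the file is read with header=None so that no data row is consumed as column names, and it uses a two-row in-memory sample instead of u.data. The names `trimmed` and `same` are illustrative.

```python
import io
import pandas as pd

# In-memory stand-in for a header-less, tab-separated u.data.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# header=None keeps every row as data; columns are labelled 0, 1, 2, 3.
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# Keep all rows and the columns at positions 0, 1 and 2 (end is exclusive).
trimmed = data.iloc[:, 0:3]

# Equivalent: drop whatever label sits at position 3.
same = data.drop(columns=data.columns[3])
print(trimmed)
```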

Answer by MaxU

I would do it this way:

data = pd.read_table('u.data', header=None,
                     names=['userID', 'itemID','rating', 'timestamp'],
                     usecols=['userID', 'itemID','rating']
)

Check:

In [589]: data.head()
Out[589]:
   userID  itemID  rating
0     196     242       3
1     186     302       3
2      22     377       1
3     244      51       2
4     166     346       1
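
The same approach can be tried without the data file. This sketch assumes a tab-separated layout, uses a two-row in-memory sample, and calls pd.read_csv with sep="\t", which matches the default separator of pd.read_table. Because usecols filters at read time, the timestamp column is never loaded, so no drop is needed.

```python
import io
import pandas as pd

# In-memory stand-in for a header-less, tab-separated u.data.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

data = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,                                         # no header line in the file
    names=["userID", "itemID", "rating", "timestamp"],   # names for all four fields
    usecols=["userID", "itemID", "rating"],              # skip timestamp while reading
)
print(data.head())
```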