pandas "ValueError: labels ['timestamp'] not contained in axis" error
Original question: http://stackoverflow.com/questions/37766030/
Note: this page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Asked by avaj
I have this code. I want to remove the column 'timestamp' from the file u.data, but I can't. It shows the error
"ValueError: labels ['timestamp'] not contained in axis"
How can I correct it?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rc("font", size=14)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.cross_validation import KFold
from sklearn.cross_validation import train_test_split
data = pd.read_table('u.data')
data.columns=['userID', 'itemID','rating', 'timestamp']
data.drop('timestamp', axis=1)
N = len(data)
print data.shape
print list(data.columns)
print data.head(10)
Accepted answer by Aditya
One of the biggest problems, and one that often goes unnoticed, is that when you insert headers into the u.data file, the separator between the header names must be exactly the same as the separator between the values in a data row. For example, if a tab separates the values, you should not use spaces.
In your u.data file, add the headers and separate them with exactly the same whitespace that is used between the items of a row.
PS: Use Sublime Text; Notepad/Notepad++ sometimes does not work.
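To see why the separator in the header line matters, here is a minimal sketch on an in-memory sample rather than the real u.data. It assumes the file is tab-separated and reads it with pd.read_csv(sep="\t"); the two data rows are the ones quoted on this page and the column names are the ones from the question.

```python
import io
import pandas as pd

# Two sample rows in the assumed tab-separated layout of u.data.
rows = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Header separated by tabs, matching the data rows.
good_df = pd.read_csv(
    io.StringIO("userID\titemID\trating\ttimestamp\n" + rows), sep="\t"
)

# Header separated by spaces: with sep="\t" the parser sees one header field.
bad_df = pd.read_csv(
    io.StringIO("userID itemID rating timestamp\n" + rows), sep="\t"
)

print(list(good_df.columns))
print(list(bad_df.columns))

# Dropping 'timestamp' only works when the header was split correctly.
good_df = good_df.drop("timestamp", axis=1)
try:
    bad_df.drop("timestamp", axis=1)
except (KeyError, ValueError) as exc:  # exception type depends on the pandas version
    print("drop failed:", exc)
```

Printing list(data.columns) right after loading is a quick way to check whether the header was parsed the way you expect.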
Answer by bakkal
"ValueError: labels ['timestamp'] not contained in axis"
You don't have headers in the file, so the way you loaded it, you got a df where the column names are the first row of the data. You tried to access the column timestamp, which doesn't exist.
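The effect described above can be reproduced with a small sketch. It uses an in-memory sample (two rows copied from u.data as quoted on this page) and assumes tab separation; pd.read_csv(sep="\t") stands in for pd.read_table here.

```python
import io
import pandas as pd

sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Default header handling: the first line is consumed as column names.
df = pd.read_csv(io.StringIO(sample), sep="\t")
print(list(df.columns))            # names come from the first data row
print(len(df))                     # one row fewer than the file has lines
print("timestamp" in df.columns)

# header=None keeps every line as data and numbers the columns 0..3.
df_all = pd.read_csv(io.StringIO(sample), sep="\t", header=None)
print(list(df_all.columns))
```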
Your u.data doesn't have headers in it:
$ head u.data
196 242 3 881250949
186 302 3 891717742
So working with column names isn't going to be possible unless you add the headers. You can add the headers to the file u.data, e.g. I opened it in a text editor and added the line a b c timestamp at the top of it (this seems to be a tab-separated file, so be careful when adding the header not to use spaces, or you will break the format).
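If you would rather not edit the file by hand, the header can also be prepended from Python, which avoids accidentally typing spaces instead of tabs. This is only a sketch: add_header and u_demo.data are hypothetical names, the file is assumed to be tab-separated UTF-8 text, and the whole file is read into memory. The demo runs on a throwaway copy, not on the real u.data.

```python
import os
import tempfile


def add_header(path, names, sep="\t"):
    """Prepend one header line to a delimited text file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        body = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(sep.join(names) + "\n" + body)


# Build a small demo file with the same layout as the rows shown above.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "u_demo.data")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("196\t242\t3\t881250949\n186\t302\t3\t891717742\n")

add_header(demo_path, ["a", "b", "c", "timestamp"])

# Inspect the first line; repr() makes the tab characters visible.
with open(demo_path, encoding="utf-8") as f:
    first_line = f.readline()
print(repr(first_line))
```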
$ head u.data
a b c timestamp
196 242 3 881250949
186 302 3 891717742
Now your code works and data.columns returns:
Index([u'a', u'b', u'c', u'timestamp'], dtype='object')
And the rest of the trace of your working code is now
(100000, 4) # the shape
['a', 'b', 'c', 'timestamp'] # the columns
a b c timestamp # the df
0 196 242 3 881250949
1 186 302 3 891717742
2 22 377 1 878887116
3 244 51 2 880606923
4 166 346 1 886397596
5 298 474 4 884182806
6 115 265 2 881171488
7 253 465 5 891628467
8 305 451 3 886324817
9 6 86 3 883603013
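Note that the trace above still lists four columns, including timestamp. A likely reason (an observation about DataFrame.drop, not something the answer states) is that drop returns a new DataFrame and leaves data unchanged unless the result is assigned back. A minimal sketch on an in-memory sample with the header line already added:

```python
import io
import pandas as pd

sample = "a\tb\tc\ttimestamp\n196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
data = pd.read_csv(io.StringIO(sample), sep="\t")

data.drop("timestamp", axis=1)          # result is discarded, data is unchanged
shape_before = data.shape

data = data.drop("timestamp", axis=1)   # keep the returned frame
shape_after = data.shape

print(shape_before, shape_after)
print(list(data.columns))
```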
If you don't want to add headers
Or you can drop the column 'timestamp' using its index (presumably 3). We can do this using df.ix: the line below selects all rows and the columns with index 0 to index 2, thus dropping the column with index 3. (df.ix was removed in pandas 1.0; on current versions the positional equivalent is data.iloc[:, 0:3].)
data.ix[:, 0:2]
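On pandas versions that no longer provide df.ix, a positional selection with iloc is one way to get the same result. The sketch below assumes the file is loaded with header=None, so that no data row is consumed as a header and the columns are labeled 0 to 3; it runs on an in-memory sample rather than the real u.data.

```python
import io
import pandas as pd

sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# iloc slices by position and excludes the end, so 0:3 keeps columns 0, 1, 2.
trimmed = data.iloc[:, 0:3]
print(trimmed.shape)

# Label-based alternative: with header=None the last column is labeled 3.
dropped = data.drop(columns=[3])
print(list(dropped.columns) == list(trimmed.columns))
```

Unlike the label-based slice in df.ix, the end of an iloc slice is exclusive, which is why the slice is 0:3 rather than 0:2.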
Answer by MaxU
I would do it this way:
data = pd.read_table('u.data', header=None,
names=['userID', 'itemID','rating', 'timestamp'],
usecols=['userID', 'itemID','rating']
)
Check:
In [589]: data.head()
Out[589]:
userID itemID rating
0 196 242 3
1 186 302 3
2 22 377 1
3 244 51 2
4 166 346 1