pandas: 'DataFrame' object has no attribute 'col_name'

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/46169022/

Time: 2020-09-14 04:26:09  Source: igfitidea

'DataFrame' object has no attribute 'col_name'

python, pandas

Asked by Vash

I read a CSV file using:

x = pd.read_table('path to csv')

Printing x shows a row-wise, comma-separated list of the data values, which looks fine. But when I try to access any column with x.col1, I get an error:

**AttributeError: 'DataFrame' object has no attribute 'col1'**

I also tried:

y = DataFrame(x)

and retrieving the column via y, but no luck. However, x.columns works. I just can't figure out what the problem is here.

Please help!!

Accepted answer by jezrael

I think read_table uses a tab as its default separator, so you need to specify the sep parameter:

x = pd.read_table('path to csv', sep=',')

Or use read_csv, whose default separator is a comma, so sep can be omitted:

x = pd.read_csv('path to csv')
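
The difference is easy to reproduce without a file on disk. The sketch below is only an illustration: the two-column sample text and the names csv_text, wrong and right are made up for the example and do not come from the question.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the CSV file from the question.
csv_text = "col1,col2\n1,2\n3,4\n"

# read_table splits on tabs by default, so each line becomes a single field
# and the whole header line is read as one column name.
wrong = pd.read_table(io.StringIO(csv_text))
print(wrong.columns.tolist())   # ['col1,col2']

# With an explicit comma separator the columns are split as expected.
right = pd.read_table(io.StringIO(csv_text), sep=",")
print(right.columns.tolist())   # ['col1', 'col2']
print(right.col1.tolist())      # [1, 3]
```

This would also explain why x.columns works while x.col1 fails: the frame has a single column whose name is the entire header line.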

Answer by Qianru Zhou

I had the same issue and checked all the answers (including the first one), but none worked for me until I ran

 print(dataset.columns.tolist())

then I found the devil:

['\xef\xbb\xbfLabel', 'blabla','blabla']

Notice the first element of the list: it should be 'Label'. (By the way, it seems pandas does not welcome 'Label' as a column name, so I changed it to something else later.)

I did a little digging and found this explanation in an article:

"The \x actually means that the value is hexadecimal, which is a Byte Order Mark, indicating that the text is Unicode.

Why does it matter to us? You cannot assume the files you read are clean. They might contain extra symbols like this that can throw your scripts off."

More precisely, the three bytes \xef\xbb\xbf are the UTF-8 encoding of the byte order mark, U+FEFF.

I tried many ways to get rid of it, and the most convenient was to add an empty ',' before the first column (I am using CSV, so this adds an empty leading column to the dataset that holds only the junk). The columns then turn out to be:

['\xef\xbb\xbf', 'Label', 'blabla', 'blabla']

Problem solved!
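
A different approach from the padding column, offered as a sketch and not as what this answer did: Python's utf-8-sig codec drops a leading byte order mark while decoding, so the first column name comes out clean. The sample bytes and the names raw, text and df are invented for the illustration.

```python
import io

import pandas as pd

# Hypothetical file contents: a UTF-8 byte order mark followed by CSV data.
raw = b"\xef\xbb\xbfLabel,a,b\n1,2,3\n"

# Decoding with 'utf-8-sig' removes the BOM if one is present.
text = raw.decode("utf-8-sig")

df = pd.read_csv(io.StringIO(text))
print(df.columns.tolist())  # ['Label', 'a', 'b']
```

When reading from a path instead of an in-memory string, passing encoding='utf-8-sig' to read_csv should have the same effect.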

Answer by Mohamed Ali JAMAOUI

Try stripping any whitespace around the column names with this:

x.columns = [col.strip() for col in x.columns.tolist()]

Or, as suggested in the documentation here and highlighted in @jezrael's answer:

x.columns = x.columns.str.strip() 

Then you will be able to access columns with x.col1 through x.coln. Also be aware that column names are case-sensitive.

Example:

>>> import pandas as pd 
>>> df = pd.DataFrame([[1,2],[3,4]], columns=[' col1', 'col2 '])
>>> df
    col1  col2 
0      1      2
1      3      4
>>> df.col1
Traceback (most recent call last):
...    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'col1'
>>> df.col2 
Traceback (most recent call last):
...    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'col2'
>>> df.columns = [col.strip() for col in df.columns.tolist()]
>>> df.col1
0    1
1    3
Name: col1, dtype: int64
>>> df.col2 
0    2
1    4
Name: col2, dtype: int64
>>>
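
The .str.strip() variant behaves the same way. Below is a minimal sketch on the same made-up frame; printing the names with repr is just one way to make stray spaces visible, and bracket indexing is shown as an alternative that works with the exact name even when it is not a valid Python identifier.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=[" col1", "col2 "])

# repr makes leading/trailing whitespace in the names visible.
print([repr(col) for col in df.columns])  # ["' col1'", "'col2 '"]

# Bracket indexing works with the exact name, spaces included.
print(df[" col1"].tolist())  # [1, 3]

# Vectorized strip over the column index.
df.columns = df.columns.str.strip()
print(df.col1.tolist())  # [1, 3]
print(df.col2.tolist())  # [2, 4]
```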