Python numpy "got 1 columns instead of ..." error

Question

Asked by user3466132

I'm working on the following code to perform Random Forest classification on training and test sets:

from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt, savetxt

def main():
    dataset = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')   
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    test = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')

    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(train, target)
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(rf.predict_proba(test))]

    savetxt('filepath', predicted_probs, delimiter=',', fmt='%d,%f', 
            header='Id,PredictedProbability', comments = '')

if __name__=="__main__":
    main()

However, I get the following error on execution:

---->      dataset = genfromtxt(open('C:/Users/Saurabh/Desktop/pgm/Cora/a_train.csv','r'), delimiter='', dtype='f8')

ValueError: Some errors were detected !
    Line #88 (got 1435 columns instead of 1434)
    Line #93 (got 1435 columns instead of 1434)
    Line #164 (got 1435 columns instead of 1434)
    Line #169 (got 1435 columns instead of 1434)
    Line #524 (got 1435 columns instead of 1434)
...
...
...

Any suggestions on how to avoid it? Thanks.

Answer 1

Answered by user545424

You have too many columns in one of your rows. For example:

>>> import numpy as np
>>> from StringIO import StringIO
>>> s = """
... 1 2 3 4
... 1 2 3 4 5
... """
>>> np.genfromtxt(StringIO(s),delimiter=" ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/numpy/lib/npyio.py", line 1654, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 5 columns instead of 4)
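
The session above is Python 2. As a minimal sketch of the same reproduction on Python 3 (assuming a reasonably recent NumPy, with io.StringIO standing in for a real file and made-up sample data), something like this should trigger the same error:

```python
import io
import numpy as np

# Hypothetical sample data: the second row has one value more than the first.
sample = "1 2 3 4\n1 2 3 4 5\n"

try:
    np.genfromtxt(io.StringIO(sample), delimiter=" ")
    message = None
except ValueError as err:
    # genfromtxt reports every line whose column count differs from the first row.
    message = str(err)

print(message)
```

The first data row fixes the expected column count, so every later row with a different count is reported.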

Answer 2

Answered by atomh33ls

genfromtxt will give this error if the rows do not all have the same number of columns.

I can think of 3 ways around it:

1. Use the usecols parameter

np.genfromtxt('yourfile.txt',delimiter=',',usecols=np.arange(0,1434))

However, this may mean that you lose some data (wherever a row is longer than 1434 columns). Whether or not that matters is up to you.

2. Adjust your input data file so that it has an equal number of columns.

3. Use something other than genfromtxt.
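
Options 1 and 3 can be sketched as follows. This is only an illustration on made-up data: load_ragged is a hypothetical helper written for this example, not a NumPy function, and padding short rows with NaN is just one possible choice.

```python
import io
import numpy as np

# Hypothetical ragged input: the second row has an extra value.
ragged = "1 2 3 4\n1 2 3 4 5\n"

# Option 1: keep only the first four columns of every row.
truncated = np.genfromtxt(io.StringIO(ragged), delimiter=" ",
                          usecols=np.arange(0, 4))

# Option 3: split the lines by hand and pad short rows with NaN,
# so that no value is dropped.
def load_ragged(lines, delimiter=" "):
    rows = [line.strip().split(delimiter) for line in lines if line.strip()]
    width = max(len(row) for row in rows)
    out = np.full((len(rows), width), np.nan)
    for i, row in enumerate(rows):
        out[i, :len(row)] = [float(value) for value in row]
    return out

padded = load_ragged(io.StringIO(ragged))
print(truncated.shape, padded.shape)
```

The usecols route silently discards the extra values, while the hand-written loader keeps them at the cost of NaN entries in the shorter rows.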

Answer 3

Answered by Jonathon D

I had this error. The cause was a single entry in my data that contained a space, which made the parser see an extra column. Make sure the spacing is consistent throughout the data.
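
To find such lines before calling genfromtxt, a rough check along these lines may help. It is a sketch in plain Python; column_counts and odd_lines are hypothetical helper names, and the sample lines are invented.

```python
from collections import Counter

def column_counts(lines, delimiter=" "):
    """Map 1-based line numbers to the number of delimiter-separated fields."""
    counts = {}
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped:
            counts[number] = len(stripped.split(delimiter))
    return counts

def odd_lines(lines, delimiter=" "):
    """Return the most common field count and the lines that deviate from it."""
    counts = column_counts(lines, delimiter)
    expected, _ = Counter(counts.values()).most_common(1)[0]
    return expected, {n: c for n, c in counts.items() if c != expected}

# Hypothetical data: line 2 contains a doubled space.
sample = ["1 2 3", "4 5  6", "7 8 9"]
expected, bad = odd_lines(sample)
print(expected, bad)
```

With a single-space delimiter, the doubled space on line 2 produces an empty extra field, which is exactly the kind of row genfromtxt rejects.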

Answer 4

Answered by tadf2

It seems like the header that includes the column names has 1 more column than the data itself (1435 columns in the header vs. 1434 in the data).

You could either:

1) Remove the 1 column from the header that has no corresponding data

OR

2) Use the skip_header argument of genfromtxt(), for example np.genfromtxt('myfile', skip_header=*how many lines to skip*, delimiter=' '). More information can be found in the documentation.
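
A small sketch of the skip_header route, assuming a made-up file whose header row has one field more than the data rows (the column names are hypothetical):

```python
import io
import numpy as np

# Hypothetical file: the header has four names, the data rows three values.
text = "id a b extra\n1 2 3\n4 5 6\n"

# Skipping the first line means the column count is taken from the data rows.
data = np.genfromtxt(io.StringIO(text), delimiter=" ", skip_header=1)
print(data.shape)
```

Note that the column names are lost this way, so it only suits cases where you do not need names=True.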

Answer 5

Answered by KLaz

I also had this error when I was trying to load a text dataset with genfromtxt and do text classification with Keras.

The data format was: [some_text]\t[class_label]. My understanding was that some characters in the first column were confusing the parser, so the two columns could not be split properly.

data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,1), dtype=str)

This snippet raised the same ValueError as yours, and my first workaround was to read everything as one column:

data = np.genfromtxt(my_file, delimiter='\t', usecols=(0), dtype=str)

and split the data myself later.

However, what finally worked properly was to explicitly set the comments parameter in genfromtxt.

data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,1), dtype=str, comments=None)

According to the documentation:

The optional argument comments is used to define a character string that marks the beginning of a comment. By default, genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any character present after the comment marker(s) is simply ignored.

The default character that indicates a comment is '#', so if this character appears in your text column, everything after it is ignored. That is probably why genfromtxt could not recognize the two columns.
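
The effect can be sketched on a tiny made-up dataset in the same [some_text]\t[class_label] layout. The sample strings are invented, and the behaviour shown assumes a recent NumPy:

```python
import io
import numpy as np

# Hypothetical two-column data; the first text field contains a '#'.
sample = "I like C# a lot\tpositive\nplain text\tnegative\n"

# With the default comments='#', the first line is cut off at the '#',
# so the rows no longer have the same number of columns.
try:
    np.genfromtxt(io.StringIO(sample), delimiter="\t", dtype=str)
    failed_with_default = False
except ValueError:
    failed_with_default = True

# Disabling comment handling keeps the '#' as ordinary text.
data = np.genfromtxt(io.StringIO(sample), delimiter="\t", dtype=str,
                     comments=None)
print(failed_with_default, data.shape)
```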

Answer 6

Answered by zeeshan khan

An exception is raised if an inconsistency is detected in the number of columns. Several causes and solutions are possible.

Add invalid_raise=False to skip the offending lines:

dataset = genfromtxt(open('data.csv','r'), delimiter='', invalid_raise=False)

If your data contains names, make sure that no field name contains a space or an invalid character, and that it does not match the name of a standard attribute (like size or shape), which would confuse the interpreter.

The following genfromtxt arguments control how names are cleaned:

deletechars
Gives a string combining all the characters that must be deleted from the name. By default, invalid characters are ~!@#$%^&*()-=+~\|]}[{';: /?.>,<.
excludelist
Gives a list of the names to exclude, such as return, file, print… If one of the input names is part of this list, an underscore character ('_') will be appended to it.
case_sensitive
Whether the names should be case-sensitive (case_sensitive=True), converted to upper case (case_sensitive=False or case_sensitive='upper') or to lower case (case_sensitive='lower').

data = np.genfromtxt("data.txt", dtype=None, names=True,
       deletechars=r"~!@#$%^&*()-=+~\|]}[{';: /?.>,<.", case_sensitive=True)
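
As a minimal sketch of the invalid_raise=False option mentioned above (made-up data, and assuming you are happy to drop the inconsistent rows), NumPy then emits a warning instead of raising and leaves the bad lines out of the result:

```python
import io
import warnings
import numpy as np

# Hypothetical data: the middle row has one value too many.
text = "1 2 3\n4 5 6 7\n8 9 10\n"

with warnings.catch_warnings():
    # genfromtxt warns about each skipped line; silence that for the demo.
    warnings.simplefilter("ignore")
    data = np.genfromtxt(io.StringIO(text), delimiter=" ", invalid_raise=False)

print(data)
```

Because the skipped rows disappear silently once the warning is suppressed, it is worth comparing the number of rows read with the number of lines in the file.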

Reference: numpy.genfromtxt documentation

Answer 7

Answered by hemant c

In my case, the error arose because of a special symbol in the row.

Error cause: special characters in the data, such as

'#' (hash), which genfromtxt treats as the start of a comment by default
',' when it is also your delimiter (delimiter=',')

Example csv file:

1,hello,#this,fails
1,hello,',this',fails
-----CODE-----
import numpy
data = numpy.genfromtxt(file, delimiter=',')  # Error
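
One possible way around both problems is to parse such a file with the standard csv module, which understands quoting and has no notion of comments. This is a sketch under assumptions: the sample rows are adapted from the example above, and quotechar="'" is chosen only because that example quotes with single quotes (many exporters use double quotes instead).

```python
import csv
import io

# Hypothetical rows: one field starts with '#', another is a quoted field
# that contains the delimiter itself.
text = "1,hello,#this,works\n1,hello,',this',works\n"

# csv.reader keeps '#' as ordinary text and respects the quoted comma.
rows = list(csv.reader(io.StringIO(text), quotechar="'"))
print(rows)
```

The rows come back as lists of strings, so numeric columns still need converting before they go into an array.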

Environment note:

OS: Ubuntu

csv editor: LibreOffice

IDE: PyCharm
