"Got 1 columns instead of ..." error in Python numpy

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23353585/

Time: 2020-08-19 02:48:38  Source: igfitidea

python numpy genfromtxt

Asked by user3466132

I'm working on the following code to perform Random Forest classification on train and test sets:

from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt, savetxt

def main():
    dataset = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')   
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    test = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')

    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(train, target)
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(rf.predict_proba(test))]

    savetxt('filepath', predicted_probs, delimiter=',', fmt='%d,%f', 
            header='Id,PredictedProbability', comments = '')

if __name__=="__main__":
    main()

However, I get the following error on execution:

---->      dataset = genfromtxt(open('C:/Users/Saurabh/Desktop/pgm/Cora/a_train.csv','r'), delimiter='', dtype='f8')

ValueError: Some errors were detected !
    Line #88 (got 1435 columns instead of 1434)
    Line #93 (got 1435 columns instead of 1434)
    Line #164 (got 1435 columns instead of 1434)
    Line #169 (got 1435 columns instead of 1434)
    Line #524 (got 1435 columns instead of 1434)
...
...
...

Any suggestions on how to avoid it? Thanks.

Answer by user545424

You have too many columns in one of your rows. For example:

>>> import numpy as np
>>> from StringIO import StringIO
>>> s = """
... 1 2 3 4
... 1 2 3 4 5
... """
>>> np.genfromtxt(StringIO(s),delimiter=" ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/numpy/lib/npyio.py", line 1654, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 5 columns instead of 4)
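
The session above is Python 2 (from StringIO import StringIO, and a python2.6 path in the traceback). A minimal Python 3 sketch of the same reproduction swaps in io.StringIO and captures the message rather than letting the traceback print; the exact wording is assumed from recent NumPy releases and may vary slightly between versions:

```python
from io import StringIO

import numpy as np

# Same data as above: row 1 has 4 space-separated fields, row 2 has 5.
s = """
1 2 3 4
1 2 3 4 5
"""

message = None
try:
    np.genfromtxt(StringIO(s), delimiter=" ")
except ValueError as err:
    # genfromtxt collects every bad line and reports them in one message.
    message = str(err)

print(message)
```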

Answer by atomh33ls

genfromtxt will give this error if the rows do not all have the same number of columns.
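
To see which rows are unequal before choosing a fix, a rough sketch is to count the fields on every line with plain Python. The helper column_counts is hypothetical, not part of NumPy, and it assumes a simple delimiter with no quoting or comment characters, so its splitting is close to, but not identical to, what genfromtxt does. genfromtxt treats the width of the first data row as the expected one; this sketch instead takes the most common width, on the assumption that the odd rows are the minority.

```python
from collections import Counter
from io import StringIO


def column_counts(lines, delimiter=None):
    """Map each non-blank line number (1-based) to its field count.

    delimiter=None splits on runs of whitespace, like str.split().
    """
    counts = {}
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped:  # blank lines are skipped, as genfromtxt does
            counts[number] = len(stripped.split(delimiter))
    return counts


# Made-up data: line 3 has one field too many.
sample = StringIO("1 2 3\n4 5 6\n7 8 9 10\n")
counts = column_counts(sample, delimiter=" ")

# Treat the most common width as the expected one and list the outliers.
expected = Counter(counts.values()).most_common(1)[0][0]
bad_lines = [n for n, c in counts.items() if c != expected]
print(expected, bad_lines)  # 3 [3]
```

For a real file, pass an open file object instead of StringIO (for example inside with open(path) as f:, where path is your own file) and use delimiter=None for whitespace-separated data. Line numbers here count every physical line, so they may be offset from genfromtxt's when the file begins with blank lines.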

I can think of 3 ways around it:

1. Use the usecols parameter

np.genfromtxt('yourfile.txt',delimiter=',',usecols=np.arange(0,1434))

However, this may mean that you lose some data (where rows are longer than 1434 columns); whether or not that matters is up to you.

2. Adjust your input data file so that it has an equal number of columns.

3. Use something other than genfromtxt:

... like this
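
Options 1 and 3 can be compared on a tiny made-up file whose second row has one extra value. usecols keeps only the listed columns, so the extra value is silently dropped. The alternative that option 3 originally linked to is not preserved here, so load_ragged below is only one possible substitute (a hypothetical helper, not a NumPy function): it parses the rows with plain Python and pads short rows with NaN instead of discarding data. It assumes every field is numeric.

```python
from io import StringIO

import numpy as np

# Made-up data: the second row has a fourth value.
text = "1,2,3\n4,5,6,7\n8,9,10\n"

# Option 1: usecols keeps columns 0..2 only, so the "7" is silently dropped.
truncated = np.genfromtxt(StringIO(text), delimiter=",", usecols=np.arange(0, 3))


def load_ragged(lines, delimiter=","):
    """Hypothetical helper: parse ragged numeric rows, padding short rows with NaN."""
    rows = [
        [float(value) for value in line.strip().split(delimiter)]
        for line in lines
        if line.strip()
    ]
    width = max(len(row) for row in rows)
    out = np.full((len(rows), width), np.nan)
    for i, row in enumerate(rows):
        out[i, : len(row)] = row
    return out


# Option 3: nothing is dropped; missing trailing values become NaN.
padded = load_ragged(StringIO(text))
print(truncated.shape, padded.shape)  # (3, 3) (3, 4)
```

Padding keeps every value but pushes the decision about what NaN means onto later code; truncating with usecols is simpler but loses the extra column without any warning.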

Answer by Jonathon D

I had this error. The cause was a single entry in my data that contained a space, which made genfromtxt see it as an extra column. Make sure the spacing is consistent throughout the data.
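
A minimal sketch of how uneven spacing turns into an extra column, using made-up data: with delimiter=' ', every single space is a separator, so a double space produces an empty field. Leaving delimiter=None (the default) splits on runs of whitespace instead, which is one way to tolerate this, assuming no field legitimately contains a space; if one does, the file needs a different delimiter.

```python
from io import StringIO

import numpy as np

# Made-up data: the second row has a double space between 4 and 5.
text = "1 2 3\n4  5 6\n"

strict_error = None
try:
    np.genfromtxt(StringIO(text), delimiter=" ")
except ValueError as err:
    # "4  5 6".split(" ") gives ['4', '', '5', '6'], so line 2 has 4 fields, not 3.
    strict_error = str(err)

# delimiter=None splits on any run of whitespace, so the double space is harmless.
relaxed = np.genfromtxt(StringIO(text), delimiter=None)
print(relaxed)
```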

Answer by tadf2

It seems that the header containing the column names has 1 more column than the data itself (1435 columns in the header vs. 1434 in the data).

You could either:

1) Remove the 1 column from the header that doesn't correspond to the data

OR

2) Use the skip_header parameter of genfromtxt(), for example np.genfromtxt('myfile', skip_header=*how many lines to skip*, delimiter=' '). More information can be found in the documentation.
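
Option 2 can be sketched with made-up data whose header line has one more name than each data row has values. Without skip_header, the header row fixes the expected column count, so every data row is reported as too short; skipping that line lets the numeric rows load.

```python
from io import StringIO

import numpy as np

# Made-up file: 4 header names but only 3 values per data row.
text = "a b c d\n1 2 3\n4 5 6\n"

header_error = None
try:
    np.genfromtxt(StringIO(text), delimiter=" ")
except ValueError as err:
    # The header sets the expected width to 4, so both data rows are "too short".
    header_error = str(err)

# skip_header=1 discards the header line before the column count is taken.
data = np.genfromtxt(StringIO(text), delimiter=" ", skip_header=1)
print(data.shape)  # (2, 3)
```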

Answer by KLaz

I also had this error when I was trying to load a text dataset with genfromtxt and do text classification with Keras.

The data format was: [some_text]\t[class_label]. My understanding was that some characters in the first column somehow confused the parser, so the two columns could not be split properly.

data = np.genfromtxt(my_file, delimiter='\t', usecols=(0, 1), dtype=str)

This snippet raised the same ValueError as yours, and my first workaround was to read everything as one column:

data = np.genfromtxt(my_file, delimiter='\t', usecols=0, dtype=str)

and split the data myself later.

However, what finally worked was to explicitly set the comments parameter of genfromtxt:

data = np.genfromtxt(my_file, delimiter='\t', usecols=(0, 1), dtype=str, comments=None)

According to the documentation:

The optional argument comments is used to define a character string that marks the beginning of a comment. By default, genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any character present after the comment marker(s) is simply ignored.

The default comment character is '#', so if this character appears in your text column, everything after it is ignored. That is probably why genfromtxt could not recognize the two columns.
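
The effect can be reproduced with a small made-up tab-separated sample in which one text contains '#'. With the default comments='#', that line is cut at the '#', leaving one field where usecols=(0, 1) expects two (which is where a "got 1 columns" message comes from); comments=None keeps the '#' as ordinary text. This assumes your data has no real comment lines that need skipping.

```python
from io import StringIO

import numpy as np

# Made-up text/label pairs separated by tabs; the second text contains "#".
text = "good movie\tpos\nbad #movie\tneg\n"

default_error = None
try:
    np.genfromtxt(StringIO(text), delimiter="\t", usecols=(0, 1), dtype=str)
except ValueError as err:
    # Line 2 is cut to "bad " at the "#", so only 1 of the 2 requested columns exists.
    default_error = str(err)

# comments=None disables comment handling, so "#" stays part of the text.
data = np.genfromtxt(StringIO(text), delimiter="\t", usecols=(0, 1),
                     dtype=str, comments=None)
print(data)
```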

Answer by zeeshan khan

An exception is raised if an inconsistency is detected in the number of columns. A number of reasons and solutions are possible.

  1. Add invalid_raise=False to skip the offending lines.

    dataset = genfromtxt(open('data.csv','r'), delimiter='', invalid_raise=False)

  2. If your data contains names, make sure that the field names don't contain any spaces or invalid characters, and that they don't match the name of a standard attribute (like size or shape), which would confuse the interpreter. genfromtxt provides the following arguments for validating names:

  1. deletechars

    Gives a string combining all the characters that must be deleted from the name. By default, invalid characters are ~!@#$%^&*()-=+~\|]}[{';: /?.>,<.

  2. excludelist

    Gives a list of the names to exclude, such as return, file, print... If one of the input names is part of this list, an underscore character ('_') will be appended to it.

  3. case_sensitive

    Whether the names should be case-sensitive (case_sensitive=True), converted to upper case (case_sensitive=False or case_sensitive='upper') or to lower case (case_sensitive='lower').

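import numpy as np
# raw string (r"...") keeps the backslash in deletechars literal without an escape warning
data = np.genfromtxt("data.txt", dtype=None, names=True, case_sensitive=True,
                     deletechars=r"~!@#$%^&*()-=+~\|]}[{';: /?.>,<.")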
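
The invalid_raise=False option from item 1 can be sketched on made-up data. The offending line is dropped rather than loaded, and NumPy reports it through a ConversionWarning instead of raising ValueError, so rows can disappear unnoticed unless the warning is checked:

```python
import warnings
from io import StringIO

import numpy as np

# Made-up data: line 2 has an extra field.
text = "1 2 3\n4 5 6 7\n8 9 10\n"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Lines with the wrong column count are skipped and reported as a warning.
    data = np.genfromtxt(StringIO(text), delimiter=" ", invalid_raise=False)

print(data)  # only the two 3-column rows remain
for w in caught:
    print(type(w.message).__name__, w.message)
```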

Reference: numpy.genfromtxt

Answer by hemant c

In my case, the error arose because of a special symbol in the row.

Error cause: special characters such as

  • '#' (hash)
  • ',' when your delimiter is ','

Example csv file

  • 1,hello,#this,fails
  • 1,hello,',this',fails

    Code:

    import numpy as np
    data = np.genfromtxt(file, delimiter=',')  # Error
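
Both failure modes can be reproduced with a small made-up CSV. The '#' row loses everything after the hash (3 fields instead of 4), and the quoted-comma row is split on the comma inside the quotes (5 fields), because genfromtxt does not understand quoting. One possible workaround, not taken from the answer, is Python's standard csv module, which honors a quote character and has no comment character; quotechar="'" is an assumption chosen to match the single quotes in the example. For the '#' case alone, comments=None in genfromtxt would also work.

```python
import csv
from io import StringIO

import numpy as np

# Made-up file: a clean first row, then the two problem rows from the examples.
text = "1,hello,that,works\n1,hello,#this,fails\n1,hello,',this',fails\n"

error = None
try:
    np.genfromtxt(StringIO(text), delimiter=",", dtype=str)
except ValueError as err:
    # Row 2 is cut at "#" (3 fields); row 3 splits on the quoted comma (5 fields).
    error = str(err)

# csv.reader treats text between quotechar characters as one field
# and has no notion of comments, so "#" is kept as text.
rows = list(csv.reader(StringIO(text), quotechar="'"))
table = np.array(rows)
print(table)
```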

Environment Note:

OS: Ubuntu

csv editor: LibreOffice

IDE: PyCharm