Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/44195394/

Time: 2020-08-19 23:49:06  Source: igfitidea

Read CSV file with semicolon as delimiter

Tags: python, arrays, numpy

Asked by Abhijeet Mohanty

I have a NumPy 2D array of shape (4898, ) in which the elements of each row are separated by a semicolon but are still stored in a single column rather than in multiple columns (the desired outcome). How do I split each row of the 2D array at every semicolon? I have written the following Python script to do so, but it throws errors.

stochastic_gradient_descent_winequality.py

import numpy
import pandas

if __name__ == '__main__' :

    with open('winequality-white.csv', 'r') as f_0 :
        with open('winequality-white-updated.csv', 'w') as f_1 :
            next(f_0)  # skip the first line; f_0.next() only exists in Python 2
            for line in f_0 :
                f_1.write(line)


    wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
    wine_data_ = wine_data
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)

    print (numpy.shape(wine_data))

Errors

Traceback (most recent call last):
  File "stochastic_gradient_descent_winequality.py", line 16, in <module>
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
AttributeError: 'numpy.int64' object has no attribute 'split'

Answer by Arya McCarthy

If you're using semicolons (;) as your CSV file separator instead of commas (,), you can adjust the read_csv line:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)


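The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names, not over the rows. With header = None those column names are integers, which is why the error says a 'numpy.int64' object has no attribute 'split'.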
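To see this concretely, here is a small sketch, assuming a recent pandas. io.StringIO stands in for the file and the two rows are made-up sample values. Reading semicolon-separated text with sep=',' puts each whole row into column 0; iterating over the DataFrame then yields only the column label, while iterating over df[0] yields the row strings.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: semicolon-separated rows, no header
raw = "2.12;5.12;3.12\n3.1233;4;2\n"

# With sep=',' nothing is split, so each row lands in column 0 as one string
df = pd.read_csv(io.StringIO(raw), sep=",", header=None)

# Iterating over a DataFrame yields its column labels, not its rows
labels = [x for x in df]
print(labels)  # a single integer label (0), which has no .split()

# Iterating over the column itself yields the row strings
rows = [x for x in df[0]]
print(rows)  # ['2.12;5.12;3.12', '3.1233;4;2']
```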

Once the separator is set correctly, you have no need for the line with the list comprehension: you can read in your data and be done.

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
print (numpy.shape(wine_data))
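Here is a self-contained sketch of this answer, again using io.StringIO in place of the real file with a hypothetical header line and sample values. sep=';' splits every row into its own columns. It also uses skiprows=1, a standard read_csv parameter, as one way to drop a header line without writing the "-updated" copy the question creates; whether your file actually starts with a header line is an assumption you should check.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents: one header line, then semicolon-separated rows
raw = '"x";"y";"z"\n2.12;5.12;3.12\n3.1233;4;2\n'

# sep=';' gives one column per field; skiprows=1 skips the header line,
# and header=None numbers the remaining columns 0, 1, 2
wine_data = pd.read_csv(io.StringIO(raw), sep=";", header=None, skiprows=1)

print(np.shape(wine_data))  # (2, 3)
```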

Answer by Kondiba

In this expression:

[x.split(';') for x in wine_data_]

whatever x you are getting is not a string. Only strings have split(); if x is anything other than a string, you get this error:

object has no attribute 'split'

Check your x value.
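One way to act on this advice is to check the type before calling split(). The helper below, split_fields, is hypothetical and not part of the original answers. It raises a TypeError that names the actual type and value of x, which is more direct than the AttributeError.

```python
def split_fields(x, sep=";"):
    """Hypothetical helper: split x on sep, but only if x is really a string."""
    if not isinstance(x, str):
        # Report what x actually is, e.g. an integer column label
        raise TypeError(f"expected a str, got {type(x).__name__}: {x!r}")
    return x.split(sep)


print(split_fields("3.1233;4;2"))  # ['3.1233', '4', '2']

try:
    split_fields(0)
except TypeError as err:
    print(err)  # expected a str, got int: 0
```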

Answer by Tiny.D

Suppose your CSV file looks like this:

2.12;5.12;3.12
3.1233;4;2
4;4.9696;3
2;5.0344;3
3.59595;4;2
4;4;3.59595
...

Then change your code like this:

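import pandas, numpy
# With sep=',' each semicolon-separated row ends up as one string in column 0
wine_data = pandas.read_csv('test.csv', sep = ',', header = None)
wine_data_ = wine_data
# Iterate over the strings in column 0 (not the column labels) and split them;
# plain float replaces numpy.float, which was removed in NumPy 1.24
wine_data = numpy.array([x.split(';') for x in wine_data_[0]], dtype = float)
wine_data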

The wine_data will be:

array([[ 2.12   ,  5.12   ,  3.12   ],
       [ 3.1233 ,  4.     ,  2.     ],
       [ 4.     ,  4.9696 ,  3.     ],
       [ 2.     ,  5.0344 ,  3.     ],
       [ 3.59595,  4.     ,  2.     ],
       [ 4.     ,  4.     ,  3.59595]])
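If you want to keep the split-each-row approach but stay within pandas, Series.str.split with expand=True splits the strings into separate columns. This is an alternative to, not part of, the original answer. It is a sketch under the same assumptions (no header line, every row has the same number of fields), with io.StringIO in place of test.csv.

```python
import io

import pandas as pd

# Same kind of data as test.csv above, read as one string column
raw = "2.12;5.12;3.12\n3.1233;4;2\n4;4.9696;3\n"
df = pd.read_csv(io.StringIO(raw), sep=",", header=None)

# str.split(..., expand=True) returns a DataFrame with one column per field;
# astype(float) converts the strings, to_numpy() returns a plain ndarray
wine_data = df[0].str.split(";", expand=True).astype(float).to_numpy()

print(wine_data.shape)  # (3, 3)
```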

To be more efficient, let pandas split on the semicolons itself:

import pandas, numpy
# Let pandas split on ';' so every value gets its own column
wine_data = pandas.read_csv('test.csv', sep = ';', header = None)
wine_data = numpy.array(wine_data, dtype = float)
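If pandas is not needed for anything else, NumPy can parse the file directly with numpy.loadtxt and delimiter=';'. This is a minimal sketch with io.StringIO standing in for test.csv. If your file's first line is a header (the question's script skips its first line, which suggests one), you would add skiprows=1.

```python
import io

import numpy as np

raw = "2.12;5.12;3.12\n3.1233;4;2\n4;4.9696;3\n"

# delimiter=';' splits each line into fields; the result is float64 by default
wine_data = np.loadtxt(io.StringIO(raw), delimiter=";")

print(wine_data.shape)  # (3, 3)
```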