Python: read a CSV file with semicolon as the delimiter
Notice: this page is derived from a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44195394/
Read CSV file with semicolon as delimiter
Asked by Abhijeet Mohanty
I have a numpy 2D array of shape (4898, ) where the elements in each row are separated by semicolons but are still stored in a single column rather than in multiple columns (the desired outcome). How do I split each row at every semicolon? I have written the following Python script to do so, but it throws errors.
stochastic_gradient_descent_winequality.py
import numpy
import pandas
if __name__ == '__main__' :
    with open('winequality-white.csv', 'r') as f_0 :
        with open('winequality-white-updated.csv', 'w') as f_1 :
            f_0.next()
            for line in f_0 :
                f_1.write(line)
    wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
    wine_data_ = wine_data
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
    print (numpy.shape(wine_data))
Errors
Traceback (most recent call last):
  File "stochastic_gradient_descent_winequality.py", line 16, in <module>
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
AttributeError: 'numpy.int64' object has no attribute 'split'
Answered by Arya McCarthy
If you're using semicolons (;) as your csv-file separator instead of commas (,), you can adjust the read_csv call:
wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
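The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names. With header = None those names are the integers 0, 1, 2, ..., and integers have no split method, hence the AttributeError.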
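To see the behaviour described above, here is a minimal sketch. It uses a small in-memory stand-in for the file, so the name sample and its values are illustrative assumptions, not the real wine data.

```python
import io
import pandas

# Hypothetical two-row stand-in for the semicolon-separated file.
sample = io.StringIO("7.0;0.27;0.36\n6.3;0.30;0.34\n")

# Reading with a comma separator leaves every row in a single string column.
frame = pandas.read_csv(sample, sep=',', header=None)

# Iterating over the DataFrame itself walks the column labels, not the rows.
labels = list(frame)

# Iterating over one column (a Series) walks the cell values, which are strings.
cells = list(frame[0])

print(labels)
print(cells)
```

The labels are integers, which is why calling split on them fails, while the cells of column 0 are the strings that still need splitting.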
That being the case, you have no need for the line with the list comprehension. You can read in your data and be done.
wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
print(numpy.shape(wine_data))
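As a possible simplification of the approach above, read_csv can also consume the header row itself, so the header-less copy of the file may not be needed. The sketch below assumes a file whose first line holds column names; the in-memory sample, its column names, and its values are made up for illustration.

```python
import io
import pandas

# Hypothetical stand-in for a semicolon-separated file with a header row.
sample = io.StringIO(
    "fixed acidity;volatile acidity;quality\n"
    "7.0;0.27;6\n"
    "6.3;0.30;6\n"
    "8.1;0.28;5\n"
)

# sep=';' splits the columns; the default header=0 treats the first
# line as column names, so no intermediate copy of the file is written.
wine_frame = pandas.read_csv(sample, sep=';')

# Convert to a plain float array for numerical work.
wine_array = wine_frame.to_numpy(dtype=float)

print(wine_array.shape)
```

With a real file, the StringIO object would be replaced by the file path.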
Answered by Kondiba
In this expression:
x.split(';') for x in wine_data_
whatever x you are getting is not a string. Only strings have a split() method; calling it on any other type raises this error:
object has no attribute 'split'
Check the value of x.
Answered by Tiny.D
Suppose your csv file is like this:
2.12;5.12;3.12
3.1233;4;2
4;4.9696;3
2;5.0344;3
3.59595;4;2
4;4;3.59595
...
Then change your code like this:
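import numpy
import pandas
# The file contains no commas, so every row is read as one string in column 0.
wine_data = pandas.read_csv('test.csv', sep = ',', header = None)
wine_data_ = wine_data
# Split each string on ';' and convert to floats.
# The builtin float replaces the numpy.float alias, which was removed in NumPy 1.24.
wine_data = numpy.array([x.split(';') for x in wine_data_[0]], dtype = float)
wine_data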
wine_data will then be:
array([[ 2.12   ,  5.12   ,  3.12   ],
       [ 3.1233 ,  4.     ,  2.     ],
       [ 4.     ,  4.9696 ,  3.     ],
       [ 2.     ,  5.0344 ,  3.     ],
       [ 3.59595,  4.     ,  2.     ],
       [ 4.     ,  4.     ,  3.59595]])
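If the data has already been loaded as a single string column, the same split can also be done with pandas string methods instead of a Python-level list comprehension. This is a sketch of that alternative, not part of the original answer; the in-memory sample is a shortened, illustrative copy of the rows shown above.

```python
import io
import pandas

# Shortened, hypothetical copy of the sample file, held in memory.
sample = io.StringIO("2.12;5.12;3.12\n3.1233;4;2\n4;4.9696;3\n")

# Read with a comma separator, so each row lands in one string column.
frame = pandas.read_csv(sample, sep=',', header=None)

# Split every string on ';' into separate columns, then convert to floats.
split_array = frame[0].str.split(';', expand=True).astype(float).to_numpy()

print(split_array)
```

Passing expand=True makes str.split return one column per field rather than a column of lists.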
A more efficient approach is to let pandas split on the semicolon directly:
import numpy
import pandas
wine_data = pandas.read_csv('test.csv', sep = ';', header = None)
# The builtin float replaces the numpy.float alias, which was removed in NumPy 1.24.
wine_data = numpy.array(wine_data, dtype = float)
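When only a NumPy array is wanted, pandas can be skipped altogether. The following sketch uses numpy.loadtxt, which is a different route from the answers above; the header line and values in the in-memory sample are assumptions for illustration.

```python
import io
import numpy

# Hypothetical semicolon-separated data with one header line.
sample = io.StringIO(
    "a;b;c\n"
    "2.12;5.12;3.12\n"
    "3.1233;4;2\n"
)

# delimiter=';' splits the fields and skiprows=1 drops the header line;
# loadtxt parses the remaining values as floats by default.
loaded = numpy.loadtxt(sample, delimiter=';', skiprows=1)

print(loaded.shape)
```

For a file on disk, the file path would be passed in place of the StringIO object.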