How to read a CSV without the first column
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19143667/
Question by pceccon
I am trying to read a simple CSV file like the one below and put its contents in a 2D array:
"","x","y","sim1","sim2","sim3","sim4","sim5","sim6","sim7","sim8","sim9","sim10","sim11","sim12"
"1",181180,333740,5.56588745117188,6.29487752914429,7.4835410118103,5.75873327255249,6.62183284759521,5.81478500366211,4.85671949386597,5.90418815612793,6.32611751556396,6.99649047851562,6.52076387405396,5.68944215774536
"2",181140,333700,6.36264753341675,6.5217604637146,6.16843748092651,5.55328798294067,7.00429201126099,6.43625402450562,6.17744159698486,6.72836923599243,6.38574266433716,6.81451606750488,6.68060827255249,6.14339065551758
"3",181180,333700,6.16541910171509,6.44704437255859,7.51744651794434,5.46270132064819,6.8890323638916,6.46842670440674,6.07698059082031,6.2140531539917,6.43774271011353,6.21923875808716,6.43355655670166,5.90692138671875
To do this, I use:
import numpy as np
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1)
But I always get this error:
ValueError: could not convert string to float: "1"
I thought the problem was the first column of each row, so I tried to read the file without it, but I couldn't figure out how.
So, how can I ignore the first column? Alternatively, is there a way to read this file including the first column?
Accepted answer by jmilloy
You can specify a converter for any column.
import numpy as np
# strip the quotes around the index, e.g. '"1"' -> 1.0
converters = {0: lambda s: float(s.strip('"'))}
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, converters=converters)
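A minimal, self-contained sketch of the converter approach, run on a trimmed, in-memory copy of the question's data (index, x, y and only the first two sim columns) instead of Data/sim.csv. The helper name strip_quotes is hypothetical. It decodes bytes first because, as far as I know, older NumPy releases hand converters bytes while recent ones hand them str, so a bare str-based lambda may fail on an old NumPy.

```python
import io

import numpy as np

# Trimmed, in-memory copy of the question's data (first two sim columns only).
SAMPLE = '''"","x","y","sim1","sim2"
"1",181180,333740,5.56588745117188,6.29487752914429
"2",181140,333700,6.36264753341675,6.5217604637146
"3",181180,333700,6.16541910171509,6.44704437255859
'''

def strip_quotes(s):
    """Convert a quoted field such as '"1"' to a float (hypothetical helper)."""
    # Older NumPy versions may pass bytes to converters; newer ones pass str.
    if isinstance(s, bytes):
        s = s.decode("latin1")
    return float(s.strip('"'))

data = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                  converters={0: strip_quotes})
print(data.shape)  # (3, 5)
print(data[:, 0])  # [1. 2. 3.]
```

The index column is kept here as floats; combining this with usecols or slicing drops it afterwards.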
Or, you can specify which columns to use, something like:
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1,15))
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
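A quick check of the usecols variant on a trimmed, in-memory sample (an assumption; the real file has 15 columns, hence range(1,15) above). Because column 0 is never requested, loadtxt does not try to convert the quoted index, so no converter is needed.

```python
import io

import numpy as np

# Trimmed, in-memory sample: index, x, y, sim1, sim2 (5 columns).
SAMPLE = '''"","x","y","sim1","sim2"
"1",181180,333740,5.56588745117188,6.29487752914429
"2",181140,333700,6.36264753341675,6.5217604637146
"3",181180,333700,6.16541910171509,6.44704437255859
'''

# Request columns 1..4 only; the quoted index in column 0 is never parsed.
data = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                  usecols=range(1, 5))
print(data.shape)  # (3, 4)
```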
One way to skip the first column without knowing the number of columns in advance is to read the column count from the CSV yourself. It's easy enough, although you may occasionally need to adjust it to account for formatting inconsistencies*.
import numpy as np

with open("Data/sim.csv") as f:
    ncols = len(f.readline().split(','))
# columns are 0-indexed, so the last one is ncols - 1
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1, ncols))
*If there are blank lines at the top, you'll need to skip them. If there may be commas in the field headers, you should count columns using the first data line instead. So, if you have specific problems, I can add some details to make the code more robust.
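The footnote's point about commas inside quoted headers can be handled by counting the header fields with Python's csv module, which understands quoting, instead of a plain str.split. A sketch under that assumption follows; load_without_first_column and the in-memory SAMPLE are hypothetical names. Since column indices run from 0 to ncols - 1, usecols stops at range(1, ncols).

```python
import csv
import io

import numpy as np

SAMPLE = '''"","x","y","sim1","sim2"
"1",181180,333740,5.56588745117188,6.29487752914429
"2",181140,333700,6.36264753341675,6.5217604637146
"3",181180,333700,6.16541910171509,6.44704437255859
'''

def load_without_first_column(f):
    """Read a CSV from a text file object, dropping column 0 (hypothetical helper)."""
    # csv.reader honours quoting, so a comma inside a quoted header
    # such as "sim,1" is not counted as a separator.
    header = next(csv.reader(f))
    ncols = len(header)
    # The header line has already been consumed, so no skiprows here.
    return np.loadtxt(f, delimiter=",", usecols=range(1, ncols))

data = load_without_first_column(io.StringIO(SAMPLE))
print(data.shape)  # (3, 4)
```

With a real file, the same function can be called inside `with open("Data/sim.csv") as f:`.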
Answer by Karthik
Try reading the CSV file with the csv module:
import csv

def someFunc(fname):
    out = []
    with open(fname, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            # csv.reader strips the quotes, so '"1"' arrives as '1';
            # use row[1:] instead to drop the index column
            out.append([float(x) for x in row])
    return out
out will hold the data as a 2D list (one list of floats per row).
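If a real NumPy 2D array is wanted rather than a list of lists, the rows from csv.reader can be passed to np.array and then sliced. A sketch on an in-memory sample; read_rows and SAMPLE are hypothetical names, and it assumes every field after the header is numeric once csv.reader has removed the quotes.

```python
import csv
import io

import numpy as np

SAMPLE = '''"","x","y","sim1","sim2"
"1",181180,333740,5.56588745117188,6.29487752914429
"2",181140,333700,6.36264753341675,6.5217604637146
"3",181180,333700,6.16541910171509,6.44704437255859
'''

def read_rows(f):
    """Return the data rows of a CSV file object as lists of floats."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # The quotes are already gone, so '"1"' arrives as '1' and converts cleanly.
    return [[float(x) for x in row] for row in reader]

rows = read_rows(io.StringIO(SAMPLE))
arr = np.array(rows)[:, 1:]  # 2D array without the index column
print(arr.shape)  # (3, 4)
```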
Answer by Deninhos
You could use pandas and read the file as a DataFrame. If you know which column you don't want, just add a .drop after the loading line:
import pandas

a = pandas.read_csv("Data/sim.csv", sep=",")
a = a.drop(a.columns[0], axis=1)
The first row is read as the header (column names). If you would rather not use it as a header, pass skiprows=1 together with header=None to read_csv; skiprows=1 on its own would turn the first data row into the header. pandas DataFrames are built on NumPy arrays, so converting columns or the whole frame to a NumPy array is straightforward, for example with a.to_numpy().
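A self-contained sketch of the pandas route, reading a trimmed in-memory copy of the data (an assumption, in place of Data/sim.csv). pandas names the empty first header "Unnamed: 0"; dropping that column and calling DataFrame.to_numpy() yields a plain float array.

```python
import io

import pandas as pd

SAMPLE = '''"","x","y","sim1","sim2"
"1",181180,333740,5.56588745117188,6.29487752914429
"2",181140,333700,6.36264753341675,6.5217604637146
"3",181180,333700,6.16541910171509,6.44704437255859
'''

a = pd.read_csv(io.StringIO(SAMPLE), sep=",")
a = a.drop(a.columns[0], axis=1)  # drop the unnamed index column
data = a.to_numpy()               # NumPy 2D array (int and float columns upcast to float)
print(a.columns.tolist())  # ['x', 'y', 'sim1', 'sim2']
print(data.shape)          # (3, 4)
```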
Answer by oustella
jmilloy's and Deninhos's answers are both good. If the OP specifically wants a NumPy array (as opposed to a pandas DataFrame), another simple alternative is to delete the index column after reading it in. This works when you know the index column is always the first one but the number of features (columns) varies.
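import numpy as np
# the quoted index ("1", "2", ...) still needs a converter, as in the accepted answer
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, converters={0: lambda s: float(s.strip('"'))})
data = np.delete(data, 0, axis=1)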
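np.delete always returns a new array. When the column to drop is the first one, basic slicing gives the same values as a view, without copying. A small comparison on a hand-built array (two rows copied from the question, with the index already parsed to floats, which is an assumption) is sketched below; np.shares_memory is used only to show the copy versus view difference.

```python
import numpy as np

# Stand-in for what loadtxt returns: index column plus x, y, sim1, sim2.
data = np.array([[1.0, 181180.0, 333740.0, 5.56588745117188, 6.29487752914429],
                 [2.0, 181140.0, 333700.0, 6.36264753341675, 6.5217604637146]])

deleted = np.delete(data, 0, axis=1)  # new array; data itself is unchanged
sliced = data[:, 1:]                  # view that shares memory with data

print(np.array_equal(deleted, sliced))  # True
print(np.shares_memory(sliced, data))   # True
print(np.shares_memory(deleted, data))  # False
```

np.delete is still the more general tool when the column to remove is not at the edge.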
Answer by user12803044
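import csv
filename = "Data/sim.csv"
data = []
with open(filename, 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    next(csvreader)  # skip the header row
    for row in csvreader:
        data.append([float(v) for v in row[1:]])  # drop the index column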