Python: How to read a CSV without the first column

Question

Asked by pceccon

I am trying to read a simple CSV file like below, and put its contents in a 2D array:

"","x","y","sim1","sim2","sim3","sim4","sim5","sim6","sim7","sim8","sim9","sim10","sim11","sim12"
"1",181180,333740,5.56588745117188,6.29487752914429,7.4835410118103,5.75873327255249,6.62183284759521,5.81478500366211,4.85671949386597,5.90418815612793,6.32611751556396,6.99649047851562,6.52076387405396,5.68944215774536
"2",181140,333700,6.36264753341675,6.5217604637146,6.16843748092651,5.55328798294067,7.00429201126099,6.43625402450562,6.17744159698486,6.72836923599243,6.38574266433716,6.81451606750488,6.68060827255249,6.14339065551758
"3",181180,333700,6.16541910171509,6.44704437255859,7.51744651794434,5.46270132064819,6.8890323638916,6.46842670440674,6.07698059082031,6.2140531539917,6.43774271011353,6.21923875808716,6.43355655670166,5.90692138671875

To do this, I use:

data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1)

But I always get this error:

ValueError: could not convert string to float: "1"

I thought the problem was with the first column of each row. So, I tried to read it without the first column, but I couldn't find out how.

So, how could I ignore the first column? Is there a way to read this file with the first column?

Answer 1

Accepted answer by jmilloy

You can specify a converter for any column.

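# Strip the surrounding quotes from column 0 before converting it to float.
# On Python 3 with NumPy 1.x, also pass encoding="utf-8" so the converter receives str rather than bytes.
converters = {0: lambda s: float(s.strip('"'))}
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, converters=converters)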

Or, you can specify which columns to use, something like:

data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1,15))

http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
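
The converter above is needed because the index column is quoted in the file. As a minimal illustration, independent of NumPy and using a shortened copy of the first data row (try_float is a hypothetical helper introduced only for this sketch), plain float() rejects the quoted field until the quotes are stripped:

```python
# After splitting on ',', the first field still contains the literal
# double-quote characters from the file.
line = '"1",181180,333740,5.56588745117188'
fields = line.split(",")


def try_float(text):
    """Return float(text), or None when the text is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


first_raw = try_float(fields[0])               # '"1"' cannot be parsed
first_clean = try_float(fields[0].strip('"'))  # '1' can
rest = [try_float(f) for f in fields[1:]]      # unquoted fields parse as-is
```

Only the first field fails here, which is why either converting that one column or leaving it out of usecols is enough.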

One way you can skip the first column, without knowing the number of columns, is to read the number of columns from the csv manually. It's easy enough, although you may need to tweak this on occasion to account for formatting inconsistencies*.

with open("Data/sim.csv") as f:
    ncols = len(f.readline().split(','))

# Column indices run from 0 to ncols-1, so range(1, ncols) selects every column except the first.
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1, ncols))

*If there are blank lines at the top, you'll need to skip them. If there may be commas in the field headers, you should count columns using the first data line instead. So, if you have specific problems, I can add some details to make the code more robust.
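
One possible way to handle both cases from the footnote is sketched below, assuming the standard csv module is acceptable for the counting step. The function name count_columns and the sample text (including the header "x, metres") are made up for illustration; they are not from the original file.

```python
import csv
import io


def count_columns(lines):
    """Count the fields in the first data row (the row after the header)."""
    # Drop blank lines first; csv.reader then handles quoted fields, so a
    # comma inside a quoted header does not change how rows are split.
    reader = csv.reader(line for line in lines if line.strip())
    next(reader)              # header row (may contain quoted commas)
    return len(next(reader))  # first data row


# Hypothetical sample: a leading blank line and a header containing a comma.
sample = io.StringIO(
    '\n'
    '"","x, metres","y","sim1"\n'
    '"1",181180,333740,5.56588745117188\n'
)
ncols = count_columns(sample)
usecols = range(1, ncols)  # every column except the first
```

For a real file, an open file object can be passed in place of the StringIO.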

Answer 2

Answer by Karthik

Try reading the CSV file using the csv library:

import csv

def someFunc(fname):
    out = []
    with open(fname, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            # drop the first (index) column and convert the rest to float
            out.append([float(value) for value in row[1:]])
    return out

The returned out holds the 2D array as a list of rows.

Answer 3

Answer by Deninhos

You could use pandas and read the file as a DataFrame object. If you know which column you do not want, just add a .drop call after loading:

import pandas

a = pandas.read_csv("Data/sim.csv", sep=",")
a = a.drop(a.columns[0], axis=1)  # drop the first (index) column

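The first row is read as the header by default. To discard it instead, pass skiprows=1 together with header=None to read_csv; skiprows=1 on its own would promote the first data row to the header. A DataFrame is not itself a NumPy array, but it is backed by NumPy arrays, so converting columns or the whole frame is straightforward, for example with a.to_numpy().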

Answer 4

Answer by oustella

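jmilloy's and Deninhos's answers are both good. If the OP specifically wants to read into a NumPy array (as opposed to a pandas DataFrame), another simple alternative is to delete the index column after reading it in. This works when you know the index column is always the first one but the number of features (columns) varies. Note that the loadtxt call itself must succeed first, so a file with a quoted index column like the one in the question still needs the converter from jmilloy's answer.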

data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1)
data = np.delete(data, 0, axis = 1)

Answer 5

Answer by user12803044

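import csv

data = []
with open(filename, 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    next(csvreader)  # skip the header row
    for row in csvreader:
        data.append([float(value) for value in row[1:]])  # drop the first column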
