How to import a CSV file using Python with headers intact, where the first column is non-numerical
Original URL: http://stackoverflow.com/questions/3428532/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
Stack Overflow
Asked by myClone
This is an elaboration of a previous question, but as I delve deeper into Python, I just get more confused as to how Python handles CSV files.
I have a CSV file, and it must stay that way (e.g., I cannot convert it to a text file). It is the equivalent of a 5-row by 11-column array, matrix, or vector.
I have been attempting to read in the CSV using various methods I have found here and elsewhere (e.g. python.org) so that it preserves the relationship between columns and rows, where the first row and the first column contain non-numerical values. The rest are floats, a mixture of positive and negative values.
What I wish to do is import the CSV and organize it in Python so that if I were to reference a column header, it would return the associated values stored in the rows. For example:
>>> workers, constant, age
>>> workers
w0
w1
w2
w3
>>> constant
7.334
5.235
3.225
0
>>> age
-1.406
-4.936
-1.478
0
And so forth...
I am looking for techniques for handling this kind of data structure. I am very new to Python.
Accepted answer by John Machin
Python's csv module handles data row-wise, which is the usual way of looking at such data. You seem to want a column-wise approach. Here's one way of doing it.
Assuming your file is named myclone.csv and contains
workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
This code should give you an idea or two:
>>> import csv
>>> f = open('myclone.csv', newline='')  # Python 3; the original Python 2 answer used 'rb'
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...     column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...     for h, v in zip(headers, row):
...         column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>
To get your numeric values into floats, add this
converters = [str.strip] + [float] * (len(headers) - 1)
up front, and do this
for h, v, conv in zip(headers, row, converters):
    column[h].append(conv(v))
for each row instead of the similar two lines above.
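Putting the pieces of this answer together, a minimal self-contained sketch might look like the following. Like the answer, it assumes the first column is text and every other column is numeric. The function name read_columns and the use of io.StringIO in place of the real file are illustrative choices, not part of the original answer.

```python
import csv
import io

def read_columns(f):
    """Return {header: [values]} for a CSV whose first column is text
    and whose remaining columns are numeric (hypothetical helper)."""
    reader = csv.reader(f)
    headers = next(reader)
    # Keep the first column as stripped text; convert every other column to float.
    converters = [str.strip] + [float] * (len(headers) - 1)
    column = {h: [] for h in headers}
    for row in reader:
        for h, v, conv in zip(headers, row, converters):
            column[h].append(conv(v))
    return column

# The sample rows from myclone.csv above; with a real file, pass
# open('myclone.csv', newline='') instead of io.StringIO.
sample = io.StringIO(
    "workers,constant,age\n"
    "w0,7.334,-1.406\n"
    "w1,5.235,-4.936\n"
    "w2,3.2225,-1.478\n"
    "w3,0,0\n"
)
column = read_columns(sample)
print(column['workers'])   # ['w0', 'w1', 'w2', 'w3']
print(column['constant'])  # [7.334, 5.235, 3.2225, 0.0]
```

Note that a value such as '0' becomes 0.0 once converted, and float raises ValueError if a numeric cell is empty or non-numeric.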
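A different technique, not taken from any of the answers here, is to read all rows and transpose them with zip(*rows), which turns the row-wise output of csv.reader into one tuple per column. This sketch assumes the whole file fits in memory and that every row has the same number of fields, since zip silently truncates to the shortest row. The helper name columns_by_transpose is hypothetical, and the values stay strings until you convert them.

```python
import csv
import io

def columns_by_transpose(f):
    """Map each header to a tuple of that column's raw string values."""
    rows = list(csv.reader(f))
    headers, body = rows[0], rows[1:]
    # zip(*body) yields one tuple per column; pair each with its header.
    # If there are no data rows, this returns an empty dict.
    return dict(zip(headers, zip(*body)))

sample = io.StringIO("workers,constant,age\nw0,7.334,-1.406\nw1,5.235,-4.936\n")
cols = columns_by_transpose(sample)
print(cols['age'])  # ('-1.406', '-4.936')
```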
Answer by Katriel
For Python 3
Remove the rb argument and use either r or no mode argument at all (the default mode is read). The csv documentation also recommends opening the file with newline=''.
import csv

with open('path_to_file.csv', 'r', newline='') as theFile:  # replace with your file's path
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is a dict such as {'workers': 'w0', 'constant': '7.334', 'age': '-1.406'}; values are strings
        # e.g. print(line['workers']) prints w0
        print(line)
For Python 2
import csv

with open('path_to_file.csv', 'rb') as theFile:  # replace with your file's path
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is a dict such as {'workers': 'w0', 'constant': '7.334', 'age': '-1.406'}; values are strings
        # e.g. print(line['workers']) prints w0
        print(line)
Python has a powerful built-in CSV handler. In fact, most things are already built into the standard library.
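Because csv.DictReader returns every field as a string, the numeric columns still need an explicit conversion. A minimal sketch, assuming the numeric field names are known in advance; the helper typed_rows and its numeric_fields parameter are hypothetical, and io.StringIO stands in for a file opened with open(path, newline='').

```python
import csv
import io

def typed_rows(f, numeric_fields=("constant", "age")):
    """Yield one dict per data row, converting the named fields to float."""
    for row in csv.DictReader(f):
        # DictReader produces only str values; convert just the numeric columns.
        yield {k: (float(v) if k in numeric_fields else v) for k, v in row.items()}

sample = io.StringIO("workers,constant,age\nw0,7.334,-1.406\nw1,5.235,-4.936\n")
rows = list(typed_rows(sample))
print(rows[0])  # {'workers': 'w0', 'constant': 7.334, 'age': -1.406}
```

This keeps the row-wise view that DictReader gives you; to get the column-wise view the question asks for, collect each field's values into a list instead.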
Answer by Ankur
You can use the pandas library and reference rows and columns like this:
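import pandas as pd

df = pd.read_csv("path_to_file")  # named df rather than input, which would shadow the built-in
i = 0  # example row position

# for accessing the ith row:
df.iloc[i]
# for accessing the column named X (df['X'] also works when X is not a valid identifier):
df.X
# for accessing the ith row and the column named X:
df.iloc[i].X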
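Since the question's first column holds labels rather than numbers, it may help to pass index_col=0 to read_csv so that column becomes the row index; label-based lookups then work through .loc. This is a sketch going slightly beyond the answer above, with io.StringIO standing in for a real file path.

```python
import io

import pandas as pd

sample = io.StringIO(
    "workers,constant,age\n"
    "w0,7.334,-1.406\n"
    "w1,5.235,-4.936\n"
    "w2,3.2225,-1.478\n"
    "w3,0,0\n"
)
# index_col=0 turns the non-numeric 'workers' column into the row labels.
df = pd.read_csv(sample, index_col=0)

print(df['constant'].dtype)  # float64
print(df['constant'].tolist())
print(df.loc['w1', 'age'])
```

With the labels in the index, the remaining columns are parsed as numbers, so no manual float conversion is needed.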

