Python: read a single column of a CSV and store it in an array
Note: this page is an English/Chinese translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/21065938/
Read a single column of a CSV and store in an array
Asked by dh762
What is the best way to read from a csv, but only one specific column, like title?
ID | date | title |
-------------------
1  | 2013 | abc   |
2  | 2012 | cde   |
The column should then be stored in an array like this:
data = ["abc", "cde"]
This is what I have so far, with pandas:
data = pd.read_csv("data.csv", index_col=2)
I've looked into this thread. I still get an IndexError: list index out of range.
EDIT:
It's not a table, it's comma separated like this:
ID,date,title
1,2013,abc
2,2012,cde
Accepted answer by Andy Hayden
One option is just to read in the entire csv, then select a column:
data = pd.read_csv("data.csv")
data['title'] # as a Series
data['title'].values # as a numpy array
As @dawg suggests, you can use the usecols argument to read only that column. If you also pass the squeeze argument, you get a Series back directly and avoid some hackery to flatten the values array:
In [11]: titles = pd.read_csv("data.csv", sep=',', usecols=['title'], squeeze=True)
In [12]: titles # Series
Out[12]:
0 abc
1 cde
Name: title, dtype: object
In [13]: titles.values # numpy array
Out[13]: array(['abc', 'cde'], dtype=object)
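Note that the squeeze argument of read_csv was deprecated in pandas 1.4 and removed in pandas 2.0. A minimal sketch of the same idea on a recent pandas, assuming the sample data from the question (the csv_text variable is only an illustrative stand-in for data.csv), is to call DataFrame.squeeze("columns") on the result instead:

```python
import io

import pandas as pd

# Stand-in for data.csv, using the sample rows from the question.
csv_text = "ID,date,title\n1,2013,abc\n2,2012,cde\n"

# Read only the 'title' column, then squeeze the one-column DataFrame
# into a Series (replacement for the removed squeeze=True argument).
titles = pd.read_csv(io.StringIO(csv_text), usecols=["title"]).squeeze("columns")

# Convert the Series to a plain Python list.
title_list = titles.tolist()
print(title_list)
```

With a real file, the io.StringIO(csv_text) argument would simply be replaced by the path, e.g. "data.csv".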
Answer by dawg
You can do something like this:
>>> import pandas as pd
>>> from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`
>>> txt='''\
... ID,date,title
... 1,2013,abc
... 2,2012,cde'''
>>> data=pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0]
>>> data
['abc', 'cde']
Or, assuming that you have some blanks:
txt='''\
ID,date,title
1,2013,abc
2,2012,cde
3,2014, 
4,2015,fgh'''
# Row 3's title is a single space; a truly empty field would be read as NaN.
table=pd.read_csv(StringIO(txt), usecols=['title'])
print(table)
title
0 abc
1 cde
2
3 fgh
data=pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0]
print(data)
['abc', 'cde', ' ', 'fgh']
Or, if you have a variable number of data fields:
txt='''\
ID,date,title
1,2013,
2,2012,cde
3
4,2015,fgh'''
print(pd.read_csv(StringIO(txt), usecols=['title']))
title
0 NaN
1 cde
2 NaN
3 fgh
print(pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0])
[nan, 'cde', nan, 'fgh']
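If the nan entries are not wanted in the final list, one possible follow-up is to drop or replace them before converting. This is only a sketch under the assumption that missing titles should either be skipped or turned into empty strings; the variable names are illustrative:

```python
from io import StringIO

import pandas as pd

# Same sample as above: two rows have no title.
txt = """\
ID,date,title
1,2013,
2,2012,cde
3
4,2015,fgh"""

# Select the single column as a Series.
titles = pd.read_csv(StringIO(txt), usecols=["title"])["title"]

# Option 1: skip rows whose title is missing.
without_missing = titles.dropna().tolist()

# Option 2: keep every row, using an empty string as a placeholder.
with_placeholder = titles.fillna("").tolist()

print(without_missing)
print(with_placeholder)
```

Which option fits depends on whether the positions in the list need to line up with the rows of the file.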
Answer by dh762
Finally, it was much simpler:
import pandas as pd
data = pd.read_csv("mycsv.csv")
data.columns = ["ID", "date", "title"]
rawlist = list(data.title)
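For comparison, if pandas is not needed for anything else, the same list can probably be built with the standard library's csv module. This is an alternative technique, not what any of the answers above use; the read_column helper and the csv_text sample are hypothetical names introduced for illustration:

```python
import csv
import io

# Stand-in for the CSV file, using the sample rows from the question.
csv_text = "ID,date,title\n1,2013,abc\n2,2012,cde\n"


def read_column(handle, column):
    """Return the values of one named column from an open CSV handle."""
    # DictReader uses the first row as the header, so each row is a dict
    # keyed by column name.
    reader = csv.DictReader(handle)
    return [row[column] for row in reader]


titles = read_column(io.StringIO(csv_text), "title")
print(titles)
```

For a file on disk, the handle would come from open("mycsv.csv", newline="") inside a with block.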

