Python: read a single column of a CSV and store it in an array

Question

Asked by dh762

What is the best way to read only one specific column, such as title, from a CSV file?

ID | date | title
---|------|------
1  | 2013 | abc
2  | 2012 | cde

The column should then be stored in an array like this:

data = ["abc", "cde"]

This is what I have so far, with pandas:

data = pd.read_csv("data.csv", index_col=2)

I've looked into this thread. I still get an IndexError: list index out of range.

EDIT:

It's not a table, it's comma-separated, like this:

ID,date,title
1,2013,abc
2,2012,cde

Answer 1

Accepted answer by Andy Hayden

One option is just to read in the entire csv, then select a column:

import pandas as pd

data = pd.read_csv("data.csv")

data['title']  # as a Series
data['title'].values  # as a numpy array
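As a minimal sketch of the read-everything-then-select approach, the snippet below uses an in-memory copy of the sample data from the question (the `csv_text` variable is an assumption standing in for `data.csv`) and shows the column as a Series, as a NumPy array, and as a plain Python list, which is the shape the question asks for.

```python
from io import StringIO

import pandas as pd

# Stand-in for data.csv, using the sample rows from the question.
csv_text = "ID,date,title\n1,2013,abc\n2,2012,cde\n"

# Read the whole file, then pick out the one column of interest.
data = pd.read_csv(StringIO(csv_text))

as_series = data["title"]            # pandas Series
as_array = data["title"].to_numpy()  # NumPy array
as_list = data["title"].tolist()     # plain Python list

print(as_list)
```

For large files this reads columns that are then thrown away, so restricting the read with usecols may be preferable.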

As @dawg suggests, you can use the usecols argument to read only the title column. Squeezing the resulting one-column DataFrame into a Series avoids some hackery to flatten the values array. (The squeeze=True argument of read_csv was deprecated in pandas 1.4 and removed in 2.0; calling .squeeze("columns") on the result does the same job.)

In [11]: titles = pd.read_csv("data.csv", sep=',', usecols=['title']).squeeze("columns")

In [12]: titles  # Series
Out[12]: 
0    abc
1    cde
Name: title, dtype: object

In [13]: titles.values  # numpy array
Out[13]: array(['abc', 'cde'], dtype=object)
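The usecols-plus-squeeze idea can be sketched as a self-contained script. It assumes a recent pandas, where `DataFrame.squeeze("columns")` is used in place of the removed `squeeze=True` argument, and it reads from an in-memory string (`csv_text`, an illustrative stand-in for `data.csv`).

```python
from io import StringIO

import pandas as pd

# Stand-in for data.csv, using the sample rows from the question.
csv_text = "ID,date,title\n1,2013,abc\n2,2012,cde\n"

# Parse only the 'title' column; the result is a one-column DataFrame.
frame = pd.read_csv(StringIO(csv_text), usecols=["title"])

# Collapse the single column into a Series.
titles = frame.squeeze("columns")

# Convert to a plain Python list.
title_list = titles.tolist()
print(title_list)
```

Squeezing only along the columns axis means the result should stay a Series even when the file has a single data row.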

Answer 2

Answer by dawg

You can do something like this:

>>> import pandas as pd
>>> from io import StringIO
>>> txt='''\
... ID,date,title
... 1,2013,abc
... 2,2012,cde'''
>>> data=pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0]
>>> data
['abc', 'cde']

Or, assuming that you have some blanks:

txt='''\
ID,date,title
1,2013,abc
2,2012,cde
3,2014, 
4,2015,fgh'''
table=pd.read_csv(StringIO(txt), usecols=['title'])
print(table)
  title
0   abc
1   cde
2      
3   fgh
data=pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0]
print(data)
['abc', 'cde', ' ', 'fgh']

Or if you have variable number of data fields:

txt='''\
ID,date,title
1,2013,
2,2012,cde
3
4,2015,fgh'''

print(pd.read_csv(StringIO(txt), usecols=['title']))
  title
0   NaN
1   cde
2   NaN
3   fgh

print(pd.read_csv(StringIO(txt), usecols=['title']).T.values.tolist()[0])
[nan, 'cde', nan, 'fgh']
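If the blank and missing entries shown above are not wanted in the final list, one possible cleanup is sketched below. This goes beyond what the answer itself does: it assumes that both NaN values and whitespace-only strings should be dropped, and the sample text `txt` is illustrative.

```python
from io import StringIO

import pandas as pd

# Illustrative data: row 2 has an empty title, row 3 has a single space.
txt = (
    "ID,date,title\n"
    "1,2013,abc\n"
    "2,2012,\n"
    "3,2014, \n"
    "4,2015,fgh\n"
)

titles = pd.read_csv(StringIO(txt), usecols=["title"])["title"]

# Drop missing values first, then trim surrounding whitespace.
cleaned = titles.dropna().str.strip()

# Discard entries that are empty after trimming.
data = cleaned[cleaned != ""].tolist()
print(data)
```

Whether blanks should be removed or kept as placeholders depends on whether the positions in the list need to line up with the rows of the file.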

Answer 3

Answer by dh762

In the end, it turned out to be much simpler:

import pandas as pd
data = pd.read_csv("mycsv.csv")
data.columns = ["ID", "date", "title"]
rawlist = list(data.title)
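A variation on the same idea, sketched under the assumption that the file already has a header row: passing `header=0` together with `names=` lets read_csv replace the header names during parsing, so the separate `data.columns` assignment is not needed. The in-memory `csv_text` is an illustrative stand-in for `mycsv.csv`.

```python
from io import StringIO

import pandas as pd

# Stand-in for mycsv.csv, using the sample rows from the question.
csv_text = "ID,date,title\n1,2013,abc\n2,2012,cde\n"

# header=0 marks the first line as a header to be replaced by `names`.
data = pd.read_csv(
    StringIO(csv_text),
    header=0,
    names=["ID", "date", "title"],
)

rawlist = data["title"].tolist()
print(rawlist)
```

Bracket access (`data["title"]`) is used here instead of attribute access because it also works for column names that contain spaces or clash with DataFrame methods.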
