Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/20799593/

Time: 2020-09-13 21:29:25  Source: igfitidea

Reading csv containing a list in Pandas

python csv pandas

Asked by Finger twist

I'm trying to read this CSV into pandas:

HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:59.237387']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:59:00.346325']"

As you can see, there are only 2 columns, and the second one is a list. Is there a way to interpret it correctly (meaning reading the values in the list as columns) when using pd.read_csv() with arguments?

Thank you.

Answered by alko

One option is to use ast.literal_eval as a converter:

>>> import ast
>>> import pandas as pd
>>> df = pd.read_clipboard(header=None, quotechar='"', sep=',',
...                        converters={1: ast.literal_eval})
>>> df
    0                                             1
0  HK  [5328.1, 5329.3, 2013-12-27 13:58:57.973614]
1  HK  [5328.1, 5329.3, 2013-12-27 13:58:59.237387]
2  HK  [5328.1, 5329.3, 2013-12-27 13:59:00.346325]

Then convert those lists to a DataFrame if needed, for example with:

>>> df = pd.DataFrame.from_records(df[1].tolist(), index=df[0],
...                           columns=list('ABC')).reset_index()
>>> df['C'] = pd.to_datetime(df['C'])
>>> df
    0       A       B                          C
0  HK  5328.1  5329.3 2013-12-27 13:58:57.973614
1  HK  5328.1  5329.3 2013-12-27 13:58:59.237387
2  HK  5328.1  5329.3 2013-12-27 13:59:00.346325
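
Since the question asks about pd.read_csv() rather than the clipboard, the same two steps can be sketched against an in-memory copy of the sample rows. This is only a sketch under assumptions: the column names A, B and C are illustrative, io.StringIO stands in for the real file, and the final type conversions are an optional extra because literal_eval leaves every list element as a string.

```python
import ast
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for a file.
CSV_TEXT = """HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:59.237387']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:59:00.346325']"
"""

# Parse column 1 with ast.literal_eval so each cell becomes a Python list.
raw = pd.read_csv(io.StringIO(CSV_TEXT), header=None,
                  converters={1: ast.literal_eval})

# Expand the lists into separate columns (A, B, C are illustrative names).
values = pd.DataFrame(raw[1].tolist(), columns=list('ABC'))
result = pd.concat([raw[[0]], values], axis=1)

# The list elements are still strings, so convert them explicitly.
result[['A', 'B']] = result[['A', 'B']].astype(float)
result['C'] = pd.to_datetime(result['C'])

print(result.dtypes)
```

With a real file, the io.StringIO(...) argument would be replaced by the file path.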

Answered by Superstar

Based on alko's answer, you can use Series.apply() for the first part, to turn the list strings into actual lists:

>>> df = pd.read_clipboard(header=None, sep=',')
>>> df
    0                                                  1
0  HK  [u'5328.1', u'5329.3', '2013-12-27 13:58:57.97...
1  HK  [u'5328.1', u'5329.3', '2013-12-27 13:58:59.23...
2  HK  [u'5328.1', u'5329.3', '2013-12-27 13:59:00.34...
>>> df[1] = df[1].apply(eval)
>>> df
    0                                             1
0  HK  [5328.1, 5329.3, 2013-12-27 13:58:57.973614]
1  HK  [5328.1, 5329.3, 2013-12-27 13:58:59.237387]
2  HK  [5328.1, 5329.3, 2013-12-27 13:59:00.346325]
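
Because eval executes whatever expression the string contains, a variant that may be safer for data you do not control is to pass ast.literal_eval to apply() instead. The sketch below is not part of the original answer: it builds a small DataFrame by hand instead of reading the clipboard, and is_literal is a hypothetical helper added only to show which strings literal_eval accepts.

```python
import ast

import pandas as pd

# Column 1 holds the lists as plain strings, as in the first output above.
df = pd.DataFrame({
    0: ['HK', 'HK'],
    1: ["[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']",
        "[u'5328.1', u'5329.3', '2013-12-27 13:58:59.237387']"],
})

# literal_eval only accepts Python literals, so it cannot call functions.
df[1] = df[1].apply(ast.literal_eval)


def is_literal(text):
    """Return True if text parses as a Python literal (hypothetical helper)."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False


print(df[1].iloc[0])
print(is_literal("[1, 2]"), is_literal("__import__('os').getcwd()"))
```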

Answered by krishna keshav

Use .strip() in Python to remove the surrounding brackets from the second field.

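import csv

with open(csvfile, 'r') as infile:  # csvfile: path to the CSV file
    reader = csv.reader(infile)
    for row in reader:
        col1 = row[0]
        # row[1] is the list as one string; strip the surrounding brackets
        col2 = row[1].strip("[]")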
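
Stripping the brackets still leaves one string per row, so the items need to be split apart afterwards. The following is a rough sketch of one way to finish the job, not the answerer's code: parse_list_cell is a hypothetical helper, io.StringIO stands in for the file, and it assumes no item contains a comma or starts with the letter u.

```python
import csv
import io

# One sample row from the question; io.StringIO stands in for a file.
CSV_TEXT = """HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"
"""


def parse_list_cell(cell):
    """Split a stringified list into its items (hypothetical helper)."""
    items = cell.strip("[]").split(",")
    # Drop whitespace, the Python 2 u prefix, and the surrounding quotes.
    return [item.strip().lstrip("u").strip("'") for item in items]


rows = []
for row in csv.reader(io.StringIO(CSV_TEXT)):
    # Keep the first field and append the parsed list items as extra fields.
    rows.append([row[0]] + parse_list_cell(row[1]))

print(rows)
```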