pandas: How to read a CSV column as a list dtype using pandas?

Question

Asked by nachiappanpl

I have a CSV file with 3 columns, where each row of column 3 holds a list of values, as the following table structure shows:

Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
3,a3,"['Proj4', 'Proj1']"
4,a4,"['Proj3', 'Proj4']"
5,a5,"['Proj5', 'Proj2']"

Whenever I read this CSV, Col3 comes back as a str object rather than a list. I tried to change the dtype of that column to list but got an AttributeError, as shown below:

df = pd.read_csv("inputfile.csv")
df.Col3.dtype = list

AttributeError                            Traceback (most recent call last)
<ipython-input-19-6f9ec76b1b30> in <module>()
----> 1 df.Col3.dtype = list

C:\Python27\lib\site-packages\pandas\core\generic.pyc in __setattr__(self,         name, value)
   1953                     object.__setattr__(self, name, value)
   1954             except (AttributeError, TypeError):
-> 1955                 object.__setattr__(self, name, value)
   1956 
   1957     #----------------------------------------------------------------------

AttributeError: can't set attribute


I would appreciate any guidance on how to go about this.

Answer 1

Answered by Padraic Cunningham

You could use literal_eval from the standard-library ast module:

from ast import literal_eval

# Parse each string such as "['Proj1', 'Proj2']" into a real Python list
df.Col3 = df.Col3.apply(literal_eval)
print(df.Col3[0][0])
# Proj1

You can also do the conversion while reading the CSV, using the converters argument of read_csv:

df = pd.read_csv("in.csv",converters={"Col3": literal_eval})
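
As a self-contained illustration of the converters approach, here is a minimal sketch that uses two rows of the question's data. The io.StringIO buffer and the csv_text name are assumptions standing in for the real file. It also shows why setting dtype to list cannot work: pandas has no list dtype, so a column of lists is stored with dtype object.

```python
import io
from ast import literal_eval

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

# literal_eval is called on every raw string of Col3 while the file is parsed
df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": literal_eval})

print(df.Col3.dtype)     # object
print(type(df.Col3[0]))  # <class 'list'>
print(df.Col3[0][0])     # Proj1
```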

If you are sure the format is the same for all strings, stripping and splitting will be a lot faster:

df = pd.read_csv("in.csv",converters={"Col3": lambda x: x.strip("[]").split(", ")})

But each element will keep its surrounding quote characters, for example "'Proj1'" rather than "Proj1".

Answer 2

Answered by 5norre

Adding a replace call to Cunningham's answer removes the quote characters as well:

df = pd.read_csv("in.csv",converters={"Col3": lambda x: x.strip("[]").replace("'","").split(", ")})
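
To make the difference between the two strip/split variants concrete, here is a small sketch on a single raw cell value (the variable names are illustrative only). It also shows one edge case to keep in mind: an empty list string produces one empty string rather than an empty list.

```python
raw = "['Proj1', 'Proj2']"

# strip + split alone keeps the quote characters inside each item
kept_quotes = raw.strip("[]").split(", ")
print(kept_quotes)  # ["'Proj1'", "'Proj2'"]

# removing the single quotes before splitting gives clean items
clean = raw.strip("[]").replace("'", "").split(", ")
print(clean)        # ['Proj1', 'Proj2']

# edge case: "[]" becomes a list holding one empty string, not []
empty = "[]".strip("[]").split(", ")
print(empty)        # ['']
```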

See also pandas - convert string into list of strings


Answer 3

Answered by Ricardo

I have a different approach, which also works for string representations of other data types, not just lists.

You can use the json library and apply json.loads() to the desired column. For example:

import json
df.my_column = df.my_column.apply(json.loads)

For this to work, however, the strings inside each list must be enclosed in double quotes, as JSON requires. Single-quoted items such as ['Proj1', 'Proj2'] from the question will raise a decode error.
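
The quoting requirement can be checked with a short standard-library sketch. The variable names here are illustrative and not part of the original answer.

```python
import json

# Double-quoted items are valid JSON and parse into a Python list
parsed = json.loads('["Proj1", "Proj2"]')
print(parsed)  # ['Proj1', 'Proj2']

# Single-quoted items, as in the question's CSV, are rejected
try:
    json.loads("['Proj1', 'Proj2']")
    single_quotes_ok = True
except json.JSONDecodeError as exc:
    single_quotes_ok = False
    print("not valid JSON:", exc.msg)
```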

Answer 4

Answered by cs95

@Padraic Cunningham's answer will not work if you have to parse lists of strings that are not quoted. For example, literal_eval will successfully parse "['a', 'b', 'c']", but not "[a, b, c]". To load strings like this, use the PyYAML library.

import io
import pandas as pd
import yaml

data = '''
A,B,C
"[1, 2, 3]",True,"[a, b, c]"
"[4, 5, 6]",False,"[d, e, f]"
'''

df = pd.read_csv(io.StringIO(data), sep=',')
df
#            A      B          C
# 0  [1, 2, 3]   True  [a, b, c]
# 1  [4, 5, 6]  False  [d, e, f]

# The list-like columns are still plain strings at this point
df['C'].tolist()
# ['[a, b, c]', '[d, e, f]']

# Parse every cell of columns A and C as YAML.
# On pandas 2.1+, applymap is deprecated in favour of DataFrame.map.
df[['A', 'C']] = df[['A', 'C']].applymap(yaml.safe_load)

df['C'].tolist()
# [['a', 'b', 'c'], ['d', 'e', 'f']]

yaml can be installed with pip install pyyaml.
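
If adding PyYAML as a dependency is not an option, a rough standard-library fallback is possible. The sketch below is only an illustration under stated assumptions: parse_listish is a hypothetical helper name, and its fallback branch assumes the unquoted items contain no commas, brackets, or quotes. It tries literal_eval first and falls back to plain splitting when that fails.

```python
from ast import literal_eval


def parse_listish(text):
    """Hypothetical helper: parse "['a', 'b']" or "[a, b]" into a list."""
    try:
        # Works for properly quoted strings and for numbers
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # Assumption: unquoted items with no embedded commas or brackets
        inner = text.strip().strip("[]")
        return [item.strip() for item in inner.split(",") if item.strip()]


print(parse_listish("['a', 'b', 'c']"))  # ['a', 'b', 'c']
print(parse_listish("[a, b, c]"))        # ['a', 'b', 'c']
print(parse_listish("[1, 2, 3]"))        # [1, 2, 3]
```

It can be passed to read_csv in the same way as the other parsers, for example converters={"C": parse_listish}.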

Answer 5

Answered by theletz

If you control how the file is written, you can use DataFrame.to_parquet and pd.read_parquet instead of CSV.

Parquet stores the column with a list type, so it is read back as a list-like value rather than a string.
