pandas: Convert a column of strings to lists

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50278300/

Convert a column of strings to lists in pandas

Tags: python, string, list, pandas, tuples

Asked by Guido Muscioni

I have a problem with the type of one of the columns in a pandas DataFrame. The column is saved in a CSV file as a string, and I want to read it as a tuple so that I can convert it to a list of numbers. Here is a very simple CSV:

ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"

If I load it with the function read_csv, I get a column of strings. I have tried converting each value to a list, but I get the string split into characters:

df.LABELS.apply(lambda x: list(x))

returns:

['(', '1', '.', '0', ..., '4', '.', '0', ')']
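
The behavior described above can be reproduced with a small sketch. It assumes the sample CSV from the question is held in memory via io.StringIO rather than read from disk; the names csv_text and as_chars are illustrative and not part of the original post.

```python
import io

import pandas as pd

# The sample CSV from the question, kept in memory instead of a file.
csv_text = '''ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
'''

df = pd.read_csv(io.StringIO(csv_text))

# Each cell is a plain Python string, not a tuple.
print(type(df.LABELS.iloc[0]))

# list() iterates over a string one character at a time,
# so every cell becomes a list of single characters.
as_chars = df.LABELS.apply(list)
print(as_chars.iloc[0][:4])
```

The parentheses and commas are ordinary characters in the cell, so the string has to be parsed rather than passed to list().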

Any idea how to do this?

Thank you.

Answered by jezrael

Use str.strip and str.split:

df['LABELS'] = df['LABELS'].str.strip('()').str.split(',')

If the column contains no NaNs, a list comprehension works well too:

df['LABELS'] = [x.strip('()').split(',') for x in df['LABELS']]
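
Both variants above produce lists of strings such as '1.0', not numbers. Since the question asks for a list of numbers, one possible follow-up is to convert each piece with float. This is a minimal sketch that assumes every cell is a non-missing, parenthesized, comma-separated string; parse_labels is a hypothetical helper name, and the DataFrame is built inline to mirror the question's CSV.

```python
import pandas as pd

# Inline stand-in for the DataFrame loaded from the question's CSV.
df = pd.DataFrame({
    "ID": [1, 2],
    "LABELS": [
        "(1.0,2.0,2.0,3.0,3.0,1.0,4.0)",
        "(1.0,2.0,2.0,3.0,3.0,1.0,4.0)",
    ],
})


def parse_labels(text):
    """Hypothetical helper: turn "(1.0,2.0)" into [1.0, 2.0]."""
    # Drop the surrounding parentheses, split on commas, convert to float.
    return [float(part) for part in text.strip("()").split(",")]


df["LABELS"] = df["LABELS"].apply(parse_labels)
print(df["LABELS"].iloc[0])
```

If the column may contain NaN, the helper would need an explicit check before calling strip, because a missing value is a float rather than a string.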

Answered by llllllllll

You can use ast.literal_eval, which will give you a tuple:

import ast
df.LABELS = df.LABELS.apply(ast.literal_eval)

If you do want a list, use:

df.LABELS.apply(lambda s: list(ast.literal_eval(s)))
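
The same parsing can also be done while the file is being read, through the converters parameter of read_csv. The sketch below is one possible arrangement, not part of the original answer: it assumes the CSV is held in memory via io.StringIO, and to_list is a hypothetical helper name.

```python
import ast
import io

import pandas as pd

# The sample CSV from the question, kept in memory instead of a file.
csv_text = '''ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
'''


def to_list(cell):
    """Hypothetical helper: parse a tuple literal and return it as a list."""
    return list(ast.literal_eval(cell))


# converters maps a column name to a function applied to each raw cell string.
df = pd.read_csv(io.StringIO(csv_text), converters={"LABELS": to_list})
print(df.LABELS.iloc[0])
```

Because ast.literal_eval parses the text as a Python literal, the elements come back as floats rather than strings. It raises an error on cells that are not valid literals, so empty or malformed cells would need separate handling.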

Answered by sacuL

You can try this (assuming your CSV file is called filename.csv):

import pandas as pd

df = pd.read_csv('filename.csv')

df['LABELS'] = df.LABELS.apply(lambda x: x.strip('()').split(','))

>>> df
   ID                               LABELS
0   1  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
1   2  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]

Answered by Yaakov Bressler

Alternatively, you might consider regular expressions:

import re

pattern = re.compile(r"[0-9]\.[0-9]")  # raw string keeps \. as a literal dot
df.LABELS.apply(pattern.findall)
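
The pattern above matches exactly one digit on each side of the decimal point and returns strings, which is enough for the sample data. A somewhat more general sketch, under the assumption that values are non-negative decimals without exponents, might look like this; the sample values and the name number are made up for illustration.

```python
import re

import pandas as pd

# Illustrative data with multi-digit values (not from the original question).
df = pd.DataFrame({"LABELS": ["(1.0,2.0,10.25,3.0)", "(4.0,12.5)"]})

# One or more digits, optionally followed by a dot and more digits.
number = re.compile(r"\d+(?:\.\d+)?")

# findall returns the matched substrings; float() turns them into numbers.
df["LABELS"] = df["LABELS"].apply(
    lambda s: [float(match) for match in number.findall(s)]
)
print(df["LABELS"].iloc[0])
```

Signs and scientific notation are deliberately not handled here. For such input, ast.literal_eval or an explicit split is likely a safer choice.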