Pandas DataFrame stored list as string: How to convert back to list?
Original URL: http://stackoverflow.com/questions/23111990/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Gyan Veda
I have an n-by-m Pandas DataFrame df defined as follows. (I know this is not the best way to do it. It makes sense for what I'm trying to do in my actual code, but that would be TMI for this post, so just take my word that this approach works in my particular scenario.)
>>> df = DataFrame(columns=['col1'])
>>> df.append(Series([None]), ignore_index=True)
>>> df
Empty DataFrame
Columns: [col1]
Index: []
I stored lists in the cells of this DataFrame as follows.
>>> df['column1'][0] = [1.23, 2.34]
>>> df
col1
0 [1, 2]
For some reason, the DataFrame stored this list as a string instead of a list.
>>> df['column1'][0]
'[1.23, 2.34]'
I have 2 questions for you.
- Why does the DataFrame store a list as a string and is there a way around this behavior?
- If not, then is there a Pythonic way to convert this string into a list?
Update
The DataFrame I was using had been saved to and loaded from a CSV file. This format, rather than the DataFrame itself, converted the list from a literal into a string.
Accepted answer by Alex Thornton
As you pointed out, this commonly happens when saving and loading pandas DataFrames as .csv files, which is a text format.
In your case this happened because list objects have a string representation, which allows them to be stored in .csv files. Loading the .csv will then yield that string representation.
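The round trip described above can be reproduced with a short sketch. The one-row frame and the in-memory buffer are assumptions made for illustration; they are not from the original question.

```python
import io

import pandas as pd

# Hypothetical one-row frame whose single cell holds a real Python list.
df = pd.DataFrame({"col1": [[1.23, 2.34]]})
original_type = type(df.loc[0, "col1"])  # a list while the frame is in memory

# Write the frame to an in-memory CSV buffer (a stand-in for a .csv file).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()  # the list is written as its text representation

# Read it back: CSV carries no type information, so the cell is now a str.
buffer.seek(0)
loaded = pd.read_csv(buffer)
loaded_value = loaded.loc[0, "col1"]

print(original_type, type(loaded_value), repr(loaded_value))
```

The list survives as long as the DataFrame stays in memory; it only becomes text at the point where it is written to CSV.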
If you want to store the actual objects, you should use DataFrame.to_pickle() (note: objects must be picklable!).
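A minimal sketch of the pickle route, assuming a temporary directory and a made-up file name (frame.pkl); only to_pickle() comes from the answer, and read_pickle() is its pandas counterpart for loading.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with a list stored in one cell.
df = pd.DataFrame({"col1": [[1.23, 2.34]]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "frame.pkl")  # hypothetical file name
    df.to_pickle(path)               # serialises the Python objects themselves
    restored = pd.read_pickle(path)  # loads them back unchanged

restored_value = restored.loc[0, "col1"]
print(type(restored_value), restored_value)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.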
To answer your second question, you can convert it back with ast.literal_eval:
>>> from ast import literal_eval
>>> literal_eval('[1.23, 2.34]')
[1.23, 2.34]
Answer by namit
For reference only: pandas doesn't convert lists into strings.
In [29]: data2 = [{'a': [1, 5], 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
In [30]: df = pd.DataFrame(data2)
In [31]: df
Out[31]:
a b c
0 [1, 5] 2 NaN
1 5 10 20
In [32]: df['a'][0], type(df['a'][0])
Out[32]: ([1, 5], list)
In [33]: pd.__version__
Out[33]: '0.12.0'
Answer by Rutger Hofste
I had the same problem. When storing a DataFrame with a list column to a CSV file using df.to_csv(), the list column is converted to a string, e.g. "[42, 42, 42]" instead of [42, 42, 42].
Alex's answer is correct and you can use literal_eval to convert the string back to a list. The problem with this approach is that you need to import an additional library and you need to apply or map the function to your dataframe. An easier way is to force Pandas to read the column as a Python object (dtype):
df["col1"].astype('O')
The O is used for Python objects, including lists. Please note that this method fails if you parse empty list strings: "[]"
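It is worth checking what astype('O') actually returns before relying on it. As far as I can tell, it only sets the column's dtype to object and leaves each cell as the string that was read from the CSV; the sketch below, using made-up data, compares it with applying literal_eval.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content: one list column, including an empty list.
csv = io.StringIO('col1\n"[42, 42, 42]"\n"[]"\n')
df = pd.read_csv(csv)

# astype('O') changes the dtype label only; the elements are still str.
as_object = df["col1"].astype("O")
still_strings = [type(value).__name__ for value in as_object]

# literal_eval parses each string, and also handles the empty list "[]".
parsed = df["col1"].apply(literal_eval)

print(still_strings)
print(parsed.tolist())
```

If the printed types are all str, the strings still need to be parsed explicitly, for example with literal_eval.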
Alternatively you can also apply a function to your column (this one is for integers):
def stringToList(string):
    # input format : "[42, 42, 42]" , note the spaces after the commas, in this case I have a list of integers
    string = string[1:len(string)-1]
    try:
        if len(string) != 0:
            tempList = string.split(", ")
            newList = list(map(lambda x: int(x), tempList))
        else:
            newList = []
    except:
        newList = [-9999]
    return(newList)
df["col1"] = df["col1"].apply(lambda x: stringToList(x))
Answer by elPastor
I just came across this problem and there is a very simple solution (pandas.eval()). I'm using pandas 0.20.0.
# SETUP
import pandas as pd
import io
csv = io.StringIO(u'''
id list
A1 [1,2]
A2 [3,4]
A3 [5,6]
''')
# sep=r'\s+' splits on whitespace, like the deprecated delim_whitespace=True
df = pd.read_csv(csv, sep=r'\s+')
# TYPE CHECK: str
print(type(df.at[0, 'list']))
# MAIN CONVERSION
df['list'] = pd.eval(df['list'])
# TYPE CHECK: list
print(type(df.at[0, 'list']))
Answer by Michael James Kali Galarnyk
1) There is a way around this behavior. Using loc helps here.
>>> import pandas as pd
>>> df = pd.DataFrame(columns=['column1'])
>>> df = df.append(pd.Series(data = {'column1':[None]}), ignore_index = True)
>>> df
column1
0 [None]
>>> # Add list to index 0 in column1
>>> df.loc[0,'column1'] = [1.23, 2.34]
>>> print(df.loc[0, 'column1'])
[1.23, 2.34]
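DataFrame.append was removed in pandas 2.0, so the session above needs an older release. Below is a sketch of the same idea for current versions; it swaps in a direct constructor and the .at accessor, which are my substitutions and not part of the original answer.

```python
import pandas as pd

# Build the one-row frame directly; a None-only column gets object dtype,
# which lets a cell hold an arbitrary Python object.
df = pd.DataFrame({"column1": [None]})

# .at addresses exactly one cell, so the list is stored as a single value
# instead of being spread across several rows or columns.
df.at[0, "column1"] = [1.23, 2.34]

stored = df.at[0, "column1"]
print(type(stored), stored)
```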
2) A Pythonic way to convert this string into a list. (This is probably what you want, since the DataFrame you are using had been saved and loaded from a CSV format; there are a couple of solutions for this.) This is an addition to pshep123's answer.
import io
from ast import literal_eval
import pandas as pd
csv = io.StringIO(u'''
id list
A1 [1,2]
A2 [3,4]
A3 [5,6]
''')
# sep=r'\s+' splits on whitespace, like the deprecated delim_whitespace=True
df = pd.read_csv(csv, sep=r'\s+')
# Output is a string
df.loc[0, 'list']
# '[1,2]'
# Convert every string in the column to a list
df['list'] = df['list'].apply(literal_eval)
# Output is a list
df.loc[0, 'list']
# [1, 2]
Answer by markroxor
You can directly use pandas:
df = pd.read_csv(df_name, converters={'column_name': eval})
This will read that column as its corresponding Python type instead of as a string.
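Because eval executes whatever expression the file contains, a more cautious variant passes ast.literal_eval as the converter instead. This is a sketch under assumptions: the CSV text and the column name numbers are made up for illustration.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV with a list column; the quotes keep the commas in one field.
csv = io.StringIO('id,numbers\nA1,"[1, 2]"\nA2,"[]"\n')

# The converter runs on every raw string of the column while parsing.
df = pd.read_csv(csv, converters={"numbers": literal_eval})
first = df.loc[0, "numbers"]
second = df.loc[1, "numbers"]

# literal_eval accepts only Python literals, so a function call is rejected.
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(first, second, rejected)
```

With this converter a malformed or malicious cell raises an error during read_csv rather than being executed.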
Answer by Hassen Morad
A simple hack I used is to call a lambda function that indexes out the first and last elements (the list brackets in str form) and calls the split method followed by another that replaces the list elements with ints.
df['column1'] = df['column1'].apply(lambda x:x[1:-1].split(',')).apply(lambda x:[int(i) for i in x])
Answer by John Doe
Adding onto Alex's answer. Here is another version, which can be used for converting individual items from string to list:
import pandas as pd
from ast import literal_eval
df = pd.read_csv("some_csvfile.csv")
def item_gen(l):
    for i in l:
        yield(i)
for i in item_gen(df["some_column_with_list_item"]):
    print(literal_eval(i))