pandas: Series containing arrays

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/35722187/

Time: 2020-09-14 00:47:45  Source: igfitidea

pandas series containing arrays

python pandas

Asked by toast

I have a pandas DataFrame column that looks a little like this:

Out[67]:
0      ["cheese", "milk...
1      ["yogurt", "cheese...
2      ["cheese", "cream"...
3      ["milk", "cheese"...

Ultimately I would like this as a flat list, but while attempting to flatten it I noticed that pandas treats ["cheese", "milk", "cream"] as str rather than list.

How would I go about flattening this so that I end up with:

["cheese", "milk", "yogurt", "cheese", "cheese"...]

[EDIT] So the answer given below appears to be:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# strip the surrounding brackets, then split each string on commas
s = s.str.strip("[]")
df = s.str.split(',', expand=True)
# remove the quotes and surrounding whitespace from every cell
# (applymap is deprecated since pandas 2.1 in favour of DataFrame.map)
df = df.applymap(lambda x: x.replace("'", '').strip())
l = df.values.flatten()
print(l.tolist())

Which is great: question answered, answer accepted. But it strikes me as a rather inelegant solution.

Accepted answer by jezrael

You can use numpy.ndarray.flatten and then flatten the nested lists:

print(df)
                  a
0    [cheese, milk]
1  [yogurt, cheese]
2   [cheese, cream]

print(df.a.values)
[[['cheese', 'milk']]
 [['yogurt', 'cheese']]
 [['cheese', 'cream']]]

l = df.a.values.flatten()
print(l)
[['cheese', 'milk'] ['yogurt', 'cheese'] ['cheese', 'cream']]

print([item for sublist in l for item in sublist])
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
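
For what it's worth, when the cells already hold real Python lists, the same flattening can probably be done without going through NumPy. The following is only a minimal sketch under that assumption; the frame is a hypothetical stand-in for the one printed above, and itertools.chain.from_iterable is swapped in for the nested list comprehension.

```python
from itertools import chain

import pandas as pd

# Hypothetical frame mirroring the one above; each cell holds a real list, not a str.
df = pd.DataFrame({"a": [["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]]})

# chain.from_iterable visits each cell's list in row order and yields its items one by one.
flat = list(chain.from_iterable(df["a"]))
print(flat)
```

This only helps if the cells are lists. If they are strings that merely look like lists, iterating over them would yield single characters instead.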

EDIT:

You can try:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# remove the surrounding []
s = s.str.strip('[]')
print(s)
0      'cheese', 'milk'
1    'yogurt', 'cheese'
2     'cheese', 'cream'
dtype: object

df = s.str.split(',', expand=True)
# remove the ' characters and strip surrounding whitespace
df = df.applymap(lambda x: x.replace("'", '').strip())
print(df)
        0       1
0  cheese    milk
1  yogurt  cheese
2  cheese   cream

l = df.values.flatten()
print(l.tolist())
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
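
A possibly tidier alternative to the strip/split steps, offered only as a sketch: if every string in the Series is a valid Python list literal (an assumption about the data, not something the question confirms), the standard library's ast.literal_eval can parse each one into a real list, after which a plain comprehension flattens the result. The names parsed and flat are just illustrative.

```python
import ast

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Assumption: every element is a well-formed list literal.
# literal_eval raises ValueError or SyntaxError on anything else.
parsed = s.apply(ast.literal_eval)

# parsed is now a Series of real lists, so the usual nested comprehension works.
flat = [item for sublist in parsed for item in sublist]
print(flat)
```

Unlike splitting on ',', this should not break when an item itself contains a comma, because the parser respects the quotes.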

Answer by Colin

You can convert the Series into a DataFrame and then call stack:

s.apply(pd.Series).stack().tolist()
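
A minimal runnable sketch of the one-liner above, assuming s is a Series whose elements are real lists of equal length (the sample data here is hypothetical). It also shows Series.explode, available since pandas 0.25, which is a different technique that appears to give the same result for this input.

```python
import pandas as pd

# Assumption: each element is already a list, not a string representation of one.
s = pd.Series([["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]])

# apply(pd.Series) spreads each list across columns; stack() folds the columns back into rows.
stacked = s.apply(pd.Series).stack().tolist()

# explode() emits one row per list item directly, skipping the intermediate DataFrame.
exploded = s.explode().tolist()

print(stacked)
print(exploded)
```

Building a DataFrame row by row with apply(pd.Series) can be slow on large data, so explode may be the lighter option where it is available.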

Answer by Colin

To convert the column values from str to list you could use df.columnName.tolist(), and for flattening you could use df.columnName.values.flatten().
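
One thing that may be worth checking before relying on these two calls is what they return when the cells are strings, as in the question. The sketch below uses a hypothetical two-row frame and keeps the placeholder name columnName from the answer.

```python
import pandas as pd

# Hypothetical frame whose cells are strings that only look like lists.
df = pd.DataFrame({"columnName": ["['cheese', 'milk']", "['yogurt', 'cheese']"]})

# tolist() turns the column into a Python list, but each element keeps its own type.
values = df.columnName.tolist()
print(type(values[0]))

# The column's values are already a 1-D array, so flatten() returns a copy of the same shape.
flattened = df.columnName.values.flatten()
print(flattened.shape)
```

If the elements turn out to be str, each one would presumably still need to be parsed into a list before the items can be collected into a single flat list.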