Split a list in a Pandas cell into multiple columns

Question

Asked by user2242044

I have a really simple Pandas dataframe where each cell contains a list. I'd like to split each element of the list into its own column. I can do that by exporting the values and then creating a new dataframe. This doesn't seem like a good way to do it, especially if my dataframe had a column aside from the list column.

import pandas as pd

df = pd.DataFrame(data=[[[8,10,12]],
                        [[7,9,11]]])

df = pd.DataFrame(data=[x[0] for x in df.values])

Desired output:

   0   1   2
0  8  10  12
1  7   9  11

Follow-up based on @Psidom's answer:

If I did have a second column:

df = pd.DataFrame(data=[[[8,10,12], 'A'],
                        [[7,9,11], 'B']])

How do I avoid losing the other column?

Desired output:

   0   1   2  3 
0  8  10  12  A
1  7   9  11  B

Answer 1

Answered by Psidom

You can loop through the Series with the apply() function and convert each list to a Series; this automatically expands each list into separate columns:

df[0].apply(pd.Series)

#   0    1   2
#0  8   10  12
#1  7    9  11

Update: To keep other columns of the data frame, you can concatenate the result with the columns you want to keep:

pd.concat([df[0].apply(pd.Series), df[1]], axis = 1)

#   0    1   2  1
#0  8   10  12  A
#1  7    9  11  B
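Note that the concatenated result above has two columns labelled 1. A minimal sketch of one way to get the 0..3 labels from the question's desired output, assuming it is acceptable to relabel the kept column as 3 (that label is a choice made here, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame(data=[[[8, 10, 12], 'A'],
                        [[7, 9, 11], 'B']])

# Expand each list in column 0 into its own columns (labelled 0, 1, 2).
expanded = df[0].apply(pd.Series)

# Rename the kept column to 3 so the result has no duplicate column labels.
result = pd.concat([expanded, df[1].rename(3)], axis=1)
print(result)
```

Unique column labels make later selections such as result[1] unambiguous.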

Answer 2

Answered by Zero

You could use pd.DataFrame(df[col].values.tolist()) instead, which is much faster (by a factor of a few hundred in the timings below):

In [820]: pd.DataFrame(df[0].values.tolist())
Out[820]:
   0   1   2
0  8  10  12
1  7   9  11

In [821]: pd.concat([pd.DataFrame(df[0].values.tolist()), df[1]], axis=1)
Out[821]:
   0   1   2  1
0  8  10  12  A
1  7   9  11  B
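One caveat with this approach: a DataFrame built from a plain list gets a fresh 0..n-1 index, while pd.concat aligns rows by index label. If the original frame does not use the default index, the rows may not line up. A possible sketch that passes the original index through, assuming a hypothetical string index 'x', 'y' for illustration:

```python
import pandas as pd

df = pd.DataFrame(data=[[[8, 10, 12], 'A'],
                        [[7, 9, 11], 'B']],
                  index=['x', 'y'])  # hypothetical non-default index

# Reuse the original index so the expanded rows align with df on concat.
expanded = pd.DataFrame(df[0].tolist(), index=df.index)

# Relabel the kept column as 3 (an illustrative choice) to avoid duplicate labels.
result = pd.concat([expanded, df[1].rename(3)], axis=1)
print(result)
```

Without index=df.index, the concat in this example would produce four rows padded with NaN instead of two.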

Timings

Medium

In [828]: df.shape
Out[828]: (20000, 2)

In [829]: %timeit pd.DataFrame(df[0].values.tolist())
100 loops, best of 3: 15 ms per loop

In [830]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 4.06 s per loop

Large

In [832]: df.shape
Out[832]: (200000, 2)

In [833]: %timeit pd.DataFrame(df[0].values.tolist())
10 loops, best of 3: 161 ms per loop

In [834]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 40.9 s per loop
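The timings above come from an IPython session. A rough way to repeat the comparison in a plain script is sketched below, using the standard timeit module. The row count and the helper names via_tolist and via_apply are assumptions made for this sketch, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

n_rows = 2000  # hypothetical size, kept small so the slow path finishes quickly
df = pd.DataFrame({0: [[8, 10, 12]] * n_rows, 1: ['A'] * n_rows})


def via_tolist():
    # Build the frame directly from a list of lists.
    return pd.DataFrame(df[0].values.tolist())


def via_apply():
    # Construct one Series per row, then assemble the results.
    return df[0].apply(pd.Series)


# Average wall-clock time per call over a few repetitions.
repeats = 3
t_tolist = timeit.timeit(via_tolist, number=repeats) / repeats
t_apply = timeit.timeit(via_apply, number=repeats) / repeats

print(f"tolist: {t_tolist:.4f} s per call")
print(f"apply:  {t_apply:.4f} s per call")
```

Both helpers return the same values; only the construction path differs.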
