Splitting a list in a Pandas cell into multiple columns
Original question: http://stackoverflow.com/questions/40924332/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverFlow.
Asked by user2242044
I have a really simple Pandas dataframe where each cell contains a list. I'd like to split each element of the list into its own column. I can do that by exporting the values and then creating a new dataframe. This doesn't seem like a good way to do it, especially if my dataframe had a column aside from the list column.
import pandas as pd
df = pd.DataFrame(data=[[[8,10,12]],
                        [[7,9,11]]])
# Current workaround: pull each list out of the values and build a new dataframe
df = pd.DataFrame(data=[x[0] for x in df.values])
Desired output:
   0   1   2
0  8  10  12
1  7   9  11
Follow-up based on @Psidom's answer:
If I did have a second column:
df = pd.DataFrame(data=[[[8,10,12], 'A'],
                        [[7,9,11], 'B']])
How do I avoid losing the other column?
Desired output:
   0   1   2  3
0  8  10  12  A
1  7   9  11  B
Answered by Psidom
You can loop through the Series with the apply() function and convert each list to a Series; this automatically expands each list into separate columns:
df[0].apply(pd.Series)
#    0   1   2
# 0  8  10  12
# 1  7   9  11
Update: To keep other columns of the data frame, you can concatenate the result with the columns you want to keep:
pd.concat([df[0].apply(pd.Series), df[1]], axis = 1)
#    0   1   2  1
# 0  8  10  12  A
# 1  7   9  11  B
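Note that the concatenated result above ends up with two columns labelled 1. A minimal sketch of one way to avoid the clash, assuming it is acceptable to rename the expanded columns. The item_ prefix and the variable names expanded and result are illustrative choices, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame(data=[[[8, 10, 12], 'A'],
                        [[7, 9, 11], 'B']])

# Expand each list in column 0 into its own column.
expanded = df[0].apply(pd.Series)

# Give the new columns distinct labels ("item_" is an arbitrary prefix).
expanded = expanded.add_prefix('item_')

# Join the expanded columns back to the remaining column(s) by index.
result = pd.concat([expanded, df.drop(columns=0)], axis=1)
print(result)
```

Dropping column 0 rather than selecting column 1 keeps every other column, however many there are.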
Answered by Zero
You could use pd.DataFrame(df[col].values.tolist()) instead, which is much faster (roughly 250x to 270x in the timings below).
In [820]: pd.DataFrame(df[0].values.tolist())
Out[820]:
   0   1   2
0  8  10  12
1  7   9  11
In [821]: pd.concat([pd.DataFrame(df[0].values.tolist()), df[1]], axis=1)
Out[821]:
   0   1   2  1
0  8  10  12  A
1  7   9  11  B
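One caveat with this approach: a dataframe built from a plain list gets a fresh 0..n-1 index, while pd.concat aligns rows on index labels. With the default index used above this makes no difference, but if the frame has been filtered or re-indexed, passing index=df.index keeps the rows lined up. A sketch, assuming all lists have the same length; the index labels 10 and 20 are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(data=[[[8, 10, 12], 'A'],
                        [[7, 9, 11], 'B']],
                  index=[10, 20])  # non-default index, chosen for illustration

# Reuse the original index so that concat aligns the rows correctly.
expanded = pd.DataFrame(df[0].tolist(), index=df.index)

result = pd.concat([expanded, df[1]], axis=1)
print(result)
```

Without index=df.index, the expanded frame here would be indexed 0 and 1, and the concatenation would produce four rows padded with NaN instead of two.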
Timings
Medium
In [828]: df.shape
Out[828]: (20000, 2)
In [829]: %timeit pd.DataFrame(df[0].values.tolist())
100 loops, best of 3: 15 ms per loop
In [830]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 4.06 s per loop
Large
In [832]: df.shape
Out[832]: (200000, 2)
In [833]: %timeit pd.DataFrame(df[0].values.tolist())
10 loops, best of 3: 161 ms per loop
In [834]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 40.9 s per loop
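The timings above come from an IPython session with %timeit. To repeat the comparison in a plain script, something like the following sketch could be used. The row count n, the synthetic data, and the helper names via_tolist and via_apply are assumptions for illustration, and the numbers will vary with machine and pandas version:

```python
import timeit
import pandas as pd

# Synthetic data: n rows, each holding a three-element list (n is an arbitrary choice).
n = 2000
df = pd.DataFrame({0: [[8, 10, 12]] * n, 1: ['A'] * n})

def via_tolist():
    # Build the frame in one step from a list of lists.
    return pd.DataFrame(df[0].values.tolist())

def via_apply():
    # Construct one Series per row, then assemble the frame.
    return df[0].apply(pd.Series)

# Total time for three runs of each approach.
t_tolist = timeit.timeit(via_tolist, number=3)
t_apply = timeit.timeit(via_apply, number=3)
print(f"tolist: {t_tolist:.4f}s  apply: {t_apply:.4f}s")
```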