Pandas: split list in column into multiple rows
Source: http://stackoverflow.com/questions/50217968/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Asked by Mathias Lund
I have a question about splitting lists stored in a dataframe column into multiple rows.
Let's say I have this dataframe:
  Job position   Job type  id
0          [6]        [1]   3
1       [2, 6]  [3, 6, 5]   4
2          [1]        [9]  43
I would like every combination of the numbers within each row, so the final result would be:
   id  Job position  Job type
0   3           6.0       1.0
1   4           2.0       3.0
2   4           2.0       6.0
3   4           2.0       5.0
4   4           6.0       3.0
5   4           6.0       6.0
6   4           6.0       5.0
7  43           1.0       9.0
Right now I get this result instead:
   id  Job position  Job type
0   3           6.0       1.0
1   4           2.0       3.0
2   4           6.0       6.0
3   4           NaN       5.0
4  43           1.0       9.0
To get that result, I did:
df = df.set_index(['id'])
(df.apply(lambda x: pd.DataFrame(x.tolist(), index=x.index)
                      .stack()
                      .rename(x.name)).reset_index())
Answered by Ami Tavory
Similar to Scott Boston's suggestion, I suggest you explode the columns separately, then merge them together.
For example, for 'Job position':
>>> df['Job position'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
       value
index
0        6.0
1        2.0
2        1.0
1        6.0
And, all together:
import pandas as pd

df = pd.DataFrame({'Job position': [[6], [2, 6], [1]], 'Job type': [[1], [3, 6, 5], [9]], 'id': [3, 4, 43]})
# One value per row for each list column, indexed by the original row label
jobs = df['Job position'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
types = df['Job type'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
>>> pd.merge(
...     pd.merge(
...         jobs,
...         types,
...         left_index=True,
...         right_index=True),
...     df[['id']],
...     left_index=True,
...     right_index=True).rename(columns={'value_x': 'Job position', 'value_y': 'Job type'})
   Job position  Job type  id
0           6.0       1.0   3
1           2.0       3.0   4
1           2.0       6.0   4
1           2.0       5.0   4
1           6.0       3.0   4
1           6.0       6.0   4
1           6.0       5.0   4
2           1.0       9.0  43
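As a possible shortcut for the same "explode each column, then combine" idea, newer pandas versions (0.25 and later) provide DataFrame.explode. The sketch below is not part of the original answer; it assumes the example dataframe from the question and explodes the two list columns one after the other, which yields every combination per row.

```python
import pandas as pd

df = pd.DataFrame({
    'Job position': [[6], [2, 6], [1]],
    'Job type': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})

# Explode one list column at a time. Resetting the index in between keeps
# the row labels unique before the second explode.
result = (
    df.explode('Job position')
      .reset_index(drop=True)
      .explode('Job type')
      .reset_index(drop=True)
      # explode leaves object dtype behind, so cast back to integers
      .astype({'Job position': int, 'Job type': int})
)
print(result)
```

Because the second explode runs on rows that already hold a single position each, every position ends up paired with every type from the same original row.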
Answered by piRSquared
Use a list comprehension:
pd.DataFrame([
    [p, t, i] for P, T, i in df.values
    for p in P for t in T
], columns=df.columns)
   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
Alternatives to iterating over df.values:
# with itertuples
pd.DataFrame([
    [p, t, i] for P, T, i in df.itertuples(index=False)
    for p in P for t in T
], columns=df.columns)

# with zip
z = zip(df['Job position'], df['Job type'], df['id'])
pd.DataFrame([
    [p, t, i] for P, T, i in z
    for p in P for t in T
], columns=df.columns)
To generalize this solution to any number of additional columns after the two list columns:
pd.DataFrame([
    [p, t] + a for P, T, *a in df.values
    for p in P for t in T
], columns=df.columns)
   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
Answered by YOBEN_S
Using the DataFrame constructor:
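import numpy as np

# The column names contain spaces, so use bracket access instead of attributes
s1 = df['Job position'].str.len()  # number of positions per row
s2 = df['Job type'].str.len()      # number of types per row
pd.DataFrame({'Job position': np.concatenate([np.repeat(x, y) for x, y in zip(df['Job position'], s2)]),
              'Job type': np.concatenate(np.repeat(df['Job type'], s1).values),
              'id': df.id.repeat(s1 * s2)})
   Job position  Job type  id
0             6         1   3
1             2         3   4
1             2         6   4
1             2         5   4
1             6         3   4
1             6         6   4
1             6         5   4
2             1         9  43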
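The repeat counts are what make this produce every combination: each position is repeated once per job type, while the list of job types is cycled once per position. A minimal sketch for the single row with id 4, using np.tile in place of the answer's Series-level np.repeat purely for illustration:

```python
import numpy as np

positions = [2, 6]
job_types = [3, 6, 5]

# Repeat each position once per job type ...
left = np.repeat(positions, len(job_types))
# ... and cycle through the job types once per position.
right = np.tile(job_types, len(positions))

pairs = list(zip(left.tolist(), right.tolist()))
print(pairs)
```

Zipping the two arrays gives the six position/type pairs for that row, in the same order as the table above.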
Answered by Ken T
import itertools

# product(*i[0:2]) pairs every position with every type; i[2] is the id
dfres = pd.DataFrame(
    [j + (i[2],) for i in df.values for j in itertools.product(*i[0:2])],
    columns=df.columns)
   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
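If the list columns are not the first two columns, the same itertools.product idea can be written against column names instead of positions. The following is a sketch only; expand_list_columns is a hypothetical helper name, not something from pandas or from the answers above.

```python
import itertools
import pandas as pd

def expand_list_columns(df, list_cols):
    """Return one row per combination of the values in list_cols (hypothetical helper)."""
    other_cols = [c for c in df.columns if c not in list_cols]
    rows = []
    for record in df.to_dict('records'):
        # Cartesian product of the lists held in this row
        for combo in itertools.product(*(record[c] for c in list_cols)):
            new_row = {c: record[c] for c in other_cols}
            new_row.update(zip(list_cols, combo))
            rows.append(new_row)
    # Keep the original column order
    return pd.DataFrame(rows, columns=list(df.columns))

df = pd.DataFrame({
    'Job position': [[6], [2, 6], [1]],
    'Job type': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})
expanded = expand_list_columns(df, ['Job position', 'Job type'])
print(expanded)
```

Note that a row whose list is empty produces no output rows, because the product of anything with an empty list is empty.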