Pandas: Splitting a list in a column into multiple rows

Question

Asked by Mathias Lund

I have a question regarding splitting a list in a dataframe column into multiple rows.

Let's say I have this dataframe:

  Job position   Job type  id
0          [6]        [1]   3
1       [2, 6]  [3, 6, 5]   4
2          [1]        [9]  43

I would like every single combination of numbers, so the final result would be:

   id    Job position  Job type
0   3         6.0       1.0
1   4         2.0       3.0
2   4         2.0       6.0
3   4         2.0       5.0
4   4         6.0       3.0
5   4         6.0       6.0
6   4         6.0       5.0
7  43         1.0       9.0

Right now I get this result instead:

   id    Job position  Job type
0   3         6.0       1.0
1   4         2.0       3.0
2   4         6.0       6.0
3   4         NaN       5.0
4  43         1.0       9.0

In order to get the result above, I did:

df = df.set_index(['id'])
(df.apply(lambda x: pd.DataFrame(x.tolist(),index=x.index)
                        .stack()
                        .rename(x.name)).reset_index())

Answer 1

Answered by Ami Tavory

Similar to Scott Boston's suggestion, I suggest you explode the columns separately, then merge them together.

For example, for 'Job position':

>>> df['Job position'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
       value
index       
0        6.0
1        2.0
2        1.0
1        6.0

Putting it all together:

df = pd.DataFrame({'Job position': [[6], [2, 6], [1]], 'Job type': [[1], [3, 6, 5], [9]], 'id': [3, 4, 43]})
jobs = df['Job position'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
types = df['Job type'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
>>> pd.merge(
    pd.merge(
        jobs,
        types,
        left_index=True,
        right_index=True),
    df[['id']],
    left_index=True,
    right_index=True).rename(columns={'value_x': 'Job positions', 'value_y': 'Job type'})
   Job positions  Job type  id
0            6.0       1.0   3
1            2.0       3.0   4
1            2.0       6.0   4
1            2.0       5.0   4
1            6.0       3.0   4
1            6.0       6.0   4
1            6.0       5.0   4
2            1.0       9.0  43
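
On pandas 0.25 or later, the same explode-each-column idea can be sketched with DataFrame.explode, which the answer above does not use. Exploding one list column and then the other yields every combination per row. The reset_index(drop=True) calls and the final integer cast are choices made for this sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Job position': [[6], [2, 6], [1]],
    'Job type': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})

# Explode one list column at a time; each call emits one row per list element.
# Resetting the index in between keeps row labels unique for the next explode.
out = (
    df.explode('Job position').reset_index(drop=True)
      .explode('Job type').reset_index(drop=True)
      .astype({'Job position': int, 'Job type': int})  # explode leaves object dtype
)
print(out)
```

Because the second explode runs on rows that already carry a single job position, each position ends up paired with every job type from the same original row.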

Answer 2

Answered by piRSquared

Use a comprehension

pd.DataFrame([
    [p, t, i] for P, T, i in df.values
    for p in P for t in T
], columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43

Alternatives to iterating over df.values

pd.DataFrame([
    [p, t, i] for P, T, i in df.itertuples(index=False)
    for p in P for t in T
], columns=df.columns)

z = zip(df['Job position'], df['Job type'], df['id'])
pd.DataFrame([
    [p, t, i] for P, T, i in z
    for p in P for t in T
], columns=df.columns)

To generalize this solution to any number of additional (non-list) columns after the two list columns:

pd.DataFrame([
    [p, t] + a for P, T, *a in df.values
    for p in P for t in T
], columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43

Answer 3

Answered by YOBEN_S

From data frame constructor

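import numpy as np

# Assumes the columns are named 'Jobposition' and 'Jobtype' (no spaces), so attribute access works.
s1 = df.Jobposition.str.len()  # number of job positions per row
s2 = df.Jobtype.str.len()      # number of job types per row
pd.DataFrame({'id': df.id.repeat(s1*s2),
  'Jobposition': np.concatenate([np.repeat(x, y) for x, y in zip(df.Jobposition, s2)]),
  'Jobtype': np.concatenate(np.repeat(df.Jobtype, s1).values)})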

   Jobposition  Jobtype  id
0            6        1   3
1            2        3   4
1            2        6   4
1            2        5   4
1            6        3   4
1            6        6   4
1            6        5   4
2            1        9  43
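
A self-contained variant of the same repeat-based idea is sketched below. It uses bracket indexing so that the question's original column names (with spaces) work unchanged. The variable names n_pos and n_type, and the use of np.tile for the job types, are choices made for this sketch rather than part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Job position': [[6], [2, 6], [1]],
    'Job type': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})

n_pos = df['Job position'].str.len()   # list length per row
n_type = df['Job type'].str.len()

out = pd.DataFrame({
    # Each id appears once per (position, type) pair in its row.
    'id': np.repeat(df['id'].to_numpy(), (n_pos * n_type).to_numpy()),
    # Repeat each position once per job type: [2, 6] -> [2, 2, 2, 6, 6, 6]
    'Job position': np.concatenate(
        [np.repeat(p, int(k)) for p, k in zip(df['Job position'], n_type)]),
    # Tile the whole type list once per position: [3, 6, 5] -> [3, 6, 5, 3, 6, 5]
    'Job type': np.concatenate(
        [np.tile(t, int(k)) for t, k in zip(df['Job type'], n_pos)]),
})
print(out)
```

Repeating the positions element by element while tiling the types as a whole is what lines the two columns up into every combination.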

Answer 4

Answered by Ken T

import itertools
dfres = pd.DataFrame([j+(i[2],) for i in df.values for j in itertools.product(*i[0:2])]
        ,columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
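
The answer relies on itertools.product, which yields every tuple that takes one element from each of the lists it is given. A minimal sketch of the same approach wrapped in a function follows. The name expand_list_columns and its list_cols parameter are hypothetical, introduced only for illustration, and the function selects list columns by name instead of by position.

```python
from itertools import product

import pandas as pd


def expand_list_columns(df, list_cols):
    """Hypothetical helper: one output row per combination of the values in list_cols."""
    other_cols = [c for c in df.columns if c not in list_cols]
    rows = []
    for record in df.to_dict('records'):
        # product(...) yields every tuple with one element drawn from each list
        for combo in product(*(record[c] for c in list_cols)):
            rows.append(list(combo) + [record[c] for c in other_cols])
    return pd.DataFrame(rows, columns=list(list_cols) + other_cols)


df = pd.DataFrame({
    'Job position': [[6], [2, 6], [1]],
    'Job type': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})
out = expand_list_columns(df, ['Job position', 'Job type'])
print(out)
```

Note that a row in which any of the lists is empty contributes no output rows, because the product of anything with an empty list is empty.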
