Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/42012152/

Date: 2020-09-14 02:54:20  Source: igfitidea

"unstack" a pandas column containing lists into multiple rows

Tags: python, list, pandas, dataframe

Asked by Alex

Say I have the following Pandas Dataframe:

df = pd.DataFrame({"a" : [1,2,3], "b" : [[1,2],[2,3,4],[5]]})
   a          b
0  1     [1, 2]
1  2  [2, 3, 4]
2  3        [5]

How would I "unstack" the lists in the "b" column in order to transform it into the dataframe:

   a  b
0  1  1
1  1  2
2  2  2
3  2  3
4  2  4
5  3  5

Answered by MaxU

UPDATE: a generic vectorized approach that also works for DataFrames with multiple columns:

Assuming we have the following DataFrame:

In [159]: df
Out[159]:
   a          b  c
0  1     [1, 2]  5
1  2  [2, 3, 4]  6
2  3        [5]  7

Solution:

In [160]: lst_col = 'b'

In [161]: pd.DataFrame({
     ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
     ...:     for col in df.columns.difference([lst_col])
     ...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns.tolist()]
     ...:
Out[161]:
   a  b  c
0  1  1  5
1  1  2  5
2  2  2  6
3  2  3  6
4  2  4  6
5  3  5  7

Setup:

df = pd.DataFrame({
    "a" : [1,2,3],
    "b" : [[1,2],[2,3,4],[5]],
    "c" : [5,6,7]
})
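
The generic approach above can be wrapped in a small helper. This is only a sketch of the same idea; the function name explode_list_column is made up for illustration and is not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def explode_list_column(df, lst_col):
    """Repeat every other column once per list element, then flatten lst_col.

    Hypothetical helper name; it only repackages the np.repeat / np.concatenate idea.
    """
    lengths = df[lst_col].str.len()                 # number of elements in each list
    other_cols = df.columns.difference([lst_col])   # every column except the list column
    repeated = {col: np.repeat(df[col].values, lengths) for col in other_cols}
    flat = np.concatenate(df[lst_col].values)       # all list elements, in row order
    # Select df.columns at the end to restore the original column order.
    return pd.DataFrame(repeated).assign(**{lst_col: flat})[df.columns.tolist()]


df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]], "c": [5, 6, 7]})
result = explode_list_column(df, "b")
print(result)
```

Because the repeat counts come from the list lengths, a row whose list is empty contributes no rows to the result. pandas 0.25 and later also provide DataFrame.explode, which may be simpler where it is available; it is not part of the original answer.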

Vectorized NumPy approach:

In [124]: pd.DataFrame({'a':np.repeat(df.a.values, df.b.str.len()),
                        'b':np.concatenate(df.b.values)})
Out[124]:
   a  b
0  1  1
1  1  2
2  2  2
3  2  3
4  2  4
5  3  5
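
To see why the two NumPy calls line up, it may help to look at the intermediate arrays. The variable names below (lengths, repeated_a, flat_b) are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

lengths = df["b"].str.len().tolist()              # length of each list in "b"
repeated_a = np.repeat(df["a"].values, lengths)   # each "a" value repeated once per list element
flat_b = np.concatenate(df["b"].values)           # all elements of "b", flattened in row order

print(lengths)
print(repeated_a.tolist())
print(flat_b.tolist())

# Both arrays have the same length, so they can be combined into one frame.
result = pd.DataFrame({"a": repeated_a, "b": flat_b})
```

Both arrays have sum(lengths) entries, which is what keeps each flattened element next to the "a" value of the row it came from.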

OLD answer:

Try this:

In [89]: df.set_index('a', append=True).b.apply(pd.Series).stack().reset_index(level=[0, 2], drop=True).reset_index()
Out[89]:
   a    0
0  1  1.0
1  1  2.0
2  2  2.0
3  2  3.0
4  2  4.0
5  3  5.0

Or a slightly nicer solution provided by @Boud:

In [110]: df.set_index('a').b.apply(pd.Series).stack().reset_index(level=-1, drop=True).astype(int).reset_index()
Out[110]:
   a  0
0  1  1
1  1  2
2  2  2
3  2  3
4  2  4
5  3  5
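
The float values in the first variant come from the intermediate wide frame: apply(pd.Series) creates one column per list position and pads shorter lists with NaN. The step-by-step sketch below shows this. The explicit dropna() call and the name="b" argument are additions made here, not part of the answers above; dropna() is included so the sketch does not depend on whether stack() itself discards the padding in a given pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# One column per list position; shorter lists are padded with NaN.
wide = df.set_index("a")["b"].apply(pd.Series)
print(wide)

# Move the positions into rows, discard the padding, and drop the position level.
long = wide.stack().dropna().reset_index(level=-1, drop=True).astype(int)

# Turn the "a" index back into a column and name the values column "b".
result = long.reset_index(name="b")
print(result)
```

This approach builds a Series for every row, so it is likely to be slower than the NumPy version on large frames.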

Answered by Karvy1

Here is another approach using itertuples:

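import pandas as pd

df = pd.DataFrame({"a" : [1,2,3], "b" : [[1,2],[2,3,4],[5]]})

data = []

# itertuples() yields (Index, a, b): i[1] is the value of "a", i[2] is the list in "b"
for i in df.itertuples():
    lst = i[2]
    for col2 in lst:
        # one output row per list element
        data.append([i[1], col2])

df_output = pd.DataFrame(data=data, columns=df.columns)
df_output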

Output:

        a   b
    0   1   1
    1   1   2
    2   2   2
    3   2   3
    4   2   4
    5   3   5

Edit: You can also compress the loops into a single list comprehension and populate data as:

data = [[i[1], col2] for i in df.itertuples() for col2 in i[2]]
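
A possible variant of the comprehension uses attribute access instead of positional indices. Passing index=False to itertuples is a choice made here for readability; the names row and item are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# index=False leaves the index out, so each row is a namedtuple with fields a and b.
data = [[row.a, item] for row in df.itertuples(index=False) for item in row.b]

df_output = pd.DataFrame(data, columns=df.columns)
print(df_output)
```

Attribute access only works when the column names are valid Python identifiers; otherwise the positional form shown above is the safer option.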