Attribution: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/44663903/

Pandas: split column of lists of unequal length into multiple columns

python, pandas

Asked by user139014

I have a Pandas dataframe that looks like the below:

                   codes
1                  [71020]
2                  [77085]
3                  [36415]
4                  [99213, 99287]
5                  [99233, 99233, 99233]

I'm trying to split the lists in df['codes'] into columns, like the below:

                   code_1      code_2      code_3   
1                  71020
2                  77085
3                  36415
4                  99213       99287
5                  99233       99233       99233

where columns that don't have a value (because the list was not that long) are filled with blanks or NaNs or something.

I've seen answers like this one and others similar to it, and while they work on lists of equal length, they all throw errors when I try to use the methods on lists of unequal length. Is there a good way to do this?

Answered by piRSquared

Try:

pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')

   code_0   code_1   code_2
0   71020      NaN      NaN
1   77085      NaN      NaN
2   36415      NaN      NaN
3   99213  99287.0      NaN
4   99233  99233.0  99233.0


Include the index

pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')

   code_0   code_1   code_2
1   71020      NaN      NaN
2   77085      NaN      NaN
3   36415      NaN      NaN
4   99213  99287.0      NaN
5   99233  99233.0  99233.0
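
For readers who want to run this end to end, here is a minimal sketch. The sample frame is rebuilt from the data shown in the question; the construction code is an assumption, since the original post does not show how df was created. It also makes visible why the padded columns print as floats: NaN is a float, so any column containing a missing value is upcast to float64.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction);
# the index starts at 1, as in the post.
df = pd.DataFrame(
    {"codes": [[71020], [77085], [36415], [99213, 99287], [99233, 99233, 99233]]},
    index=[1, 2, 3, 4, 5],
)

# Each list becomes one row; shorter rows are padded with NaN on the right.
wide = pd.DataFrame(df["codes"].tolist(), index=df.index).add_prefix("code_")

print(wide)
print(wide.dtypes)
```

Passing df.index as the second argument keeps the original row labels, so the result can be joined back onto df if the other columns are still needed.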


We can nail down all the formatting with this:

f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
    df.codes.values.tolist(),
    df.index, dtype=object
).fillna('').rename(columns=f)

   code_1 code_2 code_3
1   71020              
2   77085              
3   36415              
4   99213  99287       
5   99233  99233  99233
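
If blank strings are not required and the goal is only to avoid the trailing .0, one alternative that is not part of the original answer is pandas' nullable integer dtype "Int64", which stores integers alongside a missing-value marker. A sketch, assuming a pandas version that supports "Int64" and the same sample frame rebuilt from the question:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame(
    {"codes": [[71020], [77085], [36415], [99213, 99287], [99233, 99233, 99233]]},
    index=[1, 2, 3, 4, 5],
)

wide = pd.DataFrame(df["codes"].tolist(), index=df.index)

# Rename 0, 1, 2 to code_1, code_2, code_3 to match the layout in the question.
wide.columns = [f"code_{i + 1}" for i in wide.columns]

# Nullable integers: values stay integers and gaps become missing values.
wide = wide.astype("Int64")

print(wide)
```

Unlike the fillna('') approach, the columns stay numeric, so arithmetic and comparisons still work on them.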

Answered by MaxU

Another solution:

In [95]: df.codes.apply(pd.Series).add_prefix('code_')
Out[95]:
    code_0   code_1   code_2
1  71020.0      NaN      NaN
2  77085.0      NaN      NaN
3  36415.0      NaN      NaN
4  99213.0  99287.0      NaN
5  99233.0  99233.0  99233.0
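
The two approaches on this page produce the same values and differ only in dtypes. A quick check, as a sketch on sample data rebuilt from the question (the construction code is an assumption). Note that apply(pd.Series) builds one Series per row, which tends to be noticeably slower than the list-based constructor on large frames.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame(
    {"codes": [[71020], [77085], [36415], [99213, 99287], [99233, 99233, 99233]]},
    index=[1, 2, 3, 4, 5],
)

via_constructor = pd.DataFrame(df["codes"].tolist(), index=df.index).add_prefix("code_")
via_apply = df["codes"].apply(pd.Series).add_prefix("code_")

# Compare as floats so int/float dtype differences between the two do not matter;
# equal_nan=True treats the NaN padding in matching positions as equal.
same = np.allclose(
    via_constructor.to_numpy(dtype=float),
    via_apply.to_numpy(dtype=float),
    equal_nan=True,
)
print(same)
```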