Notice: the content of this page comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50729552/

Date: 2020-09-14 05:39:22  Source: igfitidea

Split Column containing lists into different rows in pandas

Tags: python, pandas, dataframe

Asked by Alexandra Espichán

I have a dataframe in pandas like this:

id     info
1      [1,2]
2      [3]
3      []

And I want to split it into different rows like this:

id     info
1      1 
1      2 
2      3 
3      NaN

How can I do this?

Accepted answer by pgngp

You can try this out:

>>> import pandas as pd
>>> df = pd.DataFrame({'id': [1,2,3], 'info': [[1,2],[3],[]]})
>>> s = df.apply(lambda x: pd.Series(x['info']), axis=1).stack().reset_index(level=1, drop=True)
>>> s.name = 'info'
>>> df2 = df.drop('info', axis=1).join(s)
>>> df2['info'] = pd.Series(df2['info'], dtype=object)
>>> df2
   id info
0   1    1
0   1    2
1   2    3
2   3  NaN

A similar question is posted here.
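
For what it's worth, pandas 0.25 and later ship DataFrame.explode, which covers this case directly. The following is a minimal sketch, assuming such a version is installed. It is not the method from the answer above, just a built-in alternative applied to the same example frame:

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'id': [1, 2, 3], 'info': [[1, 2], [3], []]})

# explode() emits one row per list element; an empty list becomes a single NaN row.
out = df.explode('info')
print(out)
```

The original index labels are repeated for rows that came from the same list; call reset_index(drop=True) on the result if a fresh index is wanted.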

Answer by An economist

This is a rather convoluted way, and it drops the rows whose list is empty:

import pandas as pd

df = pd.DataFrame({'id': [1,2,3],
                   'info': [[1,2], [3], [ ]]})

unstack_df = df.set_index(['id'])['info'].apply(pd.Series)\
                                         .stack()\
                                         .reset_index(level=1, drop=True)

unstack_df = unstack_df.reset_index()
unstack_df.columns = ['id', 'info']

unstack_df

>>
       id   info
    0   1   1.0
    1   1   2.0
    2   2   3.0

Answer by jpp

Here's one way using np.repeat and itertools.chain. Converting empty lists to {np.nan} is a trick to fool Pandas into accepting an iterable as a value. This allows chain.from_iterable to work error-free.

import numpy as np
from itertools import chain

df.loc[~df['info'].apply(bool), 'info'] = {np.nan}

res = pd.DataFrame({'id': np.repeat(df['id'], df['info'].map(len).values),
                    'info': list(chain.from_iterable(df['info']))})

print(res)

   id  info
0   1   1.0
0   1   2.0
1   2   3.0
2   3   NaN
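
The same repeat-and-chain idea can also be sketched without assigning a set through .loc, by building the padded lists in plain Python first. This is a variant under stated assumptions rather than the answer's own code: the names lists and lengths are introduced here for illustration, and the input frame is left unmodified.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'id': [1, 2, 3], 'info': [[1, 2], [3], []]})

# Replace each empty list with [np.nan] so every row contributes at least one value.
lists = [lst if lst else [np.nan] for lst in df['info']]
lengths = [len(lst) for lst in lists]

res = pd.DataFrame({
    # Repeat each id once per element of its (padded) list.
    'id': np.repeat(df['id'].to_numpy(), lengths),
    # Flatten the padded lists into one long column.
    'info': list(chain.from_iterable(lists)),
})
print(res)
```

Because the ids are repeated as a NumPy array, the result gets a fresh 0..n-1 index instead of the repeated labels shown above.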

Answer by Patel

Try these methods too...

Method 1

def split_dataframe_rows(df, column_selectors):
    # Expand each row into one row per list element of the selected columns.
    # Shorter lists are padded with ''; rows whose lists are all empty are dropped.
    def _split_list_to_rows(row, row_accumulator, column_selectors):
        split_rows = {}
        max_split = 0
        for column_selector in column_selectors:
            # copy the list so that pop() below does not mutate the caller's data
            split_row = list(row[column_selector])
            split_rows[column_selector] = split_row
            if len(split_row) > max_split:
                max_split = len(split_row)

        for i in range(max_split):
            new_row = row.to_dict()
            for column_selector in column_selectors:
                try:
                    new_row[column_selector] = split_rows[column_selector].pop(0)
                except IndexError:
                    new_row[column_selector] = ''
            row_accumulator.append(new_row)

    new_rows = []
    df.apply(_split_list_to_rows, axis=1, args=(new_rows, column_selectors))
    # passing columns=df.columns keeps the original column order
    new_df = pd.DataFrame(new_rows, columns=df.columns)
    return new_df

Method 2

from pandas import json_normalize  # top-level since pandas 1.0; earlier: pandas.io.json

def flatten_data(json=None):
    df = pd.DataFrame(json)
    # columns whose first-row value is a list are treated as list columns
    list_cols = [col for col in df.columns if isinstance(df.loc[0, col], list)]
    for i, col in enumerate(list_cols):
        # carry over every non-list column plus the list columns not yet expanded
        meta_cols = [c for c in df.columns if not isinstance(df.loc[0, c], list)] + list_cols[i+1:]
        json_data = df.to_dict('records')
        df = json_normalize(data=json_data, record_path=col, meta=meta_cols, record_prefix=col + '_', sep='_')
    return json_normalize(df.to_dict('records'))