Source: http://stackoverflow.com/questions/50729552/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Split Column containing lists into different rows in pandas
Asked by Alexandra Espichán
I have a dataframe in pandas like this:
id    info
1     [1,2]
2     [3]
3     []
And I want to split it into different rows like this:
id    info
1     1
1     2
2     3
3     NaN
How can I do this?
Accepted answer by pgngp
You can try this out:
>>> import pandas as pd
>>> df = pd.DataFrame({'id': [1,2,3], 'info': [[1,2],[3],[]]})
>>> s = df.apply(lambda x: pd.Series(x['info']), axis=1).stack().reset_index(level=1, drop=True)
>>> s.name = 'info'
>>> df2 = df.drop('info', axis=1).join(s)
>>> df2['info'] = pd.Series(df2['info'], dtype=object)
>>> df2
   id info
0   1    1
0   1    2
1   2    3
2   3  NaN
A similar question is posted here.
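On pandas 0.25 or later, the built-in DataFrame.explode method produces the same shape of result in a single call. This is a different technique from the one in the answer above, shown here only as a minimal sketch on the question's sample data; the variable name exploded is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'info': [[1, 2], [3], []]})

# explode() emits one row per list element; an empty list becomes a single NaN row.
# The original index labels are repeated (0, 0, 1, 2).
exploded = df.explode('info')
print(exploded)
```

Appending .reset_index(drop=True) gives a fresh 0..n-1 index if the repeated labels are not wanted.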
Answer by An economist
This is a rather convoluted way, and it drops the rows whose list is empty:
import pandas as pd

df = pd.DataFrame({'id': [1,2,3],
                   'info': [[1,2], [3], [ ]]})

unstack_df = df.set_index(['id'])['info'].apply(pd.Series)\
                                         .stack()\
                                         .reset_index(level=1, drop=True)

unstack_df = unstack_df.reset_index()
unstack_df.columns = ['id', 'info']
unstack_df
>>
   id  info
0   1   1.0
1   1   2.0
2   2   3.0
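If the ids with empty lists need to survive, one option is to left-merge the reshaped frame back onto the id column. The sketch below assumes the result printed above, rebuilt by hand as long_df so that the snippet runs on its own; the names long_df and restored are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'info': [[1, 2], [3], []]})

# Stand-in for unstack_df from the answer above: id 3 is missing because its list was empty.
long_df = pd.DataFrame({'id': [1, 1, 2], 'info': [1.0, 2.0, 3.0]})

# A left merge keeps every id from df; ids with no match get NaN in 'info'.
restored = df[['id']].merge(long_df, on='id', how='left')
print(restored)
```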
Answer by jpp
Here's one way using np.repeat and itertools.chain. Converting empty lists to {np.nan} is a trick to fool Pandas into accepting an iterable as a value. This allows chain.from_iterable to work error-free.
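import numpy as np
import pandas as pd
from itertools import chain

# Reuses the df from the question: id = [1, 2, 3], info = [[1, 2], [3], []]
# Replace each empty list with the one-element set {np.nan} so that it still yields one row
df.loc[~df['info'].apply(bool), 'info'] = {np.nan}

# Repeat each id once per element of its list, then flatten the lists into one column
res = pd.DataFrame({'id': np.repeat(df['id'], df['info'].map(len).values),
                    'info': list(chain.from_iterable(df['info']))})

print(res)

   id  info
0   1   1.0
0   1   2.0
1   2   3.0
2   3   NaN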
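The same repeat-and-chain idea can also be written without modifying df in place. The following is a minimal sketch, not the answer's own code: it swaps the {np.nan} assignment for a plain Python list in which empty lists are replaced by [np.nan], and the names filled and lengths are introduced only for illustration.

```python
import numpy as np
import pandas as pd
from itertools import chain

df = pd.DataFrame({'id': [1, 2, 3], 'info': [[1, 2], [3], []]})

# Build a padded copy of the lists; df itself is left untouched.
filled = [items if items else [np.nan] for items in df['info']]
lengths = [len(items) for items in filled]

res = pd.DataFrame({
    # Repeat each id once per element of its (padded) list.
    'id': np.repeat(df['id'].to_numpy(), lengths),
    # Flatten the padded lists into a single column.
    'info': list(chain.from_iterable(filled)),
})
print(res)
```

Because np.repeat is applied to a plain NumPy array here, the result gets a fresh 0..n-1 index rather than the repeated labels shown above.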
Answer by Patel
Try these methods too...
Method 1
import pandas as pd

def split_dataframe_rows(df, column_selectors):
    # we need to keep track of the ordering of the columns
    # Note: a row whose selected lists are all empty produces no output rows
    def _split_list_to_rows(row, row_accumulator, column_selectors):
        split_rows = {}
        max_split = 0
        for column_selector in column_selectors:
            # Copy the list so that pop(0) below does not empty the lists stored in df
            split_row = list(row[column_selector])
            split_rows[column_selector] = split_row
            if len(split_row) > max_split:
                max_split = len(split_row)

        for i in range(max_split):
            new_row = row.to_dict()
            for column_selector in column_selectors:
                try:
                    new_row[column_selector] = split_rows[column_selector].pop(0)
                except IndexError:
                    # Shorter lists are padded with empty strings
                    new_row[column_selector] = ''
            row_accumulator.append(new_row)

    new_rows = []
    df.apply(_split_list_to_rows, axis=1, args=(new_rows, column_selectors))
    new_df = pd.DataFrame(new_rows, columns=df.columns)
    return new_df
Method 2
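import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; on older versions: from pandas.io.json import json_normalize

def flatten_data(json=None):
    df = pd.DataFrame(json)
    # Columns whose first value is a list are treated as list columns
    list_cols = [c for c in df.columns if isinstance(df.loc[0, c], list)]
    for i, col in enumerate(list_cols):
        # Carry along the non-list columns plus the list columns not yet expanded
        meta_cols = [c for c in df.columns if not isinstance(df.loc[0, c], list)] + list_cols[i+1:]
        json_data = df.to_dict('records')
        df = json_normalize(data=json_data, record_path=col, meta=meta_cols, record_prefix=col + '_', sep='_')
    return json_normalize(df.to_dict('records'))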
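Method 2 leans on json_normalize with the record_path and meta arguments. As a minimal sketch of what those two arguments do, the example below uses hypothetical records whose info entries are dictionaries with a made-up key v (the question's data holds plain numbers instead), and it assumes pandas 1.0 or later for pd.json_normalize.

```python
import pandas as pd

# Hypothetical input: each 'info' entry is a dict, unlike the plain numbers in the question.
records = [
    {'id': 1, 'info': [{'v': 1}, {'v': 2}]},
    {'id': 2, 'info': [{'v': 3}]},
]

# record_path names the list to unroll into rows; meta names the fields copied onto each row.
flat = pd.json_normalize(records, record_path='info', meta=['id'], record_prefix='info_')
print(flat)
```

Each element of info becomes its own row, and the id of the enclosing record is repeated alongside it.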