Pandas expand rows from list data available in column
Original question: http://stackoverflow.com/questions/39011511/
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on StackOverflow (not to this page).
Asked by Sanjay Yadav
I have a data frame like this in pandas:
column1 column2
[a,b,c] 1
[d,e,f] 2
[g,h,i] 3
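To reproduce the frame above, here is a minimal sketch. It assumes that each cell of column1 holds a Python list of strings, which the question does not state explicitly.

```python
import pandas as pd

# Assumption: each cell of column1 is a Python list of strings.
df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})
print(df)
```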
Expected output:
column1 column2
a 1
b 1
c 1
d 2
e 2
f 2
g 3
h 3
i 3
How can I process this data?
Answer by jezrael
You can create a DataFrame with its constructor and then reshape it with stack:
# the outer parentheses let the method chain span several lines
df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='column1')[['column1','column2']])
print(df2)
column1 column2
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2
6 g 3
7 h 3
8 i 3
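To see what each step of that chain does, the sketch below splits it into intermediate variables. The names wide, stacked and long_df are only illustrative, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

# Step 1: one column per list position, rows labelled by column2.
wide = pd.DataFrame(df.column1.tolist(), index=df.column2)

# Step 2: stack() moves the positional columns into an inner index level,
# giving a Series with a two-level index (column2, position).
stacked = wide.stack()

# Step 3: drop the positional level, then turn the column2 index
# back into an ordinary column.
long_df = stacked.reset_index(level=1, drop=True).reset_index(name='column1')

print(wide.shape)
print(stacked.index.nlevels)
print(list(long_df.columns))
```

The final column order is column2, column1, which is why the answer reorders with [['column1','column2']] at the end.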
If you reorder the columns with the subset [['column1','column2']] anyway, you can also omit the first reset_index, because the subset drops the leftover index-level column:
# the outer parentheses let the method chain span several lines
df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(name='column1')[['column1','column2']])
print(df2)
column1 column2
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2
6 g 3
7 h 3
8 i 3
Another solution is to use DataFrame.from_records to create a DataFrame from the first column, then create a Series with stack and join it to the original DataFrame:
import pandas as pd

df = pd.DataFrame({'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
                   'column2':[1,2,3]})
# the outer parentheses let the method chain span several lines
a = (pd.DataFrame.from_records(df.column1.tolist())
       .stack()
       .reset_index(level=1, drop=True)
       .rename('column1'))
print(a)
0 a
0 b
0 c
1 d
1 e
1 f
2 g
2 h
2 i
Name: column1, dtype: object
print (df.drop('column1', axis=1)
.join(a)
.reset_index(drop=True)[['column1','column2']])
column1 column2
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2
6 g 3
7 h 3
8 i 3
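The join step works because the stacked Series keeps the original row labels, repeated once per list element, and join aligns on the index, so each row of the remaining columns is repeated to match. Here is a small sketch of just that alignment, using the hypothetical names left and right:

```python
import pandas as pd

# left has unique row labels 0 and 1.
left = pd.DataFrame({'column2': [1, 2]})

# right repeats label 0 twice, as a stacked Series would.
right = pd.Series(['a', 'b', 'd'], index=[0, 0, 1], name='column1')

# join aligns on the index, so row 0 of left appears twice in the result.
joined = left.join(right)
print(joined)
```

This is also why the answer finishes with reset_index(drop=True): the joined frame still carries the repeated labels.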
Answer by Erfan
2019 updated answer
Since pandas 0.25.0 we have the explode method for this, which expands each list into one row per element and repeats the values of the remaining columns:
df.explode('column1').reset_index(drop=True)
Output
column1 column2
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2
6 g 3
7 h 3
8 i 3
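Two details of explode are worth seeing on a small example: it keeps the original row labels (which is why the answer calls reset_index), and an empty list becomes a missing value rather than disappearing. The sketch below uses a made-up frame with ragged and empty lists, and the dropna step is an optional addition, not part of the answer above.

```python
import pandas as pd

# Illustrative data: lists of different lengths, including an empty one.
df = pd.DataFrame({
    'column1': [['a', 'b'], [], ['c']],
    'column2': [1, 2, 3],
})

exploded = df.explode('column1')

# Row labels are repeated per element, not renumbered.
print(exploded.index.tolist())

# The empty list yields a single row with a missing value.
print(exploded['column1'].isna().tolist())

# Optional: drop the rows that came from empty lists and renumber.
clean = exploded.dropna(subset=['column1']).reset_index(drop=True)
print(clean)
```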
Answer by bencekd
Another solution is to use the result_type='expand' argument of DataFrame.apply, available since pandas 0.23. In answer to @splinter's question, this method can be generalized, see below:
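import pandas as pd
from numpy import arange

df = pd.DataFrame(
    {'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
     'column2': [1,2,3]}
)
# one column per list element, labelled 0, 1, 2
expanded = df.apply(lambda row: row['column1'], axis=1, result_type='expand')
# drop the list column first so value_name='column1' does not clash with it,
# and keep column2 as the identifier so its values are repeated per element
(pd.melt(
    df.drop(columns='column1').join(expanded),
    id_vars=['column2'], value_vars=arange(expanded.shape[1]), value_name='column1')
 .sort_values(by=['column2', 'variable'])
 .drop(columns=['variable'])
 .reset_index(drop=True)[['column1','column2']])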
# can be generalized to frames with more columns
df = pd.DataFrame(
    {'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
     'column2': [1,2,3],
     'column3': [[1,2],[2,3],[3,4]],
     'column4': [42,23,321],
     'column5': ['a','b','c']}
)
expanded = df.apply(lambda row: row['column1'], axis=1, result_type='expand')
# every column except column1 is kept as an identifier
(pd.melt(
    df.drop(columns='column1').join(expanded),
    id_vars=list(df.columns[1:]), value_vars=arange(expanded.shape[1]), value_name='column1')
 .drop(columns=['variable'])[list(df.columns)]
 .sort_values(by=['column1']))
UPDATE (for Jwely's comment): if your lists have varying lengths, you can pad them to the length of the longest one:
df = pd.DataFrame(
    {'column1': [['a','b','c'],['d','f'],['g','h','i']],
     'column2': [1,2,3]}
)
longest = max(len(x) for x in df['column1'])
# pad shorter lists with None so every row expands to the same width
expanded = df.apply(
    lambda row: row['column1'] + [None] * (longest - len(row['column1'])),
    axis=1, result_type='expand')
(pd.melt(
    df.drop(columns='column1').join(expanded),
    id_vars=['column2'], value_vars=arange(longest), value_name='column1')
 .dropna(subset=['column1'])  # remove the padding again
 .sort_values(by=['column2', 'variable'])
 .drop(columns=['variable'])
 .reset_index(drop=True)[['column1','column2']])
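A plain Python loop gives a simple reference result for checking any of the approaches on lists of varying length. This is only a sketch for verification, not one of the answers above, and the names rows and expected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

# Reference: one output row per list element, pairing it with column2.
rows = [
    (item, c2)
    for items, c2 in zip(df['column1'], df['column2'])
    for item in items
]
expected = pd.DataFrame(rows, columns=['column1', 'column2'])
print(expected)
```

Comparing the column values of this frame with the output of another approach, for example via tolist(), avoids surprises from differing index labels or dtypes.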