How to combine multiple rows into a single row with python pandas based on the values of multiple columns?
Original question: http://stackoverflow.com/questions/51901068/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by Steward
I need to combine multiple rows into a single row. The original dataframe looks like this:
IndividualID  DayID  TripID  JourSequence  TripPurpose
200100000001  1      1       1             3
200100000001  1      2       2             31
200100000001  1      3       3             23
200100000001  1      4       4             5
200100000009  1      55      1             3
200100000009  1      56      2             12
200100000009  1      57      3             4
200100000009  1      58      4             6
200100000009  1      59      5             19
200100000009  1      60      6             2
I am trying to build a kind of 'trip chain', so all the journey sequences and trip purposes of one individual on a single day should end up in the same row.
Ideally, I would like to convert the table into something like this:
IndividualID  DayID  Seq1  TripPurp1  Seq2  TripPurp2  Seq3  TripPurp3  Seq4  TripPurp4
200100000001  1      1     3          2     31         3     23         4     5
200100000009  1      1     3          2     12         3     4          4     6
If this is not possible, the following layout would also be fine:
IndividualID  DayID  TripPurposes
200100000001  1      3, 31, 23, 5
200100000009  1      3, 12, 4, 6
Are there any possible solutions? I was thinking of using a for loop or a while statement, but that is probably not a good idea. Thanks in advance!
Accepted answer by Scott Boston
You can try:
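# Number the trips within each (IndividualID, DayID) group, then pivot that counter into columns
trip_no = df.groupby(['IndividualID', 'DayID']).cumcount() + 1
df_out = df.set_index(['IndividualID', 'DayID', trip_no]).unstack().sort_index(level=1, axis=1)
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
df_out.reset_index()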
Output:
   IndividualID  DayID  JourSequence_1  TripID_1  TripPurpose_1  \
0  200100000001      1             1.0       1.0            3.0
1  200100000009      1             1.0      55.0            3.0

   JourSequence_2  TripID_2  TripPurpose_2  JourSequence_3  TripID_3  \
0             2.0       2.0           31.0             3.0       3.0
1             2.0      56.0           12.0             3.0      57.0

   TripPurpose_3  JourSequence_4  TripID_4  TripPurpose_4  JourSequence_5  \
0           23.0             4.0       4.0            5.0             NaN
1            4.0             4.0      58.0            6.0             5.0

   TripID_5  TripPurpose_5  JourSequence_6  TripID_6  TripPurpose_6
0       NaN            NaN             NaN       NaN            NaN
1      59.0           19.0             6.0      60.0            2.0
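For readers who want to run this end to end, here is a minimal sketch of the same approach split into steps. It assumes the question's table is held in a DataFrame named df (rebuilt by hand below); the names keys, trip_no and wide are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'IndividualID': [200100000001] * 4 + [200100000009] * 6,
    'DayID': [1] * 10,
    'TripID': [1, 2, 3, 4, 55, 56, 57, 58, 59, 60],
    'JourSequence': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6],
    'TripPurpose': [3, 31, 23, 5, 3, 12, 4, 6, 19, 2],
})

keys = ['IndividualID', 'DayID']

# Step 1: number the trips within each (IndividualID, DayID) group: 1, 2, 3, ...
trip_no = df.groupby(keys).cumcount() + 1

# Step 2: put the keys and the counter in the index, then pivot the counter into columns.
wide = df.set_index(keys + [trip_no]).unstack()

# Step 3: order the columns by trip number so each trip's fields sit together.
wide = wide.sort_index(level=1, axis=1)

# Step 4: flatten the (field, trip number) column pairs into names like TripPurpose_1.
wide.columns = [f'{field}_{n}' for field, n in wide.columns]

df_out = wide.reset_index()
print(df_out.shape)
```

The first individual has only four trips, so the columns for trips 5 and 6 hold NaN in that row, which is why the integer values are displayed as floats.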
Answer by Yo_Chris
To get your second output, you just need to group by the key columns and apply list:
df.groupby(['IndividualID', 'DayID'])['TripPurpose'].apply(list)
                             TripPurpose
IndividualID DayID
200100000001 1            [3, 31, 23, 5]
200100000009 1      [3, 12, 4, 6, 19, 2]
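The question's second layout shows the purposes as a comma-separated string rather than a Python list. One possible way to get that, sketched here under the assumption that the data is in a DataFrame named df as in the question, is to join the values as strings within each group. The variable name purposes is illustrative; the column name TripPurposes is taken from the question's desired header.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'IndividualID': [200100000001] * 4 + [200100000009] * 6,
    'DayID': [1] * 10,
    'TripID': [1, 2, 3, 4, 55, 56, 57, 58, 59, 60],
    'JourSequence': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6],
    'TripPurpose': [3, 31, 23, 5, 3, 12, 4, 6, 19, 2],
})

# Within each (IndividualID, DayID) group, convert the purposes to strings
# and join them with ", " in their original row order.
purposes = (
    df.groupby(['IndividualID', 'DayID'])['TripPurpose']
      .agg(lambda s: ', '.join(s.astype(str)))
      .reset_index(name='TripPurposes')
)
print(purposes)
```

Unlike the sample in the question, this keeps all six trips of the second individual.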
To get your first output, you can do something like this (probably not the best approach):
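import pandas as pd

# Collect each group's trip purposes into a list, then expand the list into numbered columns
df2 = pd.DataFrame(df.groupby(['IndividualID', 'DayID'])['TripPurpose'].apply(list))
trip = df2['TripPurpose'].apply(pd.Series).rename(columns=lambda x: 'TripPurpose' + str(x + 1))
# Do the same for the journey sequences
df3 = pd.DataFrame(df.groupby(['IndividualID', 'DayID'])['JourSequence'].apply(list))
seq = df3['JourSequence'].apply(pd.Series).rename(columns=lambda x: 'seq' + str(x + 1))
# Join the two wide tables on IndividualID and DayID
pd.merge(trip, seq, on=['IndividualID', 'DayID'])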
Note that the output is not sorted.
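If the columns should alternate as in the question's first layout (Seq1, TripPurp1, Seq2, ...), one option is to build the column order explicitly after combining the two wide tables. This is only a sketch and not part of the original answer: it assumes df holds the question's data, the helper widen is hypothetical, and the prefixes Seq and TripPurp are chosen here to match the question's desired header.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'IndividualID': [200100000001] * 4 + [200100000009] * 6,
    'DayID': [1] * 10,
    'TripID': [1, 2, 3, 4, 55, 56, 57, 58, 59, 60],
    'JourSequence': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6],
    'TripPurpose': [3, 31, 23, 5, 3, 12, 4, 6, 19, 2],
})

keys = ['IndividualID', 'DayID']


def widen(frame, column, prefix):
    """Hypothetical helper: spread one column's per-group values over numbered columns."""
    lists = frame.groupby(keys)[column].apply(list)
    # Shorter lists are padded with NaN so every row has the same width.
    wide = pd.DataFrame(lists.tolist(), index=lists.index)
    wide.columns = [f'{prefix}{i + 1}' for i in wide.columns]
    return wide


seq = widen(df, 'JourSequence', 'Seq')
trip = widen(df, 'TripPurpose', 'TripPurp')

# Interleave the column names: Seq1, TripPurp1, Seq2, TripPurp2, ...
order = [name for pair in zip(seq.columns, trip.columns) for name in pair]
result = pd.concat([seq, trip], axis=1)[order].reset_index()
print(result.columns.tolist())
```

Both wide tables share the same (IndividualID, DayID) index, so concatenating along the columns lines the rows up without an explicit merge.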