Pandas groupby multiple columns, list of multiple columns
Original question: http://stackoverflow.com/questions/51584363/
Note: this content is from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by GrandmasLove
I have the following data:
Invoice  NoStockCode  Description                         Quantity  CustomerID  Country
536365   85123A       WHITE HANGING HEART T-LIGHT HOLDER  6         17850       United Kingdom
536365   71053        WHITE METAL LANTERN                 6         17850       United Kingdom
536365   84406B       CREAM CUPID HEARTS COAT HANGER      8         17850       United Kingdom
I am trying to do a groupby, so I have the following operation:
df.groupby(['InvoiceNo','CustomerID','Country'])['NoStockCode','Description','Quantity'].apply(list)
I want to get this output:
|Invoice |CustomerID |Country |NoStockCode |Description |Quantity
|536365 |17850 |United Kingdom |85123A, 71053, 84406B |WHITE HANGING HEART T-LIGHT HOLDER, WHITE METAL LANTERN, CREAM CUPID HEARTS COAT HANGER |6, 6, 8
Instead I get:
|Invoice |CustomerID |Country |0
|536365 |17850 |United Kingdom |['NoStockCode','Description','Quantity']
I have tried agg and other methods, but I haven't been able to get all of the columns to join as a list. I don't need to use the list function, but in the end I want the different columns to be lists.
Answered by Ben.T
I can't reproduce your code right now, but I think that:
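# Select several columns with a list (double brackets); a bare tuple of
# column names is rejected by recent pandas versions.
print(df.groupby(['InvoiceNo', 'CustomerID', 'Country'],
                 as_index=False)[['NoStockCode', 'Description', 'Quantity']]
        .agg(lambda x: list(x)))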
would give you the expected output.
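Since the answer above was posted without being run, here is a minimal runnable sketch of the same idea. It assumes the column names from the sample data in the question (`Invoice` rather than `InvoiceNo`) and rebuilds the three sample rows by hand; the variable name `lists` is only illustrative.

```python
import pandas as pd

# Sample rows rebuilt from the question; column names follow its data header.
df = pd.DataFrame({
    'Invoice': [536365, 536365, 536365],
    'NoStockCode': ['85123A', '71053', '84406B'],
    'Description': ['WHITE HANGING HEART T-LIGHT HOLDER',
                    'WHITE METAL LANTERN',
                    'CREAM CUPID HEARTS COAT HANGER'],
    'Quantity': [6, 6, 8],
    'CustomerID': [17850, 17850, 17850],
    'Country': ['United Kingdom', 'United Kingdom', 'United Kingdom'],
})

# Group on the key columns, then collect every other column into a list.
lists = (df.groupby(['Invoice', 'CustomerID', 'Country'])
           [['NoStockCode', 'Description', 'Quantity']]
           .agg(list)
           .reset_index())
print(lists)
```

Each non-key cell in `lists` then holds a Python list, so the integer `Quantity` values stay integers and no string conversion is needed.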
Answered by YOBEN_S
IIUC (if I understand correctly):
df.groupby(['Invoice','CustomerID'],as_index=False)[['Description','NoStockCode']].agg(','.join)
Out[47]:
Invoice CustomerID Description \
0 536365 17850 WHITEHANGINGHEARTT-LIGHTHOLDER,WHITEMETALANTER...
NoStockCode
0 85123A,71053,84406B
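The join above covers only the string columns. As a hedged sketch, the same approach can be extended to the integer `Quantity` column by casting it to `str` first. The sample frame and the name `joined` are assumptions made for illustration.

```python
import pandas as pd

# Sample rows rebuilt from the question.
df = pd.DataFrame({
    'Invoice': [536365, 536365, 536365],
    'NoStockCode': ['85123A', '71053', '84406B'],
    'Description': ['WHITE HANGING HEART T-LIGHT HOLDER',
                    'WHITE METAL LANTERN',
                    'CREAM CUPID HEARTS COAT HANGER'],
    'Quantity': [6, 6, 8],
    'CustomerID': [17850, 17850, 17850],
    'Country': ['United Kingdom', 'United Kingdom', 'United Kingdom'],
})

# str.join only accepts strings, so Quantity is cast to str before grouping.
joined = (df.astype({'Quantity': str})
            .groupby(['Invoice', 'CustomerID', 'Country'])
            [['NoStockCode', 'Description', 'Quantity']]
            .agg(', '.join)
            .reset_index())
print(joined)
```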
Answered by Syed
Try using a variation of the following:
df.groupby('company').product.agg([('count', 'count'), ('NoStockCode', ', '.join), ('Description', ', '.join), ('Quantity', ', '.join)])
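The snippet above refers to `company` and `product` columns that do not appear in the question's data. One possible variation for the question's columns, sketched here with pandas named aggregation rather than the list-of-tuples form, might look like the following. The sample frame and the output names (`count`, `summary`) are assumptions.

```python
import pandas as pd

# Sample rows rebuilt from the question.
df = pd.DataFrame({
    'Invoice': [536365, 536365, 536365],
    'NoStockCode': ['85123A', '71053', '84406B'],
    'Description': ['WHITE HANGING HEART T-LIGHT HOLDER',
                    'WHITE METAL LANTERN',
                    'CREAM CUPID HEARTS COAT HANGER'],
    'Quantity': [6, 6, 8],
    'CustomerID': [17850, 17850, 17850],
    'Country': ['United Kingdom', 'United Kingdom', 'United Kingdom'],
})

# Named aggregation: output_name=(input_column, function).
summary = df.groupby(['Invoice', 'CustomerID', 'Country']).agg(
    count=('NoStockCode', 'count'),
    NoStockCode=('NoStockCode', ', '.join),
    Description=('Description', ', '.join),
    # Quantity holds ints, so convert each value to str before joining.
    Quantity=('Quantity', lambda s: ', '.join(map(str, s))),
).reset_index()
print(summary)
```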
Answered by unutbu
You could use pd.pivot_table with aggfunc=list or, as in the code here, with a function that joins each group's values into a single string:
import pandas as pd

df = pd.DataFrame({'Country': ['United Kingdom', 'United Kingdom', 'United Kingdom'],
                   'CustomerID': [17850, 17850, 17850],
                   'Description': ['WHITE HANGING HEART T-LIGHT HOLDER',
                                   'WHITE METAL LANTERN',
                                   'CREAM CUPID HEARTS COAT HANGER'],
                   'Invoice': [536365, 536365, 536365],
                   'NoStockCode': ['85123A', '71053', '84406B'],
                   'Quantity': [6, 6, 8]})

result = pd.pivot_table(df, index=['Invoice', 'CustomerID', 'Country'],
                        values=['NoStockCode', 'Description', 'Quantity'],
                        aggfunc=lambda x: ', '.join(map(str, x)))
print(result)
yields
Description NoStockCode Quantity
Invoice CustomerID Country
536365 17850 United Kingdom WHITE HANGING HEART T-LIGHT HOLDER, WHITE META... 85123A, 71053, 84406B 6, 6, 8
Note that if the Quantity values are ints, you will need to convert them to strs before calling ', '.join. That is why map(str, x) was used above.
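To illustrate the point about string conversion, this small plain-Python sketch shows `', '.join` rejecting integers and succeeding once each value is mapped to `str`. The variable names are illustrative only.

```python
quantities = [6, 6, 8]

# Joining ints directly fails, because str.join accepts only strings.
try:
    ', '.join(quantities)
except TypeError as exc:
    error = str(exc)
    print('TypeError:', error)

# Converting each value to str first makes the join succeed.
joined_quantities = ', '.join(map(str, quantities))
print(joined_quantities)
```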