Pandas: Convert a DataFrameGroupBy object to the desired format

Question

Asked by Zhubarb

I have a data frame as follows:

import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                   'code' : ['one', 'one', 'two', 'three',
                             'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white','white','white',
                           'black', 'black', 'white', 'white'],
                   'amount' : np.random.randn(8)},  columns= ['id','code','colour','amount'])

I want to be able to group the ids by code and colour and then sort them with respect to amount. I know how to groupby():

df.groupby(['code','colour']).head(5)
                id   code colour    amount
code  colour                              
one   black  0   1    one  black -0.117307
      white  1   2    one  white  1.653216
             6   7    one  white  0.817205
three black  5   6  three  black  0.567162
      white  3   4  three  white  0.579074
two   black  4   5    two  black -1.683988
      white  2   3    two  white -0.457722
             7   8    two  white -1.277020
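On the sorting part: a minimal sketch, assuming a recent pandas version and fixed amounts copied from the output above (instead of np.random.randn) so the result is reproducible. Sorting the whole frame once before grouping should be enough, because groupby keeps the original row order inside each group.

```python
import pandas as pd

# Same frame as above, but with fixed amounts (taken from the printed output)
# instead of np.random.randn(8), so the result is reproducible
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]})

# Sort once, largest amount first; groupby preserves this row order
# within every (code, colour) group
ordered = df.sort_values('amount', ascending=False)
ids_by_group = {key: g['id'].tolist()
                for key, g in ordered.groupby(['code', 'colour'])}
print(ids_by_group)
```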

However, my desired output is as below, where I have two columns: 1. code/colour contains the key strings, and 2. id:amount contains id-amount pairs sorted in descending order of amount:

code/colour  id:amount
one/black    {1:-0.117307}
one/white    {2:1.653216, 7:0.817205}
three/black  {6:0.567162}
three/white  {4:0.579074}
two/black    {5:-1.683988}
two/white    {3:-0.457722, 8:-1.277020}

How can I transform the DataFrameGroupBy object displayed above to my desired format? Or should I not use groupby() in the first place?

EDIT: Although not in the specified format, the code below kind of gives me the functionality I want:

groups = dict(list(df.groupby(['code','colour'])))
groups['one','white']
   id code colour    amount
1   2  one  white  1.331766
6   7  one  white  0.808739

How can I reduce the groups to only include the id and amount columns?
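One possible way, sketched under the same fixed-data assumption: select the two columns inside a dict comprehension. The (code, colour) key already records the grouping values, so nothing is lost by dropping those columns from each group.

```python
import pandas as pd

# Fixed amounts instead of np.random.randn(8), for reproducibility
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]})

# Keep only id and amount in each group; the grouping values live in the key
groups = {key: g[['id', 'amount']]
          for key, g in df.groupby(['code', 'colour'])}
print(groups['one', 'white'])
```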

Answer 1

Answer by waitingkuo

First, group by code and colour, then apply a custom function to format id and amount:

df = df.groupby(['code', 'colour']).apply(lambda x:x.set_index('id').to_dict('dict')['amount'])
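Note that the lambda above does not sort by amount, so the order inside each dict is not guaranteed to be descending (see the two/white row in the result below). A hedged variant, assuming a reasonably recent pandas where an applied function that returns a dict yields a Series of dicts: sort each group first, then build the dict with zip. Selecting id and amount before apply also keeps the grouping columns out of the function, which recent pandas versions warn about. The helper name id_amount_dict is hypothetical.

```python
import pandas as pd

# Fixed amounts instead of np.random.randn(8), for reproducibility
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]})


def id_amount_dict(g):
    """Hypothetical helper: {id: amount} for one group, largest amount first."""
    g = g.sort_values('amount', ascending=False)
    return dict(zip(g['id'], g['amount']))


# Series of dicts indexed by a (code, colour) MultiIndex
s = df.groupby(['code', 'colour'])[['id', 'amount']].apply(id_amount_dict)
print(s)
```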

And then modify the index:

df.index = ['/'.join(i) for i in df.index]

This returns a Series; you can convert it back to a DataFrame with:

df = df.reset_index()

Finally, set the column names:

df.columns=['code/colour','id:amount']

Result:

In [105]: df
Out[105]: 
   code/colour                               id:amount
0    one/black                     {1: 0.392264412544}
1    one/white  {2: 2.13950686015, 7: -0.393002947047}
2  three/black                      {6: -2.0766612539}
3  three/white                     {4: -1.18058561325}
4    two/black                     {5: -1.51959565941}
5    two/white  {8: -1.7659863039, 3: -0.595666853895}
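The index flattening, reset_index and column renaming steps can also be collapsed with rename_axis and reset_index(name=...). A small sketch, using a hypothetical two-row Series in place of the real apply result:

```python
import pandas as pd

# Hypothetical stand-in for the apply result: one {id: amount} dict per
# (code, colour) pair, indexed by a MultiIndex
s = pd.Series([{1: -0.117307}, {2: 1.653216, 7: 0.817205}],
              index=pd.MultiIndex.from_tuples([('one', 'black'), ('one', 'white')],
                                              names=['code', 'colour']))

# Flatten the MultiIndex tuples to 'code/colour' strings
s.index = ['/'.join(key) for key in s.index]

# Name the index and the values while turning the Series into a DataFrame
out = s.rename_axis('code/colour').reset_index(name='id:amount')
print(out)
```

This avoids the separate df.columns assignment, at the cost of being slightly less explicit about the final column order.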

Answer 2

Answer by Nipun Batra

Here is an "ugly" way of doing this. First things first: your desired output will not play well with pandas, since dict is unhashable, so you may lose much of the benefit of using pandas!
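To make the hashability caveat concrete: pandas operations that hash cell values, such as nunique, drop_duplicates or value_counts, fail on a column of dicts. A minimal sketch with a hypothetical two-row column; the tuple-of-pairs workaround at the end is an illustrative assumption, not part of this answer.

```python
import pandas as pd

# Hypothetical column holding one {id: amount} dict per row
s = pd.Series([{1: -0.117307}, {2: 1.653216, 7: 0.817205}])

# Hash-based operations need hashable cell values, and dicts are not hashable
try:
    s.nunique()
    dict_cells_ok = True
except TypeError as exc:
    dict_cells_ok = False
    print('TypeError:', exc)

# Illustrative workaround: store each mapping as a tuple of (id, amount) pairs
t = s.map(lambda d: tuple(d.items()))
print(t.nunique())
```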

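from collections import OrderedDict

od = OrderedDict()
for name, group in df.groupby(['code', 'colour']):
    # Keep only id/amount, sorted by amount (largest first), as a dict of columns
    temp = group[['id', 'amount']].sort_values('amount', ascending=False).to_dict()
    # Extract id:amount (key order follows the sorted row order)
    temp2 = {temp['id'][key]: temp['amount'][key] for key in temp['amount']}
    # name is a (code, colour) tuple, e.g. ('one', 'white') -> 'one/white'
    od["%s/%s" % name] = temp2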

This is only a start! Not exactly what you are looking for.
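One way to finish from here, sketched with a hypothetical two-entry od standing in for the one built by the loop above: pass the (key, dict) pairs straight to the DataFrame constructor, one row per pair.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical stand-in for the `od` built by the loop above
od = OrderedDict([('one/black', {1: -0.117307}),
                  ('one/white', {2: 1.653216, 7: 0.817205})])

# Each (key, dict) pair becomes one row of the two-column table
result = pd.DataFrame(list(od.items()), columns=['code/colour', 'id:amount'])
print(result)
```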