pandas: specifying column order after a groupby aggregation

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40111546/

Date: 2020-09-14 02:14:25  Source: igfitidea

Specifying column order following groupby aggregation

Tags: python, pandas, format

Asked by DJohnson1990

The ordering of my age, height and weight columns is changing with each run of the code. I need to keep the order of my agg columns static because I ultimately refer to this output file according to the column locations. What can I do to make sure age, height and weight are output in the same order every time?

import numpy as np
import pandas as pd

d = pd.read_csv(input_file, na_values=[''])
df = pd.DataFrame(d)
df.index_col = ['name', 'address']

df_out = df.groupby(df.index_col).agg({'age':np.mean, 'height':np.sum, 'weight':np.sum})
df_out.to_csv(output_file, sep=',')

Answered by jezrael

I think you can select the aggregated columns as a subset, in the order you need:

df_out = (df.groupby(df.index_col)
            .agg({'age':np.mean, 'height':np.sum, 'weight':np.sum})[['age','height','weight']])

You can also use pandas functions:

df_out = (df.groupby(df.index_col)
            .agg({'age':'mean', 'height':sum, 'weight':sum})[['age','height','weight']])

Sample:

df = pd.DataFrame({'name':['q','q','a','a'],
                   'address':['a','a','s','s'],
                   'age':[7,8,9,10],
                   'height':[1,3,5,7],
                   'weight':[5,3,6,8]})

print (df)
  address  age  height name  weight
0       a    7       1    q       5
1       a    8       3    q       3
2       s    9       5    a       6
3       s   10       7    a       8
df.index_col = ['name', 'address']
df_out = (df.groupby(df.index_col)
            .agg({'age':'mean', 'height':sum, 'weight':sum})[['age','height','weight']])

print (df_out)
              age  height  weight
name address                     
a    s        9.5      12      14
q    a        7.5       4       8

EDIT by suggestion: add reset_index(). Here as_index=False does not work if you need the index values too:

df_out = (df.groupby(df.index_col)
            .agg({'age':'mean', 'height':sum, 'weight':sum})[['age','height','weight']]
            .reset_index())

print (df_out)
  name address  age  height  weight
0    a       s  9.5      12      14
1    q       a  7.5       4       8
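
As an aside that is not part of the original answer: pandas 0.25 and later also support named aggregation, where the output columns come out in the order the keyword arguments are written. A minimal sketch, assuming such a pandas version and reusing the sample data above (df_named is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'name': ['q', 'q', 'a', 'a'],
                   'address': ['a', 'a', 's', 's'],
                   'age': [7, 8, 9, 10],
                   'height': [1, 3, 5, 7],
                   'weight': [5, 3, 6, 8]})

# Named aggregation: each keyword is an output column, given as
# (input column, aggregation function). Output order follows keyword order.
df_named = (df.groupby(['name', 'address'])
              .agg(age=('age', 'mean'),
                   height=('height', 'sum'),
                   weight=('weight', 'sum'))
              .reset_index())

print(list(df_named.columns))
# ['name', 'address', 'age', 'height', 'weight']
```

Selecting the columns explicitly, as in the answer above, remains the more defensive choice if the code has to run on older pandas versions as well.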

Answered by goodmami

If you care mostly about the order when the data is written to a file, and not while it's still in a DataFrame object, you can set the columns parameter of the to_csv() method:

>>> df = pd.DataFrame(
      {'age': [28,63,28,45],
       'height': [183,156,170,201],
       'weight': [70.2, 62.5, 65.9, 81.0],
       'name': ['Kim', 'Pat', 'Yuu', 'Sacha']},
      columns=['name','age','weight', 'height'])
>>> df
    name  age  weight  height
0    Kim   28    70.2     183
1    Pat   63    62.5     156
2    Yuu   28    65.9     170
3  Sacha   45    81.0     201
>>> df_out = df.groupby(['age'], as_index=False).agg(
      {'weight': sum, 'height': sum})
>>> df_out
   age  height  weight
0   28     353   136.1
1   45     201    81.0
2   63     156    62.5
>>> df_out.to_csv('out.csv', sep=',', columns=['age','height','weight'])

out.csv then looks like this:

,age,height,weight
0,28,353,136.10000000000002
1,45,201,81.0
2,63,156,62.5
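
The leading comma in the header line is the DataFrame index, written as an unnamed first column. If the file is later read by column position, it may be worth dropping it. A minimal sketch, assuming the index is not needed in the file: it writes to an in-memory buffer instead of out.csv, and uses integer weights (illustrative values, not the data above) so the output has no floating-point noise.

```python
import io
import pandas as pd

df = pd.DataFrame({'age': [28, 63, 28, 45],
                   'height': [183, 156, 170, 201],
                   'weight': [70, 62, 66, 81]})  # illustrative integer weights

df_out = df.groupby('age', as_index=False).agg({'weight': 'sum', 'height': 'sum'})

# columns= fixes the order in the file; index=False omits the index column.
buf = io.StringIO()
df_out.to_csv(buf, sep=',', columns=['age', 'height', 'weight'], index=False)
csv_text = buf.getvalue()
print(csv_text)
```

With index=False the header becomes age,height,weight, so age is the first field of every row.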