Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/32998893/

Time: 2020-09-13 23:59:42  Source: igfitidea

Set column names when stacking pandas DataFrame

python pandas

Asked by aensm

When stacking a pandas DataFrame, a Series is returned. Normally after I stack a DataFrame, I convert it back into a DataFrame. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.

E.g., for the following DataFrame:

In [64]: df = pd.DataFrame({'id':[1,2,3], 
    ...:                    'date':['2015-09-31']*3, 
    ...:                    'value':[100, 95, 42], 
    ...:                    'value2':[200, 57, 27]}).set_index(['id','date'])

In [65]: df
Out[65]: 
               value  value2
id date                     
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27

I stack and convert it back to a DataFrame like so:

In [68]: df.stack().reset_index()
Out[68]: 
   id        date level_2    0
0   1  2015-09-31   value  100
1   1  2015-09-31  value2  200
2   2  2015-09-31   value   95
3   2  2015-09-31  value2   57
4   3  2015-09-31   value   42
5   3  2015-09-31  value2   27
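
The placeholder names come from the stacked Series itself: the index level created from the column labels has no name, and neither does the Series, so `reset_index` falls back to `level_2` and `0`. A minimal sketch that inspects this, rebuilding the same frame as above:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

stacked = df.stack()

# The level created from the column labels is unnamed, and so is the Series.
index_names = list(stacked.index.names)
series_name = stacked.name

# reset_index() substitutes 'level_2' and 0 for the missing names.
default_columns = list(stacked.reset_index().columns)
print(index_names, series_name, default_columns)
```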

So in order to name these columns appropriately I would need to do something like this:

In [72]: stacked = df.stack()

In [73]: stacked
Out[73]: 
id  date              
1   2015-09-31  value     100
                value2    200
2   2015-09-31  value      95
                value2     57
3   2015-09-31  value      42
                value2     27
dtype: int64

In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)

In [88]: stacked.reset_index().rename(columns={0:'value'})
Out[88]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27

Ideally, the solution would look something like this:

df.stack(new_index_name='var_name', new_col_name='value')

But looking at the docs, it doesn't look like stack takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?

Accepted answer by unutbu

pd.melt is often useful for converting DataFrames from "wide" to "long" format. You could use pd.melt here if you convert the id and date index levels to columns first:

In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value')
Out[56]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   2  2015-09-31    value     95
2   3  2015-09-31    value     42
3   1  2015-09-31   value2    200
4   2  2015-09-31   value2     57
5   3  2015-09-31   value2     27
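
One caveat, stated as an assumption about recent pandas versions rather than something from the original answer: newer releases (2.0 and later, as far as I can tell) reject a `value_name` that matches an existing column, and this frame already has a `value` column. The sketch below sidesteps that with a temporary name (`_value`, an arbitrary choice) and then sorts so the rows come out in the same row-by-row order that `stack` gives.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

long_df = (
    pd.melt(df.reset_index(),
            id_vars=['id', 'date'],
            value_vars=['value', 'value2'],
            var_name='var_name',
            value_name='_value')  # temporary name, avoids clashing with the 'value' column
    .rename(columns={'_value': 'value'})
)

# melt emits one source column after another; sorting restores the
# row-by-row order of the stack-based approaches.
stack_order = (
    long_df
    .sort_values(['id', 'date', 'var_name'])
    .reset_index(drop=True)
)
print(stack_order)
```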

Answer by chrisb

So here's one way that you may find a bit cleaner, using the fact that columns and Series can also carry names.

In [45]: df
Out[45]: 
               value  value2
id date                     
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27

In [46]: df.columns.name = 'var_name'

In [47]: s = df.stack()

In [48]: s.name = 'value'

In [49]: s.reset_index()
Out[49]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27
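
The same idea can be wrapped in a small helper that leaves `df` untouched. This is only a sketch: `stack_to_long` is a hypothetical name, not a pandas function, and it uses `rename_axis(columns=...)` instead of assigning to `df.columns.name`.

```python
import pandas as pd


def stack_to_long(df, var_name='variable', value_name='value'):
    """Stack df and return a long DataFrame with named columns (hypothetical helper)."""
    # rename_axis returns a new frame, so the caller's df keeps its unnamed columns axis.
    named = df.rename_axis(columns=var_name)
    # The columns-axis name becomes the name of the new index level;
    # Series.rename with a scalar sets the Series name.
    return named.stack().rename(value_name).reset_index()


df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

result = stack_to_long(df, var_name='var_name', value_name='value')
print(result)
```

This is close to the `df.stack(new_index_name=..., new_col_name=...)` call the question wished for, at the cost of one extra function.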

Answer by krassowski

A piping-friendly alternative to chrisb's answer:

df.stack().rename_axis(['id', 'date', 'var_name']).rename('value').reset_index()

And if explicit is better than implicit:

(
    df
    .stack()
    .rename_axis(index={'id': 'id', 'date': 'date', None: 'var_name'})
    .rename('value')
    .reset_index()
)

When using the dict mapper, you can skip the names which should stay the same:

df.stack().rename_axis(index={None: 'var_name'}).rename('value').reset_index()
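
The `{None: 'var_name'}` mapping relies on the new level being unnamed, which holds only while the columns axis has no name. As a sketch under that observation, a small function (`name_last_level`, a hypothetical name, not part of pandas) can be used with `.pipe` to name the innermost level whatever it was called before:

```python
import pandas as pd


def name_last_level(s, var_name):
    """Return s with its innermost index level renamed (hypothetical helper)."""
    # Keep every existing level name except the last, which is replaced.
    return s.rename_axis(index=[*s.index.names[:-1], var_name])


df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

result = (
    df
    .stack()
    .pipe(name_last_level, 'var_name')
    .rename('value')
    .reset_index()
)

# Also works when the columns axis already carries a name.
result_named = (
    df
    .rename_axis(columns='old_name')
    .stack()
    .pipe(name_last_level, 'var_name')
    .rename('value')
    .reset_index()
)
print(result)
```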