Set column names when stacking pandas DataFrame
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32998893/
Asked by aensm
When stacking a pandas DataFrame, a Series is returned. Normally after I stack a DataFrame, I convert it back into a DataFrame. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.
E.g., for the following DataFrame:
In [63]: import pandas as pd
In [64]: df = pd.DataFrame({'id': [1, 2, 3],
    ...:                    'date': ['2015-09-31'] * 3,
    ...:                    'value': [100, 95, 42],
    ...:                    'value2': [200, 57, 27]}).set_index(['id', 'date'])
In [65]: df
Out[65]:
               value  value2
id date
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27
I stack it and convert it back to a DataFrame like so:
In [68]: df.stack().reset_index()
Out[68]:
   id        date level_2    0
0   1  2015-09-31   value  100
1   1  2015-09-31  value2  200
2   2  2015-09-31   value   95
3   2  2015-09-31  value2   57
4   3  2015-09-31   value   42
5   3  2015-09-31  value2   27
So in order to name these columns appropriately I would need to do something like this:
In [72]: stacked = df.stack()
In [73]: stacked
Out[73]:
id  date
1   2015-09-31  value     100
                value2    200
2   2015-09-31  value      95
                value2     57
3   2015-09-31  value      42
                value2     27
dtype: int64
In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)
In [88]: stacked.reset_index().rename(columns={0:'value'})
Out[88]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27
Ideally, the solution would look something like this:
df.stack(new_index_name='var_name', new_col_name='value')
But looking at the docs, it doesn't look like stack takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?
Accepted answer by unutbu
pd.melt is often useful for converting DataFrames from "wide" to "long" format. You could use pd.melt here if you convert the id and date index levels to columns first:
In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value')
Out[56]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   2  2015-09-31    value     95
2   3  2015-09-31    value     42
3   1  2015-09-31   value2    200
4   2  2015-09-31   value2     57
5   3  2015-09-31   value2     27
Answer by chrisb
So here's one way that you may find a bit cleaner, using the fact that columns and Series can also carry names.
In [45]: df
Out[45]:
               value  value2
id date
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27
In [46]: df.columns.name = 'var_name'
In [47]: s = df.stack()
In [48]: s.name = 'value'
In [49]: s.reset_index()
Out[49]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27
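The session above assigns to df.columns.name and s.name, which modifies the existing objects. If you would rather leave df untouched, the same idea can be sketched as a single expression: rename_axis(columns=...) returns a renamed copy, and Series.reset_index accepts a name argument for the values column. This is one possible variant, not the answer's own code.

```python
import pandas as pd

# Same data as in the question; dates are kept as plain strings.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["2015-09-31"] * 3,
    "value": [100, 95, 42],
    "value2": [200, 57, 27],
}).set_index(["id", "date"])

long_df = (
    df.rename_axis(columns="var_name")  # names the columns axis on a copy
      .stack()                          # that name becomes the new index level name
      .reset_index(name="value")        # names the column holding the stacked values
)
print(long_df)
print(df.columns.name)  # the original frame's columns axis is still unnamed
```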
Answer by krassowski
A piping-friendly alternative to chrisb's answer:
df.stack().rename_axis(['id', 'date', 'var_name']).rename('value').reset_index()
And if explicit is better than implicit:
(
df
.stack()
.rename_axis(index={'id': 'id', 'date': 'date', None: 'var_name'})
.rename('value')
.reset_index()
)
When using the dict mapper, you can skip the names which should stay the same:
df.stack().rename_axis(index={None: 'var_name'}).rename('value').reset_index()
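The dict-mapper one-liner can be wrapped in a small function that takes the two names the question asked for. The sketch below is only an illustration: stack_named is a hypothetical helper, not part of pandas, and it assumes the columns axis of df is unnamed, so that the index level created by stack has the name None.

```python
import pandas as pd


def stack_named(df, var_name, value_name):
    """Stack df's columns into a long DataFrame with named columns.

    Hypothetical helper (not a pandas API). Assumes df.columns.name is
    None, so the index level created by stack() is unnamed.
    """
    return (
        df.stack()
          .rename_axis(index={None: var_name})  # name only the new, unnamed level
          .rename(value_name)                   # name the Series of stacked values
          .reset_index()
    )


# Same data as in the question; dates are kept as plain strings.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["2015-09-31"] * 3,
    "value": [100, 95, 42],
    "value2": [200, 57, 27],
}).set_index(["id", "date"])

long_df = stack_named(df, var_name="var_name", value_name="value")
print(long_df)
```

If the columns axis already has a name, the None key matches nothing and the level keeps its existing name.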

