Unstacking Data with Pandas

Question

Asked by JD Long

I have some data that I'm taking from 'long' to 'wide'. I have no problem using unstack to make the data wide, but then I end up with what looks like an index that I can't get rid of. Here's a dummy example:

## set up some dummy data
import pandas as pd
d = {'state'  : ['a','b','a','b','a','b','a','b'],
     'year' : [1,1,1,1,2,2,2,2],
     'description'  : ['thing1','thing1','thing1','thing2','thing2','thing2','thing1','thing2'],
     'value' : [1., 2., 3., 4.,1., 2., 3., 4.]}
df = pd.DataFrame(d)
## now that we have dummy data do the long to wide conversion

dfGrouped = df.groupby(['state','year', 'description']).value.sum() 

dfUnstacked = dfGrouped.unstack('description')
print(dfUnstacked)


description  thing1  thing2
state year                 
a     1           4     NaN
      2           3       1
b     1           2       4
      2         NaN       6

So that looks like what I would expect. Now I'd like an unindexed data frame with columns 'state', 'year', 'thing1', 'thing2'. So it seems I should do thus:

dfUnstackedNoIndex = dfUnstacked.reset_index()
print(dfUnstackedNoIndex)

description state  year  thing1  thing2
0               a     1       4     NaN
1               a     2       3       1
2               b     1       2       4
3               b     2     NaN       6

Ok, that's close. But I don't want description carried forward. So let's select out only the columns I want:

print(dfUnstackedNoIndex[['state','year','thing1','thing2']])

description state  year  thing1  thing2
0               a     1       4     NaN
1               a     2       3       1
2               b     1       2       4
3               b     2     NaN       6

So what's up with 'description'? Why does it hang around even though I reset the index and selected only a few columns? Clearly I'm not grokking something.

FWIW, my Pandas version is 0.12

Answer 1

Accepted answer by unutbu

description is the name of the columns index (dfUnstackedNoIndex.columns.name), not a column itself. You can get rid of it like this:

In [74]: dfUnstackedNoIndex.columns.name = None

In [75]: dfUnstackedNoIndex
Out[75]: 
  state  year  thing1  thing2
0     a     1       4     NaN
1     a     2       3       1
2     b     1       2       4
3     b     2     NaN       6
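
On more recent pandas releases (not the 0.12 from the question), the same cleanup can be chained instead of assigning to columns.name afterwards. This is only a sketch: it assumes a pandas version where rename_axis accepts None for the column axis, and the name flat is made up for illustration.

```python
import pandas as pd

# Same dummy data as in the question.
d = {'state': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'],
     'year': [1, 1, 1, 1, 2, 2, 2, 2],
     'description': ['thing1', 'thing1', 'thing1', 'thing2',
                     'thing2', 'thing2', 'thing1', 'thing2'],
     'value': [1., 2., 3., 4., 1., 2., 3., 4.]}
df = pd.DataFrame(d)

# 'flat' is a hypothetical name, not one used in the question.
flat = (
    df.groupby(['state', 'year', 'description'])['value'].sum()
      .unstack('description')      # long -> wide; columns are now named 'description'
      .reset_index()               # move 'state' and 'year' back into columns
      .rename_axis(None, axis=1)   # clear the leftover name on the columns index
)
print(flat)
```

Whether to chain or to set columns.name directly is a matter of taste; the chained form just avoids mutating an intermediate frame.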

The purpose of column names perhaps becomes clearer when you look at what happens when you unstack twice:

In [107]: dfUnstacked2 = dfUnstacked.unstack('state')
In [108]: dfUnstacked2
Out[108]: 
description  thing1      thing2   
state             a   b       a  b
year                              
1                 4   2     NaN  4
2                 3 NaN       1  6

Now dfUnstacked2.columns is a MultiIndex. Each level has a name which corresponds to the name of the index level that has been converted into a column level.

In [111]: dfUnstacked2.columns
Out[111]: 
MultiIndex(levels=[[u'thing1', u'thing2'], [u'a', u'b']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'description', u'state'])
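
One practical use of those level names is that they let you address a column level by name rather than by position. The sketch below rebuilds the frames from the question and is illustrative only; the variable only_a is a made-up name.

```python
import pandas as pd

# Same dummy data as in the question.
d = {'state': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'],
     'year': [1, 1, 1, 1, 2, 2, 2, 2],
     'description': ['thing1', 'thing1', 'thing1', 'thing2',
                     'thing2', 'thing2', 'thing1', 'thing2'],
     'value': [1., 2., 3., 4., 1., 2., 3., 4.]}
df = pd.DataFrame(d)

dfUnstacked = (df.groupby(['state', 'year', 'description'])['value']
                 .sum()
                 .unstack('description'))
dfUnstacked2 = dfUnstacked.unstack('state')

# Read the labels of one column level, looked up by its name.
states = dfUnstacked2.columns.get_level_values('state')
print(list(states))

# Take the cross-section of columns where the 'state' level equals 'a'.
# 'only_a' is a hypothetical name used just for this example.
only_a = dfUnstacked2.xs('a', axis=1, level='state')
print(only_a)
```

After the cross-section, the remaining columns index still carries the name description, which is the same thing the question ran into.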

Column names and index names show up in the same place in the string representation of DataFrames, so it can be hard to know which is which. You can figure it out by inspecting df.index.names and df.columns.names.
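
As a rough illustration of that check, the snippet below rebuilds the frames from the question and prints both sets of names at each step. It is a sketch rather than part of the original session.

```python
import pandas as pd

# Same dummy data as in the question.
d = {'state': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'],
     'year': [1, 1, 1, 1, 2, 2, 2, 2],
     'description': ['thing1', 'thing1', 'thing1', 'thing2',
                     'thing2', 'thing2', 'thing1', 'thing2'],
     'value': [1., 2., 3., 4., 1., 2., 3., 4.]}
df = pd.DataFrame(d)

dfUnstacked = (df.groupby(['state', 'year', 'description'])['value']
                 .sum()
                 .unstack('description'))
dfUnstackedNoIndex = dfUnstacked.reset_index()

# Row index names versus column index names, before and after reset_index().
print(list(dfUnstacked.index.names), list(dfUnstacked.columns.names))
print(list(dfUnstackedNoIndex.index.names), list(dfUnstackedNoIndex.columns.names))
```

The second line of output shows that reset_index() clears the row index names but leaves the columns name in place.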
