Why does Pandas transform fail if you only have a single column

Question

Asked by EdChum

After looking at this question, I did some messing about and found this:

import pandas as pd

df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')

This gives a ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-38-157c6339ad93> in <module>()
      3 #df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4], 'b':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
      4 df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
----> 5 df['num_totals'] = df.groupby('a').transform('count')
      6 
      7 #df['num_totals']=df.groupby('a')[['a']].transform('count')

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
   2117         else:
   2118             # set column
-> 2119             self._set_item(key, value)
   2120 
   2121     def _setitem_slice(self, key, value):

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
   2164         """
   2165         value = self._sanitize_column(key, value)
-> 2166         NDFrame._set_item(self, key, value)
   2167 
   2168     def insert(self, loc, column, value, allow_duplicates=False):

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
    677 
    678     def _set_item(self, key, value):
--> 679         self._data.set(key, value)
    680         self._clear_item_cache()
    681 

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in set(self, item, value)
   1779         except KeyError:
   1780             # insert at end
-> 1781             self.insert(len(self.items), item, value)
   1782 
   1783         self._known_consolidated = False

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in insert(self, loc, item, value, allow_duplicates)
   1793 
   1794             # new block
-> 1795             self._add_new_block(item, value, loc=loc)
   1796 
   1797         except:

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in _add_new_block(self, item, value, loc)
   1909             loc = self.items.get_loc(item)
   1910         new_block = make_block(value, self.items[loc:loc + 1].copy(),
-> 1911                                self.items, fastpath=True)
   1912         self.blocks.append(new_block)
   1913 

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in make_block(values, items, ref_items, klass, fastpath, placement)
    964             klass = ObjectBlock
    965 
--> 966     return klass(values, items, ref_items, ndim=values.ndim, fastpath=fastpath, placement=placement)
    967 
    968 # TODO: flexible with index=None and/or items=None

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in __init__(self, values, items, ref_items, ndim, fastpath, placement)
     42         if len(items) != len(values):
     43             raise ValueError('Wrong number of items passed %d, indices imply %d'
---> 44                              % (len(items), len(values)))
     45 
     46         self.set_ref_locs(placement)

ValueError: Wrong number of items passed 1, indices imply 0

But if I have 2 columns, it works fine:

df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4],'b':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')
df



Out[40]:
    a  b  num_totals
0   1  1           4
1   1  1           4
2   1  1           4
3   1  1           4
4   2  2           2
5   2  2           2
6   3  3           3
7   3  3           3
8   3  3           3
9   4  4           7
10  4  4           7
11  4  4           7
12  4  4           7
13  4  4           7
14  4  4           7
15  4  4           7

It also works if I do this on the single-column df:

df['num_totals']=df.groupby('a')[['a']].transform('count')

There is a similar SO post, but it is unclear to me why a Series should fail and a DataFrame should work in the example immediately above, and why having 2 or more columns would work.

I am using Python 2.7 64-bit and Pandas 0.12.

Answer 1

Accepted answer by Jeff

Single column in the DF

As you noted above, this returns a Series the same size as the original:

In [32]: df.groupby('a')['a'].transform('count')
Out[32]: 
0     4
1     4
2     4
3     4
4     2
5     2
6     3
7     3
8     3
9     7
10    7
11    7
12    7
13    7
14    7
15    7
Name: a, dtype: int64

However, this returns an empty frame:

In [33]: df.groupby('a').transform('count')
Out[33]: 
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

You cannot assign an empty frame as a column of another frame, because this is essentially an ambiguous assignment (though you could make a case that it should 'work').
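One way around the empty-frame result is to select the column explicitly before calling transform, as the first snippet in this answer does. The following is a minimal sketch of that approach using the question's data; it assumes a pandas version where selecting a column on a groupby and calling `transform('count')` behaves as shown above.

```python
import pandas as pd

# Same single-column frame as in the question.
df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]})

# Selecting the column after groupby gives a Series-based groupby, so
# transform returns a Series aligned with df's index rather than a
# frame with no columns.
counts = df.groupby('a')['a'].transform('count')

# A Series of matching length can be assigned as a new column.
df['num_totals'] = counts
print(df['num_totals'].tolist())
```

Because the result is a Series of the same length as the frame, the assignment is unambiguous.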

Two columns in the starting DF

The two-column case returns a single-column DataFrame:

In [42]: df2.groupby('a').transform('count')
Out[42]: 
    b
0   4
1   4
2   4
3   4
4   2
5   2
6   3
7   3
8   3
9   7
10  7
11  7
12  7
13  7
14  7
15  7

In [43]: type(df2.groupby('a').transform('count'))
Out[43]: pandas.core.frame.DataFrame

Or a Series:

In [45]: df2.groupby('a')['a'].transform('count')
Out[45]: 
0     4
1     4
2     4
3     4
4     2
5     2
6     3
7     3
8     3
9     7
10    7
11    7
12    7
13    7
14    7
15    7
Name: a, dtype: int64

In [46]: type(df.groupby('a')['a'].transform('count'))
Out[46]: pandas.core.series.Series

This 'works' because pandas does allow a single-column frame to be assigned as a column: it takes the underlying Series.
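To avoid relying on that implicit unwrapping, the single remaining column can be pulled out by name before assignment. This is a small sketch under the assumption that the frame has the question's two columns `a` and `b`; the variable name `result` is only illustrative.

```python
import pandas as pd

values = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]
df2 = pd.DataFrame({'a': values, 'b': values})

# The grouping column 'a' is excluded from the output, so only 'b' remains.
result = df2.groupby('a').transform('count')

# Take the one remaining column as a Series before assigning, instead of
# depending on pandas to unwrap the single-column frame.
df2['num_totals'] = result['b']
print(df2['num_totals'].tolist())
```

Being explicit here means the code does not depend on how many non-key columns the frame happens to have.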

So pandas is actually trying to be helpful. That said, I find this an unclear error message for an attempt to assign an empty frame.
