Why Pandas Transform fails if you only have a single column
Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/19267029/
Asked by EdChum
After looking at this question, I did some messing about and found this:
import pandas as pd
df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')
gives ValueError:
ValueError Traceback (most recent call last)
<ipython-input-38-157c6339ad93> in <module>()
3 #df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4], 'b':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
4 df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
----> 5 df['num_totals'] = df.groupby('a').transform('count')
6
7 #df['num_totals']=df.groupby('a')[['a']].transform('count')
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
2117 else:
2118 # set column
-> 2119 self._set_item(key, value)
2120
2121 def _setitem_slice(self, key, value):
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
2164 """
2165 value = self._sanitize_column(key, value)
-> 2166 NDFrame._set_item(self, key, value)
2167
2168 def insert(self, loc, column, value, allow_duplicates=False):
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
677
678 def _set_item(self, key, value):
--> 679 self._data.set(key, value)
680 self._clear_item_cache()
681
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in set(self, item, value)
1779 except KeyError:
1780 # insert at end
-> 1781 self.insert(len(self.items), item, value)
1782
1783 self._known_consolidated = False
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in insert(self, loc, item, value, allow_duplicates)
1793
1794 # new block
-> 1795 self._add_new_block(item, value, loc=loc)
1796
1797 except:
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in _add_new_block(self, item, value, loc)
1909 loc = self.items.get_loc(item)
1910 new_block = make_block(value, self.items[loc:loc + 1].copy(),
-> 1911 self.items, fastpath=True)
1912 self.blocks.append(new_block)
1913
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in make_block(values, items, ref_items, klass, fastpath, placement)
964 klass = ObjectBlock
965
--> 966 return klass(values, items, ref_items, ndim=values.ndim, fastpath=fastpath, placement=placement)
967
968 # TODO: flexible with index=None and/or items=None
C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in __init__(self, values, items, ref_items, ndim, fastpath, placement)
42 if len(items) != len(values):
43 raise ValueError('Wrong number of items passed %d, indices imply %d'
---> 44 % (len(items), len(values)))
45
46 self.set_ref_locs(placement)
ValueError: Wrong number of items passed 1, indices imply 0
But if I have 2 columns then it works fine:
df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4],'b':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')
df
Out[40]:
a b num_totals
0 1 1 4
1 1 1 4
2 1 1 4
3 1 1 4
4 2 2 2
5 2 2 2
6 3 3 3
7 3 3 3
8 3 3 3
9 4 4 7
10 4 4 7
11 4 4 7
12 4 4 7
13 4 4 7
14 4 4 7
15 4 4 7
or if I do this using a single column df:
df['num_totals']=df.groupby('a')[['a']].transform('count')
There is a similar SO post, but it is unclear to me why a series should fail and a dataframe should work in the example immediately above, and why having 2 or more columns would work.
I am using Python 2.7 64-bit and Pandas 0.12
Accepted answer by Jeff
Single Column in the DF
As you noted above, this returns a series the same size as the original
In [32]: df.groupby('a')['a'].transform('count')
Out[32]:
0 4
1 4
2 4
3 4
4 2
5 2
6 3
7 3
8 3
9 7
10 7
11 7
12 7
13 7
14 7
15 7
Name: a, dtype: int64
However, this is returning an empty frame
In [33]: df.groupby('a').transform('count')
Out[33]:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
You cannot assign an empty frame as a column to another frame, because this is essentially an ambiguous assignment (though you can make a case that it should 'work').
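A minimal sketch of the workaround this implies, assuming a reasonably recent pandas: selecting the column explicitly makes transform return a Series, which can be assigned as a column no matter how many other columns the frame has. The variable name counts is only illustrative.

```python
import pandas as pd

# Single-column frame from the question
df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]})

# Selecting ['a'] yields a Series with the same index as df,
# so the assignment below is unambiguous.
counts = df.groupby('a')['a'].transform('count')
df['num_totals'] = counts

print(df.head())
```

Because the result does not depend on the frame having a second column, this form behaves the same for the single-column and two-column cases.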
Two columns in the starting DF
The two-column case returns a single-column DataFrame
In [42]: df2.groupby('a').transform('count')
Out[42]:
b
0 4
1 4
2 4
3 4
4 2
5 2
6 3
7 3
8 3
9 7
10 7
11 7
12 7
13 7
14 7
15 7
In [43]: type(df2.groupby('a').transform('count'))
Out[43]: pandas.core.frame.DataFrame
Or a series
In [45]: df2.groupby('a')['a'].transform('count')
Out[45]:
0 4
1 4
2 4
3 4
4 2
5 2
6 3
7 3
8 3
9 7
10 7
11 7
12 7
13 7
14 7
15 7
Name: a, dtype: int64
In [46]: type(df.groupby('a')['a'].transform('count'))
Out[46]: pandas.core.series.Series
This 'works' because pandas DOES allow assignment of a single-column frame, as it will take the underlying series.
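To make the Series versus DataFrame distinction concrete, here is a small sketch using the two-column data from the question. It assumes a recent pandas; the names as_series and as_frame are illustrative only. Rather than relying on pandas to unwrap a single-column frame, it pulls out the underlying Series explicitly with .iloc.

```python
import pandas as pd

values = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]
df2 = pd.DataFrame({'a': values, 'b': values})

# A string selector gives a Series ...
as_series = df2.groupby('a')['b'].transform('count')
# ... while a list selector gives a single-column DataFrame.
as_frame = df2.groupby('a')[['b']].transform('count')

print(type(as_series).__name__, type(as_frame).__name__)

# Taking the first (only) column explicitly avoids depending on
# how pandas treats a frame on the right-hand side of an assignment.
df2['num_totals'] = as_frame.iloc[:, 0]
```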
So pandas is actually trying to be helpful. That said, I find this an unclear error message for trying to assign an empty frame.
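One way to get a clearer failure is to check the shape yourself before assigning. The sketch below uses a hypothetical helper, assign_single_column, which is not part of pandas, and the empty frame is built by hand so the example does not depend on what transform returns in any particular pandas version.

```python
import pandas as pd

def assign_single_column(df, name, value):
    """Hypothetical helper (not a pandas API): assign value as column name,
    rejecting frames that do not have exactly one column."""
    if isinstance(value, pd.DataFrame):
        if value.shape[1] != 1:
            raise ValueError(
                f"expected exactly one column for {name!r}, got {value.shape[1]}"
            )
        value = value.iloc[:, 0]  # take the underlying Series
    df[name] = value
    return df

df = pd.DataFrame({'a': [1, 1, 2]})
empty = pd.DataFrame(index=df.index)  # rows but no columns

try:
    assign_single_column(df, 'num_totals', empty)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# A Series (or single-column frame) is accepted as usual.
assign_single_column(df, 'num_totals', df.groupby('a')['a'].transform('count'))
```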

