How to use Pandas groupby apply() without adding an extra index
Source: http://stackoverflow.com/questions/12410438/ (Stack Overflow). This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.
Asked by user2303
I often want to create a new DataFrame by combining multiple columns of a grouped DataFrame. The apply() function lets me do that, but it forces me to create an index I don't need:
In [359]: df = pandas.DataFrame({'x': 3 * ['a'] + 2 * ['b'], 'y': np.random.normal(size=5), 'z': np.random.normal(size=5)})
In [360]: df
Out[360]:
   x         y         z
0  a  0.201980 -0.470388
1  a  0.190846 -2.089032
2  a -1.131010  0.227859
3  b -0.263865 -1.906575
4  b -1.335956 -0.722087
In [361]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/emarkley/work/src/partner_analysis2/main.py in <module>()
----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}))
/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
267 applied : type depending on grouped object and function
268 """
--> 269 return self._python_apply_general(func, *args, **kwargs)
270
271 def aggregate(self, func, *args, **kwargs):
/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in _python_apply_general(self, func, *args, **kwargs)
417 group_axes = _get_axes(group)
418
--> 419 res = func(group, *args, **kwargs)
420
421 if not _is_indexed_like(res, group_axes):
/home/emarkley/work/src/partner_analysis2/main.py in <lambda>(x)
----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}))
/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
371 mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
372 elif isinstance(data, dict):
--> 373 mgr = self._init_dict(data, index, columns, dtype=dtype)
374 elif isinstance(data, ma.MaskedArray):
375 mask = ma.getmaskarray(data)
/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
454 # figure out the index, if necessary
455 if index is None:
--> 456 index = extract_index(data)
457 else:
458 index = _ensure_index(index)
/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in extract_index(data)
4719
4720 if not indexes and not raw_lengths:
-> 4721 raise ValueError('If use all scalar values, must pass index')
4722
4723 if have_series or have_dicts:
ValueError: If use all scalar values, must pass index
In [362]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}, index=[0]))
Out[362]:
            r         s
x
a 0  1.316605 -1.672293
b 0  1.608606 -0.972593
Is there any way to use apply() or some other function to get the same results without the extra index level of zeros?
Answered by Chang She
You're producing one aggregate r value and one aggregate s value per group, so the function should return a Series here rather than a DataFrame:
In [26]: df.groupby('x').apply(lambda x:
             pandas.Series({'r': (x.y + x.z).sum() / x.z.sum(),
                            's': (x.y + x.z ** 2).sum() / x.z.sum()}))
Out[26]:
           r           s
x
a  -0.338590   -0.916635
b  66.655533  102.566146
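The output above comes from a different random draw than the question's, so the numbers don't match. As a rough, reproducible sketch of the same idea, the snippet below uses small hand-picked values (an assumption made for illustration, not the original data) and selects the y and z columns before calling apply(), which is a choice made here to stay clear of version-specific handling of the grouping column. It also shows one possible way to drop the extra level after the fact if the function has to return a DataFrame.

```python
import pandas as pd

# Fixed illustrative data (not the random values from the question).
df = pd.DataFrame({
    'x': ['a', 'a', 'a', 'b', 'b'],
    'y': [1.0, 2.0, 3.0, 4.0, 5.0],
    'z': [1.0, 1.0, 2.0, 2.0, 2.0],
})


def ratios(g):
    # Return one Series per group: its labels become the result's columns,
    # and the group key becomes the only index level.
    return pd.Series({
        'r': (g.y + g.z).sum() / g.z.sum(),
        's': (g.y + g.z ** 2).sum() / g.z.sum(),
    })


result = df.groupby('x')[['y', 'z']].apply(ratios)
print(result)


def ratios_frame(g):
    # Same numbers, but wrapped in a one-row DataFrame with a dummy index,
    # as in the question's workaround.
    return pd.DataFrame(ratios(g).to_dict(), index=[0])


# The dummy index shows up as a second index level; droplevel(1) removes it.
with_extra = df.groupby('x')[['y', 'z']].apply(ratios_frame)
flattened = with_extra.droplevel(1)
print(flattened)
```

Returning a Series is usually the simpler route when each group reduces to a single row; dropping the level afterwards is mainly useful when the applied function is not under your control.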

