Warning: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24024928/

Date: 2020-09-13 22:07:29  Source: igfitidea

Division in pandas: multiple columns by another column of the same DataFrame

Tags: python-2.7, pandas

Question by Bacchus

There are several questions on this topic on SO, but none seems to raise the issue I am having. I call:

df.div(df.col_name, axis='index')

on a DataFrame with 7 columns and 3596 rows, and the result is invariably:

ValueError                                Traceback (most recent call last)
<ipython-input-55-5797510566fc> in <module>()

[.. several long calls...]

C:\Users\Ataturk\Anaconda\lib\site-packages\pandas\core\ops.pyc in na_op(x, y)
    752             result = result.reshape(x.shape)
    753
--> 754         result = com._fill_zeros(result, x, y, name, fill_zeros)
    755
    756         return result

C:\Users\Ataturk\Anaconda\lib\site-packages\pandas\core\common.pyc in _fill_zeros(result, x, y, name, fill)
   1252                 signs = np.sign(result)
  1253                 nans = np.isnan(x.ravel())
-> 1254                 np.putmask(result, mask & ~nans, fill)
   1255
   1256                 # if we have a fill of inf, then sign it

ValueError: operands could not be broadcast together with shapes (3596,) (25172,)

Dividing one specific column by another works fine:

df.one_column / df.col_name

But as soon as I divide multiple columns, I get the same error (with a different number in the last set of parentheses):

df[['one_column_name', 'another_column_name']] / df.col_name

I've tried the various possible syntaxes (.div and /, referencing columns through [] as well as through attribute access, .name); the result is always the same. The dimensions fit, but pandas seems to flatten all the columns being divided into one long array, which produces the second number in the error (25172 = 7 × 3596 for the full 7-column frame). That is larger, by a factor equal to the number of columns, than the column it then tries to divide by. What am I doing wrong?
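The shape arithmetic in the traceback can be reproduced with plain NumPy. This is a hypothetical sketch, not pandas 0.14's actual _fill_zeros code. It assumes, based only on the traceback, that the zero mask is built from the 1-D divisor column while the NaN check runs on the flattened 2-D values, so the two boolean arrays cannot be combined. The function name reproduce_shape_mismatch and the all-ones placeholder data are made up; only the shapes matter.

```python
import numpy as np


def reproduce_shape_mismatch(n_rows=3596, n_cols=7):
    """Hypothetical helper: combine a per-row mask with a per-cell mask.

    Values are placeholders (all ones); only the array shapes matter.
    """
    x = np.ones((n_rows, n_cols), dtype=np.int64)  # stand-in for the 7 int64 columns
    divisor = np.ones(n_rows, dtype=np.int64)      # stand-in for the bal_cast column
    mask = divisor == 0                            # one flag per row: shape (n_rows,)
    nans = np.isnan(x.ravel())                     # one flag per cell: shape (n_rows * n_cols,)
    try:
        mask & ~nans                               # mirrors the expression on traceback line 1254
    except ValueError as exc:
        return nans.shape, str(exc)
    return nans.shape, None


shape, error = reproduce_shape_mismatch()
print(shape)  # (25172,)
print(error)
```

The flattened array has one entry per cell (7 × 3596), while the mask has one entry per row. Broadcasting cannot align them, which matches the "(3596,) (25172,)" pair in the error.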

df.info():

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3596 entries, 0 to 3595
Data columns (total 7 columns):
bal_cast    3596 non-null int64
Degt        3596 non-null int64
Meln        3596 non-null int64
Levich      3596 non-null int64
Navu        3596 non-null int64
Mitr        3596 non-null int64
Sob         3596 non-null int64
dtypes: int64(7)

bal_cast is the name of the column I am trying to divide by. Here is the exact division call, where the relevant DataFrame is called result:

In [58]: result.div(result.bal_cast, axis='index')
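For reference, here is a small sketch of how the two spellings differ in current pandas. The data and the column names a, b and d are hypothetical. .div(..., axis='index') divides each row by the matching divisor value. The plain / operator instead matches the Series index against the DataFrame's columns, so it is not a row-wise division.

```python
import pandas as pd

# Hypothetical data: divide columns a and b by column d, row by row.
df = pd.DataFrame({"a": [2, 9, 4], "b": [4, 3, 8], "d": [2, 3, 4]})

# Row-wise: each row of a and b is divided by that row's value of d.
rowwise = df[["a", "b"]].div(df["d"], axis="index")

# Column alignment: the Series labels 0, 1, 2 are matched against the
# column labels "a", "b"; nothing lines up, so every cell is NaN.
aligned = df[["a", "b"]] / df["d"]

print(rowwise)
print(aligned)
```

This alignment behavior is separate from the integer zero-division bug discussed in the answer.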

Current conda install:

         platform : win-64
    conda version : 3.5.2
   python version : 2.7.6.final.0

Pandas: 0.14.0; Numpy: 1.8.1

EDIT: Following the discussion in the comments: smaller slices of the same table divide without any issue.

Answer by Jeff

Workaround is this:

df.astype('float').div(df['column'].astype('float'), axis='index')
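Here is a minimal sketch of the workaround on a small hypothetical frame. The column names are borrowed from the question and the values are made up. One divisor is 0 so that the float rules for x / 0 are visible: a nonzero numerator gives inf, and 0 / 0 gives NaN.

```python
import pandas as pd

# Hypothetical integer data; the last row has a 0 in the divisor column.
df = pd.DataFrame({
    "bal_cast": [2, 4, 0],
    "Degt": [1, 8, 3],
    "Meln": [4, 2, 0],
})

# The workaround from the answer: cast to float, then divide row-wise.
result = df.astype("float").div(df["bal_cast"].astype("float"), axis="index")
print(result)
```

In the last row, Degt becomes inf (3 / 0), while bal_cast and Meln become NaN (0 / 0).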

The zero-filling step (_fill_zeros in the traceback) is choking on this. If you divide integers by 0, pandas gives you infs, and there is a bug in that code path. See here
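The failing step is the one that handles integer division by zero, so a quick check is whether the divisor column contains any zeros at all. The helper below is a hypothetical sketch; the name zero_divisor_rows and the sample data are made up. It only locates those rows and does not fix anything. If the smaller slices mentioned in the question happen to contain no zeros, that could also explain why they divide without issue, but this is an assumption.

```python
import pandas as pd


def zero_divisor_rows(df, divisor):
    """Hypothetical helper: return the rows whose divisor column is exactly 0."""
    return df[df[divisor] == 0]


# Made-up sample data with zeros in rows 1 and 3.
df = pd.DataFrame({"bal_cast": [5, 0, 3, 0], "Degt": [1, 2, 3, 4]})
print(zero_divisor_rows(df, "bal_cast").index.tolist())  # [1, 3]
```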

Casting to float 'solves' this problem because float / 0 is handled by numpy directly. Side note: the reason pandas handles integer division itself is that numpy integer division truncates and gives you back an integer (which is odd), at least on Python 2, where / between integer arrays is integer division.
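Some context for readers on Python 3: the question uses Python 2.7, where / between two integer NumPy arrays returned integers. In Python 3, / is always true division, and the integer-returning form is spelled //. Here is a small sketch with made-up values.

```python
import numpy as np

ints = np.array([1, 7, 8])

floor = ints // 2  # integer result, fractional part discarded
true = ints / 2    # Python 3 true division: float result

print(floor)  # [0 3 4]
print(true)
```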

Integers give an odd result in numpy:

In [10]: Series([1])/0
Out[10]: 
0    inf
dtype: float64

In [11]: Series([1]).values/0
Out[11]: array([0])

Floats are correct in numpy:

In [12]: Series([1.])/0
Out[12]: 
0    inf
dtype: float64

In [14]: Series([1.]).values/0
Out[14]: array([ inf])
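On Python 3 the integer case no longer matches Out[11]. Because / between integers is true division there, an integer array divided by 0 gives inf just like the float case. The old integer result corresponds to //. Below is a sketch of the same session under that assumption; the exact warnings depend on the NumPy version, so they are silenced here.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1]).values     # integer array, as in In [11]
floats = pd.Series([1.]).values  # float array, as in In [14]

with np.errstate(divide="ignore"):  # silence NumPy's divide-by-zero RuntimeWarning
    int_true = ints / 0     # true division: float result
    int_floor = ints // 0   # integer floor division by zero: NumPy returns 0
    float_true = floats / 0

print(int_true)    # [inf]
print(int_floor)   # [0]
print(float_true)  # [inf]
print(pd.Series([1]) / 0)
```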