Original question: http://stackoverflow.com/questions/24024928/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Division in pandas: multiple columns by another column of the same DataFrame
Asked by Bacchus
There are several questions around this topic on SO, but none seem to raise the issue that I am having. I call:
df.div(df.col_name, axis = 'index')
on a DataFrame which has 7 columns and 3596 rows, and the result is invariably:
ValueError Traceback (most recent call last)
<ipython-input-55-5797510566fc> in <module>()
[.. several long calls...]
C:\Users\Ataturk\Anaconda\lib\site-packages\pandas\core\ops.pyc in na_op(x, y)
752 result = result.reshape(x.shape)
753
--> 754 result = com._fill_zeros(result, x, y, name, fill_zeros)
755
756 return result
C:\Users\Ataturk\Anaconda\lib\site-packages\pandas\core\common.pyc in _fill_zeros(result, x, y, name, fill)
1252 signs = np.sign(result)
1253 nans = np.isnan(x.ravel())
-> 1254 np.putmask(result, mask & ~nans, fill)
1255
1256 # if we have a fill of inf, then sign it
ValueError: operands could not be broadcast together with shapes (3596,) (25172,)
Dividing one specific column by another works fine:
df.one_column / df.col_name
But as soon as I move to multiple columns, I get the same error (with a different number in the last set of parentheses):
df[['one_column_name', 'another_column_name']] / df.col_name
I've tried the various possible syntaxes, .div and /, and referencing columns through [] as well as .name; the result is always the same. The dimensions fit, but it seems to append all the columns to be divided to each other, creating the second number, which is of course a multiple of the length of the column that it then tries to divide by. What am I doing wrong?
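For reference, the operation being attempted can be sketched on a tiny frame. This is only an illustration: the column names are borrowed from the df.info() output in the question, the values are made up, and it assumes a recent pandas release rather than the 0.14.0 install described here.

```python
import pandas as pd

# Made-up values; only the column names come from the question.
df = pd.DataFrame({
    "bal_cast": [10, 20, 40],
    "Degt": [1, 5, 10],
    "Meln": [5, 10, 20],
})

# Divide several columns row by row by one column of the same frame.
# axis="index" aligns the divisor Series with the row labels.
shares = df[["Degt", "Meln"]].div(df["bal_cast"], axis="index")
print(shares)
```

Note that in current pandas the / operator between a DataFrame and a Series aligns the Series index with the DataFrame's column labels, so .div with axis='index' (or axis=0) is the form that divides row by row.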
df.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3596 entries, 0 to 3595
Data columns (total 7 columns):
bal_cast 3596 non-null int64
Degt 3596 non-null int64
Meln 3596 non-null int64
Levich 3596 non-null int64
Navu 3596 non-null int64
Mitr 3596 non-null int64
Sob 3596 non-null int64
dtypes: int64(7)
bal_cast is the name of the column I am trying to divide by; here is the exact division call, where the relevant DataFrame is called result:
In [58]: result.div(result.bal_cast, axis='index')
Current conda install:
platform : win-64
conda version : 3.5.2
python version : 2.7.6.final.0
Pandas: 0.14.0; Numpy: 1.8.1
EDIT: Following the discussion in the comments, smaller slices of the same table divide through without issue.
Answered by Jeff
The workaround is this:
df.astype('float').div(df['column'].astype('float'),axis='index')
The filling algorithm is choking on this. If you divide integers by 0, you get infs, and there is a bug in the code that fills them in (the bug report is linked in the original answer).
Casting to float 'solves' this problem, as float / 0 is handled by numpy directly. Side note: the reason pandas handles the division itself is that numpy int division truncates and gives you back an integer (which is odd).
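As a minimal sketch of the workaround on a small frame: the data below is invented for illustration (with a zero in the divisor column to exercise the division-by-zero path), and it assumes a recent pandas release, where the original bug may no longer be reproducible.

```python
import numpy as np
import pandas as pd

# Invented integer data; the last row has a zero divisor.
df = pd.DataFrame({
    "bal_cast": [2, 4, 0],
    "Degt": [1, 2, 3],
    "Meln": [4, 0, 6],
})

# Cast both operands to float before dividing, as in the workaround,
# so that division by zero is handled as ordinary float arithmetic.
result = df.astype("float").div(df["bal_cast"].astype("float"), axis="index")
print(result)
```

Rows with a zero divisor come back as inf (or NaN for 0 / 0) instead of raising.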
Integer division by zero gives an odd result in numpy:
In [10]: Series([1])/0
Out[10]:
0 inf
dtype: float64
In [11]: Series([1]).values/0
Out[11]: array([0])
Floats are handled correctly in numpy:
In [12]: Series([1.])/0
Out[12]:
0 inf
dtype: float64
In [14]: Series([1.]).values/0
Out[14]: array([ inf])
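The sessions above were run under Python 2.7, where / on integer arrays is truncating division. A rough equivalent for Python 3 with a recent numpy (an assumption, not part of the original answer) uses // for the truncating behaviour, since / is now true division:

```python
import numpy as np

ints = np.array([1])
floats = np.array([1.0])

# Silence the divide-by-zero RuntimeWarning for this comparison.
with np.errstate(divide="ignore"):
    int_floor = ints // 0    # integer floor division, stays an integer array
    int_true = ints / 0      # true division, promoted to float64
    float_true = floats / 0  # float division

print(int_floor, int_true, float_true)
```

The integer floor division returns an integer array containing 0, which matches the odd result shown for Python 2, while both true divisions give inf.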

