Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/13548721/

Date: 2020-09-13 20:30:18  Source: igfitidea

pandas DataFrame Dividing a column by itself

Tags: python, dataframe, pandas

Asked by Brad Fair

I have a pandas dataframe that I filled with this:

import pandas.io.data as web
test = web.get_data_yahoo('QQQ')

The dataframe looks like this in iPython:

In [13]:  test
Out[13]:
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 729 entries, 2010-01-04 00:00:00 to 2012-11-23 00:00:00
    Data columns:
    Open         729  non-null values
    High         729  non-null values
    Low          729  non-null values
    Close        729  non-null values
    Volume       729  non-null values
    Adj Close    729  non-null values
    dtypes: float64(5), int64(1)

When I divide one column by another, I get a float64 result that has a satisfactory number of decimal places. I can even divide one column by another column offset by one, for instance test.Open[1:]/test.Close[:], and get a satisfactory number of decimal places. When I divide a column by itself offset, however, I get just 1:

In [83]: test.Open[1:] / test.Close[:]
Out[83]:

    Date
    2010-01-04         NaN
    2010-01-05    0.999354
    2010-01-06    1.005635
    2010-01-07    1.000866
    2010-01-08    0.989689
    2010-01-11    1.005393
...
In [84]: test.Open[1:] / test.Open[:]
Out[84]:
    Date
    2010-01-04   NaN
    2010-01-05     1
    2010-01-06     1
    2010-01-07     1
    2010-01-08     1
    2010-01-11     1

I'm probably missing something simple. What do I need to do in order to get a useful value out of that sort of calculation? Thanks in advance for the assistance.

Answered by Chang She

If you're looking to do operations between the column and lagged values, you should be doing something like test.Open / test.Open.shift(). The shift method realigns the data and takes an optional number of periods.
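
As a minimal sketch of that suggestion, the snippet below uses a small made-up price series in place of test.Open (the values and dates are hypothetical, not taken from the question's data). It divides each row by the previous row via shift(), and shows the optional periods argument with shift(2).

```python
import pandas as pd

# Hypothetical stand-in for test.Open: five opening prices on a date index.
open_prices = pd.Series(
    [46.0, 46.5, 46.2, 46.8, 47.0],
    index=pd.date_range("2010-01-04", periods=5, freq="D"),
    name="Open",
)

# shift() moves the values down one row but keeps the index, so each row
# is divided by the previous row's value; the first row has no predecessor
# and comes out as NaN.
ratio = open_prices / open_prices.shift()

# shift(2) compares each row with the value two rows back instead.
ratio_2 = open_prices / open_prices.shift(2)

print(ratio)
```

Because the result keeps the original index, it can be assigned straight back to the DataFrame as a new column.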

Answered by BrenBarn

You may not be getting what you think you are when you do test.Open[1:]/test.Close. Pandas matches up the rows based on their index, so you're still getting each element of one column divided by its corresponding element in the other column (not the element one row back). Here's an example:

>>> print(d)
   A  B   C
0  1  3   7
1 -2  1   6
2  8  6   9
3  1 -5  11
4 -4 -2   0
>>> d.A / d.B
0    0.333333
1   -2.000000
2    1.333333
3   -0.200000
4    2.000000
>>> d.A[1:] / d.B
0         NaN
1   -2.000000
2    1.333333
3   -0.200000
4    2.000000

Notice that the values returned are the same for both operations. The second one just has NaN for the first row, since there was no corresponding value in the first operand.
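
The alignment behaviour can be checked directly. The sketch below rebuilds columns A and B of the example frame and uses .iloc for the slice (an assumption made here to keep the slicing explicitly positional). It confirms that the sliced division matches the unsliced one apart from the first row, which is also why dividing a sliced column by itself yields 1 everywhere.

```python
import pandas as pd

# Columns A and B from the example frame above.
d = pd.DataFrame({"A": [1, -2, 8, 1, -4], "B": [3, 1, 6, -5, -2]})

# The slice drops the first row but keeps the labels 1, 2, 3, 4.
sliced = d["A"].iloc[1:]
print(sliced.index.tolist())

# Division pairs rows by label, so label 1 meets label 1, not label 0.
aligned = sliced / d["B"]
full = d["A"] / d["B"]

# Apart from the NaN at label 0, the two results are identical.
print(aligned.iloc[1:].equals(full.iloc[1:]))

# A sliced column divided by itself is therefore 1 wherever labels overlap.
self_div = d["A"].iloc[1:] / d["A"]
print(self_div.tolist())
```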

If you really want to operate on offset rows, you'll need to dig down to the numpy arrays that underpin the pandas DataFrame, to bypass pandas's index-aligning features. You can get at these innards with the values attribute of a column.

>>> d.A.values[1:] / d.B.values[:-1]
array([-0.66666667,  8.        ,  0.16666667,  0.8       ])

Now you really are getting each value divided by the one before it in the other column. Note that here you have to explicitly slice the second operand to leave off the last element, to make them equal in length.

So you can do the same to divide a column by an offset version of itself:

>>> d.A.values[1:] / d.A.values[:-1]
array([-2.   , -4.   ,  0.125, -4.   ])
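
For comparison, here is a sketch that runs the array-based division next to the shift-based one on column A of the same example frame. The two should agree on the ratios; the difference is that the array version loses the index, so the last lines show one possible way to re-attach the labels (the name labelled is just illustrative).

```python
import numpy as np
import pandas as pd

# Column A from the example frame above.
d = pd.DataFrame({"A": [1, -2, 8, 1, -4]})

# Positional division on the underlying arrays, as in the answer.
by_values = d["A"].values[1:] / d["A"].values[:-1]

# The same ratios via shift(), which keeps the index; drop the leading NaN.
by_shift = (d["A"] / d["A"].shift()).dropna()

print(np.allclose(by_values, by_shift.values))

# Re-attach the labels to the raw array if the index is still needed:
# each ratio belongs to the later of the two rows it was computed from.
labelled = pd.Series(by_values, index=d.index[1:])
print(labelled)
```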