pandas DataFrame: dividing a column by itself

Question

Asked by Brad Fair

I have a pandas dataframe that I filled with this:

import pandas.io.data as web
test = web.get_data_yahoo('QQQ')

The dataframe looks like this in iPython:

In [13]:  test
Out[13]:
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 729 entries, 2010-01-04 00:00:00 to 2012-11-23 00:00:00
    Data columns:
    Open         729  non-null values
    High         729  non-null values
    Low          729  non-null values
    Close        729  non-null values
    Volume       729  non-null values
    Adj Close    729  non-null values
    dtypes: float64(5), int64(1)

When I divide one column by another, I get a float64 result that has a satisfactory number of decimal places. I can even divide one column by another column offset by one, for instance test.Open[1:]/test.Close[:], and get a satisfactory number of decimal places. When I divide a column by itself offset, however, I get just 1:

In [83]: test.Open[1:] / test.Close[:]
Out[83]:

    Date
    2010-01-04         NaN
    2010-01-05    0.999354
    2010-01-06    1.005635
    2010-01-07    1.000866
    2010-01-08    0.989689
    2010-01-11    1.005393
...
In [84]: test.Open[1:] / test.Open[:]
Out[84]:
    Date
    2010-01-04   NaN
    2010-01-05     1
    2010-01-06     1
    2010-01-07     1
    2010-01-08     1
    2010-01-11     1

I'm probably missing something simple. What do I need to do in order to get a useful value out of that sort of calculation? Thanks in advance for the assistance.

Answer 1

Answered by Chang She

If you're looking to do operations between the column and lagged values, you should be doing something like test.Open / test.Open.shift(). shift realigns the data and takes an optional number of periods.
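
A minimal sketch of the shift approach, using a small made-up price series (open_prices and its values are illustrative, not the QQQ data from the question). Each value is divided by the value one row earlier; the first row has no predecessor, so it comes out as NaN.

```python
import pandas as pd

# Hypothetical prices standing in for test.Open
open_prices = pd.Series(
    [10.0, 12.5, 10.0, 20.0],
    index=pd.date_range("2010-01-04", periods=4, freq="D"),
    name="Open",
)

# shift() moves the values down one row while keeping the index,
# so index alignment pairs each row with the previous row's value.
ratio = open_prices / open_prices.shift()
print(ratio)

# shift(2) lags by two rows instead of one.
ratio_2 = open_prices / open_prices.shift(2)
```

Because the result keeps the original DatetimeIndex, it can be assigned straight back into the frame as a new column.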

Answer 2

Answered by BrenBarn

You may not be getting what you think you are when you do test.Open[1:]/test.Close. Pandas matches up the rows based on their index, so you're still getting each element of one column divided by its corresponding element in the other column (not the element one row back). Here's an example:

>>> print(d)
   A  B   C
0  1  3   7
1 -2  1   6
2  8  6   9
3  1 -5  11
4 -4 -2   0
>>> d.A / d.B
0    0.333333
1   -2.000000
2    1.333333
3   -0.200000
4    2.000000
>>> d.A[1:] / d.B
0         NaN
1   -2.000000
2    1.333333
3   -0.200000
4    2.000000

Notice that the values returned are the same for both operations. The second one just has NaN for the first element, since there was no corresponding value in the first operand.
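
The comparison above can be reproduced as a script. This is a sketch that rebuilds the example frame d from the printed values; .iloc[1:] is used as an explicit positional slice, which is an assumption about how best to spell d.A[1:] on current pandas versions.

```python
import pandas as pd

# Rebuild the example frame shown above
d = pd.DataFrame({
    "A": [1, -2, 8, 1, -4],
    "B": [3, 1, 6, -5, -2],
    "C": [7, 6, 9, 11, 0],
})

full = d["A"] / d["B"]             # row-by-row division
sliced = d["A"].iloc[1:] / d["B"]  # first row of A dropped before dividing

# pandas aligns on index labels, so label 1 of A still meets label 1 of B.
# Label 0 exists only in B, so the result there is NaN.
print(sliced)
print(sliced.iloc[1:].equals(full.iloc[1:]))
```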

If you really want to operate on offset rows, you'll need to dig down to the numpy arrays that underpin the pandas DataFrame, to bypass pandas's index-aligning features. You can get at these innards with the values attribute of a column.

>>> d.A.values[1:] / d.B.values[:-1]
array([-0.66666667,  8.        ,  0.16666667,  0.8       ])

Now you really are getting each value divided by the one before it in the other column. Note that here you have to explicitly slice the second operand to leave off the last element, to make them equal in length.

So you can do the same to divide a column by an offset version of itself:

>>> d.A.values[1:] / d.A.values[:-1]
array([-2.   , -4.   ,  0.125, -4.   ])
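
The raw-array result loses the index. As a sketch, it can be wrapped back into a Series and checked against the shift() approach from the other answer; the names by_values, as_series, and by_shift are illustrative only.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"A": [1, -2, 8, 1, -4]})

# Positional division on the underlying numpy arrays (no index alignment)
by_values = d["A"].values[1:] / d["A"].values[:-1]

# Re-attach the labels of the rows that have a predecessor
as_series = pd.Series(by_values, index=d.index[1:], name="A_ratio")

# The same calculation with shift(), dropping the leading NaN
by_shift = (d["A"] / d["A"].shift()).iloc[1:]

print(as_series)
print(np.allclose(as_series, by_shift))
```

Both routes give the same numbers; shift() keeps the index throughout, while the array route needs the labels put back by hand.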
