Subtract the first row from all rows in a Pandas DataFrame

Question

Asked by pbreach

I have a pandas DataFrame:

import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.rand(5, 6) * 10, index=pd.date_range(start='2005', periods=5, freq='A'))
a.columns = pd.MultiIndex.from_product([('A', 'B'), ('a', 'b', 'c')])

I want to subtract the row a['2005'] from a. To do that I've tried this:

In [22]:

a - a.ix['2005']

Out[22]:
              A              B
              a    b    c    a    b    c
2005-12-31    0    0    0    0    0    0
2006-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2007-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2008-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2009-12-31  NaN  NaN  NaN  NaN  NaN  NaN

This obviously doesn't work, because pandas aligns on the index while doing the operation, so only the 2005 row finds a match. This works:

In [24]:

pd.DataFrame(a.values - a['2005'].values, index=a.index, columns=a.columns)

Out[24]:
                    A                              B
                    a          b          c          a          b          c
2005-12-31   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
2006-12-31  -3.326761  -7.164628   8.188518  -0.863177   0.519587  -3.281982
2007-12-31   3.529531  -4.719756   8.444488   1.355366   7.468361  -4.023797
2008-12-31   3.139185  -8.420257   1.465101  -2.942519   1.219060  -5.146019
2009-12-31  -3.459710   0.519435  -1.049617  -2.779370   4.792227  -1.922461

But I don't want to have to form a new DataFrame every time I do this kind of operation. I've tried the apply() method like this: a.apply(lambda x: x-a['2005'].values), but I get ValueError: cannot copy sequence with size 6 to array axis with dimension 5, so I'm not really sure how to proceed. Is there a simple way to do this that I am not seeing? I think there should be an easy way to do this in place, so you don't have to construct a new DataFrame each time. I also tried the sub() method, but the subtraction is only applied to the first row, whereas I want to subtract the first row from each row in the DataFrame.

Answer 1

Answered by unutbu

Pandas is great for aligning by index. So when you want Pandas to ignore the index, you need to drop the index. You can do that by converting the DataFrame a.loc['2005'] to a 1-dimensional NumPy array:

In [56]: a - a.loc['2005'].values.squeeze()
Out[56]: 
                   A                             B                    
                   a         b         c         a         b         c
2005-12-31  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2006-12-31  0.325968  1.314776 -0.789328 -0.344669 -2.518857  7.361711
2007-12-31  0.084203  2.234445 -2.838454 -6.176795 -3.645513  8.955443
2008-12-31  3.798700  0.299529  1.303325 -2.770126 -1.284188  3.093806
2009-12-31  1.520930  2.660040  0.846996 -9.437851 -2.886603  6.705391

The squeeze method converts the NumPy array a.loc['2005'].values, of shape (1, 6), to an array of shape (6,). This allows the array to be broadcast (during the subtraction) as desired.
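The shape change described above can be sketched on a smaller frame. This is only an illustration under stated assumptions: the names `frame`, `first_2d`, and `first_1d` and the data values are hypothetical and do not come from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with a yearly DatetimeIndex (illustrative data only).
index = pd.to_datetime(['2005-12-31', '2006-12-31', '2007-12-31'])
frame = pd.DataFrame([[1.0, 2.0], [4.0, 6.0], [7.0, 10.0]],
                     index=index, columns=['x', 'y'])

# A partial-string lookup on the DatetimeIndex returns a one-row DataFrame,
# so .values is a 2-D array with a leading axis of length 1.
first_2d = frame.loc['2005'].values
# squeeze() drops that length-1 axis, leaving a plain 1-D array.
first_1d = first_2d.squeeze()

print(first_2d.shape)
print(first_1d.shape)

# The 1-D array has no index to align on, so it is broadcast across every row.
result = frame - first_1d
print(result)
```

Because the array carries no labels, it is up to the caller to make sure its length matches the number of columns.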

Answer 2

Answered by Gourneau

Here is a more verbose, step-by-step breakdown of how to do this.

First, make a simple DataFrame so the steps are easier to follow.

import numpy as np
import pandas as pd
#make a simple DataFrame
df = pd.DataFrame(np.fromfunction(lambda i, j: i+1 , (3, 3), dtype=int))

It will look like this:

# 1 1 1
# 2 2 2
# 3 3 3

Now get the values from the first row as a 1-dimensional array:

first_row = df.iloc[[0]].values[0]

Now use apply() with axis=1 to subtract the first row from every row.

df.apply(lambda row: row - first_row, axis=1)

The result will look like this. Note that the first row (all 1s) was subtracted from each row:

#  0 0 0
#  1 1 1
#  2 2 2
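As a possible alternative to apply(), and not part of the original answer, the same result can be obtained with plain subtraction. The sketch below relies on `df.iloc[0]` being a Series indexed by the column labels, so pandas aligns it with the columns and repeats it down the rows. The names `via_apply` and `vectorized` are illustrative.

```python
import numpy as np
import pandas as pd

# Same simple DataFrame as in the answer: rows of 1s, 2s, and 3s.
df = pd.DataFrame(np.fromfunction(lambda i, j: i + 1, (3, 3), dtype=int))

# The apply() approach from the answer, for comparison.
first_row = df.iloc[[0]].values[0]
via_apply = df.apply(lambda row: row - first_row, axis=1)

# df.iloc[0] is a Series whose index is the column labels (0, 1, 2),
# so the subtraction aligns on columns and is applied to every row.
vectorized = df - df.iloc[0]

print(vectorized)
```

The explicit form `df.sub(df.iloc[0], axis='columns')` expresses the same alignment.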

Answer 3

Answered by Satyajeet Patil

For timestamp values, to calculate how much time has passed relative to the start time, use:

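df['Time_column'].apply(lambda x: x - df.iloc[0, 1])

Where df.iloc[0, 1] is the start time, that is, the first row of the column at position 1 (assumed here to be Time_column). The scalar form df.iloc[0, 1] is used because df.iloc[[0],[1]] returns a 1x1 DataFrame rather than a single timestamp.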
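A minimal sketch of the elapsed-time calculation, assuming a hypothetical frame in which `Time_column` sits at column position 1. The `event` column and all timestamps are made up for illustration. It subtracts the scalar start time from the whole column at once rather than calling apply().

```python
import pandas as pd

# Hypothetical data: only the column name 'Time_column' comes from the answer.
df = pd.DataFrame({
    'event': ['start', 'middle', 'end'],
    'Time_column': pd.to_datetime(['2020-01-01 00:00:00',
                                   '2020-01-01 00:00:30',
                                   '2020-01-01 00:02:00']),
})

# The first timestamp is the reference point; it is a scalar Timestamp.
start_time = df['Time_column'].iloc[0]

# Subtracting a scalar Timestamp from a datetime column gives a Timedelta column.
elapsed = df['Time_column'] - start_time
print(elapsed)
```

The result is a Series of Timedelta values; `elapsed.dt.total_seconds()` converts it to plain numbers if needed.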
