Subtract first row from all rows in Pandas DataFrame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/24370711/
Asked by pbreach
I have a pandas DataFrame:
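import numpy as np
import pandas as pd
# 5 yearly rows starting in 2005, 6 random columns scaled to [0, 10)
a = pd.DataFrame(np.random.rand(5, 6) * 10, index=pd.date_range(start='2005', periods=5, freq='A'))
a.columns = pd.MultiIndex.from_product([('A','B'),('a','b','c')])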
I want to subtract the row a.loc['2005'] from a. To do that I've tried this:
In [22]:
a - a.loc['2005']
Out[22]:
              A              B
              a    b    c    a    b    c
2005-12-31    0    0    0    0    0    0
2006-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2007-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2008-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2009-12-31  NaN  NaN  NaN  NaN  NaN  NaN
This obviously doesn't work, because pandas aligns on the index while doing the operation. This works:
In [24]:
pd.DataFrame(a.values - a.loc['2005'].values, index=a.index, columns=a.columns)
Out[24]:
                   A                             B
                   a         b         c         a         b         c
2005-12-31  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2006-12-31 -3.326761 -7.164628  8.188518 -0.863177  0.519587 -3.281982
2007-12-31  3.529531 -4.719756  8.444488  1.355366  7.468361 -4.023797
2008-12-31  3.139185 -8.420257  1.465101 -2.942519  1.219060 -5.146019
2009-12-31 -3.459710  0.519435 -1.049617 -2.779370  4.792227 -1.922461
But I don't want to have to build a new DataFrame every time I do this kind of operation. I've tried the apply() method like this: a.apply(lambda x: x - a.loc['2005'].values), but I get ValueError: cannot copy sequence with size 6 to array axis with dimension 5
So I'm not really sure how to proceed. Is there a simple way to do this that I am not seeing? I think there should be an easy way to do this in place, so you don't have to construct a new DataFrame each time. I also tried the sub() method, but the subtraction is only applied to the first row, whereas I want to subtract the first row from each row in the DataFrame.
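The NaN rows in the first attempt can be reproduced on a much smaller frame. The following is a minimal sketch, not the asker's data: the frame `demo`, its column names, and its values are made up for illustration. It shows that subtracting a one-row DataFrame matches rows by label, so only the row whose label exists on both sides gets a numeric result.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with a yearly index, standing in for the question's `a`.
idx = pd.to_datetime(['2005-12-31', '2006-12-31', '2007-12-31'])
demo = pd.DataFrame(np.arange(6.0).reshape(3, 2), index=idx, columns=['x', 'y'])

# A one-row DataFrame that still carries the 2005-12-31 row label.
first = demo.iloc[[0]]

# DataFrame - DataFrame aligns on both row and column labels, so rows
# 2006-12-31 and 2007-12-31 have no partner in `first` and become NaN.
aligned = demo - first

print(aligned)
```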
Answered by unutbu
Pandas is great for aligning by index. So when you want Pandas to ignore the index, you need to drop the index. You can do that by converting the DataFrame a.loc['2005'] to a 1-dimensional NumPy array:
In [56]: a - a.loc['2005'].values.squeeze()
Out[56]:
                   A                             B
                   a         b         c         a         b         c
2005-12-31  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2006-12-31  0.325968  1.314776 -0.789328 -0.344669 -2.518857  7.361711
2007-12-31  0.084203  2.234445 -2.838454 -6.176795 -3.645513  8.955443
2008-12-31  3.798700  0.299529  1.303325 -2.770126 -1.284188  3.093806
2009-12-31  1.520930  2.660040  0.846996 -9.437851 -2.886603  6.705391
The squeeze method converts the NumPy array a.loc['2005'].values, of shape (1, 6), to an array of shape (6,). This allows the array to be broadcast across every row during the subtraction, as desired.
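To make the shape change concrete, here is a small sketch under a few assumptions: the frame is rebuilt with a seeded random generator and an explicit list of dates (both chosen here for reproducibility, not taken from the question), and the first row is selected by position with `iloc[[0]]` as a stand-in for `a.loc['2005']`. It also shows that subtracting the first row as a Series appears to give the same result, since a Series is aligned against the column labels rather than the row labels.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the question's `a` (values are arbitrary).
rng = np.random.default_rng(0)
cols = pd.MultiIndex.from_product([('A', 'B'), ('a', 'b', 'c')])
idx = pd.to_datetime([f'{year}-12-31' for year in range(2005, 2010)])
a = pd.DataFrame(rng.random((5, 6)) * 10, index=idx, columns=cols)

# One-row selection gives a 2-D array; squeeze() drops the length-1 axis.
row_2d = a.iloc[[0]].values      # shape (1, 6)
row_1d = row_2d.squeeze()        # shape (6,)

# A 1-D array with one entry per column is broadcast down the rows.
result = a - row_1d

# Alternative: a Series indexed by the column labels is aligned on columns.
by_series = a - a.iloc[0]

print(row_2d.shape, row_1d.shape)
print(result)
```

Either form keeps the original index and columns, so no new DataFrame has to be assembled by hand.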
Answered by Gourneau
Here is a more verbose, step-by-step breakdown of how to do this.
First, make a simple DataFrame so it is easier to follow.
import numpy as np
import pandas as pd
#make a simple DataFrame
df = pd.DataFrame(np.fromfunction(lambda i, j: i+1 , (3, 3), dtype=int))
It will look like this:
# 1 1 1
# 2 2 2
# 3 3 3
Now get the values from the first row:
first_row = df.iloc[[0]].values[0]
Now use apply() with axis=1 to subtract the first row from every row (including the first row itself):
df.apply(lambda row: row - first_row, axis=1)
The result will look like this. Note that the first row (all 1s) was subtracted from each row:
# 0 0 0
# 1 1 1
# 2 2 2
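For comparison, the same result can be obtained without a row-wise apply(). This is a sketch of one alternative rather than part of the original answer: it uses `DataFrame.sub` with `axis='columns'`, which matches the first-row Series against the column labels and subtracts it from every row in a single vectorized step.

```python
import numpy as np
import pandas as pd

# Same simple DataFrame as in the answer: rows of 1s, 2s and 3s.
df = pd.DataFrame(np.fromfunction(lambda i, j: i + 1, (3, 3), dtype=int))

# Row-wise version from the answer, using the first row's raw values.
first_row = df.iloc[[0]].values[0]
via_apply = df.apply(lambda row: row - first_row, axis=1)

# Vectorized version: align the first row on the columns and subtract.
via_sub = df.sub(df.iloc[0], axis='columns')

print(via_sub)
```

On large frames the vectorized form is usually the faster of the two, because apply() with axis=1 calls the lambda once per row.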
Answered by Satyajeet Patil
For timestamp values, to calculate how much time has passed relative to the start time, use:
df['Time_column'].apply(lambda x: x - df.iloc[0, 1])
where df.iloc[0, 1] is the start time (the first row of the time column, assumed here to be the second column).
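A minimal sketch of the elapsed-time idea follows. The frame, its `event` column, and the timestamps are invented for illustration; only the column name `Time_column` comes from the answer. Subtracting a single Timestamp from a datetime column yields a column of Timedelta values, so no apply() is needed.

```python
import pandas as pd

# Hypothetical data: three events with made-up timestamps.
df = pd.DataFrame({
    'event': ['start', 'middle', 'end'],
    'Time_column': pd.to_datetime([
        '2020-01-01 00:00:00',
        '2020-01-01 00:30:00',
        '2020-01-01 02:00:00',
    ]),
})

# The start time is the first entry of the time column (a scalar Timestamp).
start = df['Time_column'].iloc[0]

# Datetime column minus a scalar Timestamp gives a Timedelta column.
elapsed = df['Time_column'] - start

print(elapsed)
```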

