Python: Sum two columns in a pandas DataFrame

Question

Asked by yoshiserry

When I use this syntax, it creates a Series rather than adding a column to my new DataFrame (sum). Please help.

My code:

sum = data['variance'] = data.budget + data.actual

My data (in DataFrame df) currently has everything except budget - actual. I want to create a variance column:

  cluster                 date  budget  actual  budget - actual
0       a  2014-01-01 00:00:00   11000   10000             1000
1       a  2014-02-01 00:00:00    1200    1000
2       a  2014-03-01 00:00:00     200     100
3       b  2014-04-01 00:00:00     200     300
4       b  2014-05-01 00:00:00     400     450
5       c  2014-06-01 00:00:00     700    1000
6       c  2014-07-01 00:00:00    1200    1000
7       c  2014-08-01 00:00:00     200     100
8       c  2014-09-01 00:00:00     200     300

Answer 1

Accepted answer by Andy Hayden

I think you've misunderstood some Python syntax. The following does two assignments:

In [11]: a = b = 1

In [12]: a
Out[12]: 1

In [13]: b
Out[13]: 1
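A minimal sketch of what chained assignment does: Python evaluates the right-hand side once and then binds that same object to every target, from left to right. The names a and b are purely illustrative; a mutable list makes it visible that both names point at one object.

```python
# Chained assignment: the right-hand side is evaluated once,
# then the same object is bound to each target, left to right.
a = b = []

a.append(1)    # mutate the list through one name...
print(b)       # ...and the change shows through the other: [1]
print(a is b)  # True: both names refer to the same list object
```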

So in your code it was as if you were doing:

sum = df['budget'] + df['actual']  # a Series
# and
df['variance'] = df['budget'] + df['actual']  # assigned to a column
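To check this, here is a small sketch that runs the asker's one-liner on the first three rows of the question's budget/actual data. It uses the illustrative name total instead of sum. The column does get added; the extra name is simply bound to the summed Series as well.

```python
import pandas as pd

# First three rows of the question's budget/actual figures
df = pd.DataFrame({"budget": [11000, 1200, 200], "actual": [10000, 1000, 100]})

# The asker's chained assignment, with `total` instead of `sum`
total = df["variance"] = df["budget"] + df["actual"]

print(type(total).__name__)  # Series: `total` holds the summed Series
print(df.columns.tolist())   # ['budget', 'actual', 'variance']: the column was added too
```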

The latter creates a new column for df:

In [21]: df
Out[21]:
  cluster                 date  budget  actual
0       a  2014-01-01 00:00:00   11000   10000
1       a  2014-02-01 00:00:00    1200    1000
2       a  2014-03-01 00:00:00     200     100
3       b  2014-04-01 00:00:00     200     300
4       b  2014-05-01 00:00:00     400     450
5       c  2014-06-01 00:00:00     700    1000
6       c  2014-07-01 00:00:00    1200    1000
7       c  2014-08-01 00:00:00     200     100
8       c  2014-09-01 00:00:00     200     300

In [22]: df['variance'] = df['budget'] + df['actual']

In [23]: df
Out[23]:
  cluster                 date  budget  actual  variance
0       a  2014-01-01 00:00:00   11000   10000     21000
1       a  2014-02-01 00:00:00    1200    1000      2200
2       a  2014-03-01 00:00:00     200     100       300
3       b  2014-04-01 00:00:00     200     300       500
4       b  2014-05-01 00:00:00     400     450       850
5       c  2014-06-01 00:00:00     700    1000      1700
6       c  2014-07-01 00:00:00    1200    1000      2200
7       c  2014-08-01 00:00:00     200     100       300
8       c  2014-09-01 00:00:00     200     300       500
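The question's sample table labels the desired column budget - actual (1000 in the first row), while this answer adds the two columns. If a difference is what you need, the same column-assignment pattern should work with the - operator. This is a sketch using the question's figures only; the cluster and date columns are omitted for brevity.

```python
import pandas as pd

# The question's budget/actual figures (cluster and date omitted)
df = pd.DataFrame({
    "budget": [11000, 1200, 200, 200, 400, 700, 1200, 200, 200],
    "actual": [10000, 1000, 100, 300, 450, 1000, 1000, 100, 300],
})

# Same pattern as above, but subtracting instead of adding
df["variance"] = df["budget"] - df["actual"]
print(df["variance"].tolist())
# [1000, 200, 100, -100, -50, -300, 200, 100, -100]
```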

As an aside, you shouldn't use sum as a variable name, because that overrides (shadows) the built-in sum function.
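A short sketch of the shadowing problem. Once sum names a number, calling sum(...) fails. The original function is still reachable through the standard builtins module.

```python
import builtins

sum = 2200  # shadows the built-in sum() in this scope
try:
    sum([1, 2, 3])
except TypeError as exc:
    error = str(exc)  # "'int' object is not callable"

# The built-in is still available via the builtins module
restored = builtins.sum([1, 2, 3])
print(restored)  # 6
```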

Answer 2

Answer by Rishi Bansal

The same thing can be done using a lambda function. Here I am reading the data from an xlsx file.

import pandas as pd
df = pd.read_excel("data.xlsx", sheet_name=4)
print(df)

Output:

  cluster Unnamed: 1      date  budget  actual
0       a 2014-01-01  00:00:00   11000   10000
1       a 2014-02-01  00:00:00    1200    1000
2       a 2014-03-01  00:00:00     200     100
3       b 2014-04-01  00:00:00     200     300
4       b 2014-05-01  00:00:00     400     450
5       c 2014-06-01  00:00:00     700    1000
6       c 2014-07-01  00:00:00    1200    1000
7       c 2014-08-01  00:00:00     200     100
8       c 2014-09-01  00:00:00     200     300

Sum two columns into a third, new column.

df['variance'] = df.apply(lambda x: x['budget'] + x['actual'], axis=1)
print(df)
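A sketch comparing the two approaches on a few of the question's rows. With axis=1, apply calls the lambda once per row and passes that row in as a Series. Plain column addition does the whole operation in one vectorized step. Both give the same values here. The row-wise version is typically slower on large frames because it runs Python code for every row.

```python
import pandas as pd

df = pd.DataFrame({"budget": [11000, 1200, 200], "actual": [10000, 1000, 100]})

# Row-wise: the lambda receives each row as a Series (axis=1)
row_wise = df.apply(lambda x: x["budget"] + x["actual"], axis=1)

# Vectorized: one operation over whole columns
vectorized = df["budget"] + df["actual"]

print((row_wise == vectorized).all())  # True
```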

Output:

  cluster Unnamed: 1      date  budget  actual  variance
0       a 2014-01-01  00:00:00   11000   10000     21000
1       a 2014-02-01  00:00:00    1200    1000      2200
2       a 2014-03-01  00:00:00     200     100       300
3       b 2014-04-01  00:00:00     200     300       500
4       b 2014-05-01  00:00:00     400     450       850
5       c 2014-06-01  00:00:00     700    1000      1700
6       c 2014-07-01  00:00:00    1200    1000      2200
7       c 2014-08-01  00:00:00     200     100       300
8       c 2014-09-01  00:00:00     200     300       500

Answer 3

Answer by Archie

You could also use the .add() method:

df.loc[:,'variance'] = df.loc[:,'budget'].add(df.loc[:,'actual'])
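One advantage of .add() over + is its fill_value parameter. The sketch below assumes a hypothetical NaN in the second budget value. Plain .add() gives NaN there, just like +. With fill_value=0, the missing side is treated as 0 when only one of the two values is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second budget value is missing
df = pd.DataFrame({"budget": [11000.0, np.nan, 200.0], "actual": [10000.0, 1000.0, 100.0]})

# Plain .add() behaves like `+`: NaN on either side gives NaN
plain = df["budget"].add(df["actual"])

# fill_value=0 substitutes 0 where exactly one side is missing
filled = df["budget"].add(df["actual"], fill_value=0)

print(plain.tolist())   # [21000.0, nan, 300.0]
print(filled.tolist())  # [21000.0, 1000.0, 300.0]
```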

Answer 4

Answer by R. Cox

If "budget" has any NaN values, but you don't want the sum to be NaN, try:

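import math
import numpy as np

def fun(b, a):
    # If the budget is missing, fall back to the actual value alone
    if math.isnan(b):
        return a
    else:
        return b + a

# Apply fun element-wise over both columns, producing floats
f = np.vectorize(fun, otypes=[float])

df['variance'] = f(df['budget'], df['actual'])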
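According to the NumPy docs, np.vectorize is provided mainly for convenience rather than performance, since it is essentially a Python loop. A sketch of a vectorized alternative, assuming a hypothetical missing budget value: fill the missing budgets with 0, then add. This matches fun's behaviour, because 0 + a == a.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: the second budget value is missing
df = pd.DataFrame({"budget": [11000.0, np.nan, 200.0], "actual": [10000.0, 1000.0, 100.0]})

# The answer's approach: fun() is called once per element pair
def fun(b, a):
    if math.isnan(b):
        return a
    else:
        return b + a

f = np.vectorize(fun, otypes=[float])
looped = f(df["budget"], df["actual"])

# Vectorized alternative: treat a missing budget as 0, then add
filled = df["budget"].fillna(0) + df["actual"]

print(np.array_equal(looped, filled.to_numpy()))  # True
```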

Answer 5

Answer by pylist

df['variance'] = df.loc[:,['budget','actual']].sum(axis=1)
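DataFrame.sum skips NaN by default (skipna=True), so this approach also avoids NaN results. The one caveat is that a row where both values are missing sums to 0. A sketch with hypothetical missing values shows the default, and the min_count parameter that returns NaN unless at least that many values are present.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one partly missing and one fully missing row
df = pd.DataFrame({
    "budget": [11000.0, np.nan, np.nan],
    "actual": [10000.0, 1000.0, np.nan],
})

# Default skipna=True: NaN is skipped; an all-NaN row sums to 0
default = df.loc[:, ["budget", "actual"]].sum(axis=1)

# min_count=1: need at least one non-NaN value, otherwise NaN
strict = df.loc[:, ["budget", "actual"]].sum(axis=1, min_count=1)

print(default.tolist())  # [21000.0, 1000.0, 0.0]
print(strict.tolist())   # [21000.0, 1000.0, nan]
```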