Python: I want to multiply two columns in a pandas DataFrame and add the result into a new column

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14059094/

Time: 2020-08-18 10:20:02  Source: igfitidea

I want to multiply two columns in a pandas DataFrame and add the result into a new column

Tags: python, python-2.7, pandas

Asked by OAK

I'm trying to multiply two existing columns in a pandas DataFrame (orders_df), Prices (stock close price) and Amount (stock quantities), and add the result to a new column called 'Value'. For some reason, when I run this code all the rows under the 'Value' column are positive numbers, while some of them should be negative. Under the Action column in the DataFrame there are seven rows with the 'Sell' string and seven with the 'Buy' string.

for i in orders_df.Action:
    if i == 'Sell':
        orders_df['Value'] = orders_df.Prices*orders_df.Amount
    elif i == 'Buy':
        orders_df['Value'] = -orders_df.Prices*orders_df.Amount

Please let me know what I'm doing wrong!

Accepted answer by Aman

If we're willing to sacrifice the succinctness of Hayden's solution, one could also do something like this:

In [22]: orders_df['C'] = orders_df.Action.apply(
               lambda x: (1 if x == 'Sell' else -1))

In [23]: orders_df   # New column C represents the sign of the transaction
Out[23]:
   Prices  Amount Action  C
0       3      57   Sell  1
1      89      42   Sell  1
2      45      70    Buy -1
3       6      43   Sell  1
4      60      47   Sell  1
5      19      16    Buy -1
6      56      89   Sell  1
7       3      28    Buy -1
8      56      69   Sell  1
9      90      49    Buy -1

Now we have eliminated the need for the if statement. Using Series.apply(), we also do away with the for loop. As Hayden noted, these methods are usually faster than an explicit for loop.

In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C

In [25]: orders_df   # The resulting dataframe
Out[25]:
   Prices  Amount Action  C  Value
0       3      57   Sell  1    171
1      89      42   Sell  1   3738
2      45      70    Buy -1  -3150
3       6      43   Sell  1    258
4      60      47   Sell  1   2820
5      19      16    Buy -1   -304
6      56      89   Sell  1   4984
7       3      28    Buy -1    -84
8      56      69   Sell  1   3864
9      90      49    Buy -1  -4410

This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.
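
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the same column names and uses only the first five rows of the table above as sample data. Note that it swaps in Series.map with a dict for the lambda, which is my substitution and not part of the answer itself.

```python
import pandas as pd

# Sample data: the first five rows of the table shown in the answer.
orders_df = pd.DataFrame({
    "Prices": [3, 89, 45, 6, 60],
    "Amount": [57, 42, 70, 43, 47],
    "Action": ["Sell", "Sell", "Buy", "Sell", "Sell"],
})

# Map each action to its sign. Any label missing from the dict
# would come out as NaN here, so this assumes only 'Sell' and 'Buy' occur.
sign = orders_df["Action"].map({"Sell": 1, "Buy": -1})

# Element-wise product of the three columns.
orders_df["Value"] = orders_df["Prices"] * orders_df["Amount"] * sign
print(orders_df)
```

This skips the helper column C; keep the column if you want the sign visible in the frame.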

Answer by Andy Hayden

You can use the DataFrame apply method:

orders_df['Value'] = orders_df.apply(lambda row: (row['Prices']*row['Amount']
                                                 if row['Action']=='Sell'
                                                 else -row['Prices']*row['Amount']),
                                     axis=1)

It is usually faster to use these methods than to iterate with for loops.
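
It may also help to see why the loop in the question produced a single sign for every row. The sketch below uses a three-row sample of my own (an assumption, not the asker's data): each pass through the loop assigns the whole column, so only the last entry of Action decides the result, whereas apply with axis=1 evaluates the condition row by row.

```python
import pandas as pd

# Hypothetical three-row sample using the question's column names.
orders_df = pd.DataFrame({
    "Prices": [3, 45, 6],
    "Amount": [57, 70, 43],
    "Action": ["Sell", "Buy", "Sell"],
})

# The loop from the question: every iteration overwrites the WHOLE column,
# so the sign of all rows is set by the last value in Action.
for action in orders_df.Action:
    if action == "Sell":
        orders_df["Value"] = orders_df.Prices * orders_df.Amount
    elif action == "Buy":
        orders_df["Value"] = -orders_df.Prices * orders_df.Amount

looped = orders_df["Value"].tolist()

# Row-wise apply checks the Action of each row individually.
applied = orders_df.apply(
    lambda row: row["Prices"] * row["Amount"]
    if row["Action"] == "Sell"
    else -row["Prices"] * row["Amount"],
    axis=1,
).tolist()

print(looped)   # the 'Buy' row is not negated
print(applied)  # the 'Buy' row is negated
```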

Answer by bmu

I think an elegant solution is to use the where method (also see the API docs):

In [37]: values = df.Prices * df.Amount

In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)

In [39]: df
Out[39]: 
   Prices  Amount Action  Values
0       3      57   Sell     171
1      89      42   Sell    3738
2      45      70    Buy   -3150
3       6      43   Sell     258
4      60      47   Sell    2820
5      19      16    Buy    -304
6      56      89   Sell    4984
7       3      28    Buy     -84
8      56      69   Sell    3864
9      90      49    Buy   -4410

Furthermore, this should be the fastest solution.
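
If you want to check the speed claim on your own machine, a rough timing sketch could look like the following. The random data, the row count, and the helper names with_where and with_apply are all assumptions made for illustration; the numbers will vary with pandas version and hardware, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption; not the asker's orders).
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "Prices": rng.integers(1, 100, n),
    "Amount": rng.integers(1, 100, n),
    "Action": rng.choice(["Sell", "Buy"], n),
})

def with_where(frame):
    # Vectorized: compute all products once, then flip the sign for non-'Sell' rows.
    values = frame.Prices * frame.Amount
    return values.where(frame.Action == "Sell", other=-values)

def with_apply(frame):
    # Row-wise: calls the lambda once per row.
    return frame.apply(
        lambda row: row["Prices"] * row["Amount"]
        if row["Action"] == "Sell"
        else -row["Prices"] * row["Amount"],
        axis=1,
    )

# Both approaches should agree before their timings are compared.
same = (with_where(df) == with_apply(df)).all()
print("same result:", same)
print("where:", timeit.timeit(lambda: with_where(df), number=5))
print("apply:", timeit.timeit(lambda: with_apply(df), number=5))
```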

Answer by Michael Silverstein

For me, this is the clearest and most intuitive:

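import pandas as pd

values = []
index = []
for action in ['Sell', 'Buy']:
    mask = orders_df['Action'] == action  # rows for this action
    amounts = orders_df['Amount'][mask].values
    if action == 'Sell':
        prices = orders_df['Prices'][mask].values
    else:
        prices = -1 * orders_df['Prices'][mask].values
    values += list(amounts * prices)
    index += list(orders_df[mask].index)  # remember which rows these came from
# Align on the index so each value lands back on its own row
orders_df['Values'] = pd.Series(values, index=index)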

The .values attribute returns a NumPy array, which lets you multiply element-wise; you can then build up the result cumulatively by 'adding' each array to a list.

Answer by FLab

Since this question came up again, I think a good clean approach is using assign.

The code is quite expressive and self-describing:

df = df.assign(Value=lambda x: x.Prices * x.Amount * x.Action.map({'Sell': 1, 'Buy': -1}))
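
One thing worth knowing about assign is that it returns a new DataFrame and leaves the original alone, which is why the line above rebinds df. A small sketch, assuming a two-row sample of my own and the sign convention from the question (Sell positive, Buy negative); the names signs and result are only for illustration:

```python
import pandas as pd

# Hypothetical two-row sample using the question's column names.
df = pd.DataFrame({
    "Prices": [3, 45],
    "Amount": [57, 70],
    "Action": ["Sell", "Buy"],
})

signs = {"Sell": 1, "Buy": -1}  # sign convention taken from the question

# assign builds and returns a copy with the extra column.
result = df.assign(Value=lambda x: x.Prices * x.Amount * x.Action.map(signs))

print(list(df.columns))       # the original frame has no 'Value' column
print(result.Value.tolist())
```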

Answer by Christopher Matthews

Good solution from bmu. I think it's more readable to put the values inside the parentheses vs outside.

    import numpy as np

    df['Values'] = np.where(df.Action == 'Sell',
                            df.Prices*df.Amount,
                           -df.Prices*df.Amount)

The same thing, using some of pandas' built-in methods:

    df['Values'] = np.where(df.Action.eq('Sell'), 
                            df.Prices.mul(df.Amount), 
                           -df.Prices.mul(df.Amount))

Answer by Carson

To make things neat, I take Hayden's solution but make a small function out of it.

def create_value(row):
    if row['Action'] == 'Sell':
        return row['Prices'] * row['Amount']
    else:
        return -row['Prices']*row['Amount']

so that when we want to apply the function to our dataframe, we can do:

df['Value'] = df.apply(create_value, axis=1)

...and any modifications only need to occur in the small function itself.
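
As an illustration of that point, here is a hypothetical modification: making the function fail loudly on any label other than 'Sell' or 'Buy'. The ValueError and the sample frame are my own assumptions; the call site stays exactly the same.

```python
import pandas as pd

def create_value(row):
    # Hypothetical extension: reject labels other than 'Sell' / 'Buy'
    # instead of silently treating them as 'Buy'.
    if row["Action"] == "Sell":
        return row["Prices"] * row["Amount"]
    if row["Action"] == "Buy":
        return -row["Prices"] * row["Amount"]
    raise ValueError(f"unexpected Action: {row['Action']!r}")

# Hypothetical two-row sample using the question's column names.
df = pd.DataFrame({
    "Prices": [3, 45],
    "Amount": [57, 70],
    "Action": ["Sell", "Buy"],
})

# Passing the function directly; no wrapping lambda is needed.
df["Value"] = df.apply(create_value, axis=1)
print(df)
```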

Concise, Readable, and Neat!
