I want to multiply two columns in a pandas DataFrame and add the result into a new column
Note: this page is a side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/14059094/
Asked by OAK
I'm trying to multiply two existing columns in a pandas DataFrame (orders_df), Prices (stock close price) and Amount (stock quantities), and add the result to a new column called 'Value'. For some reason, when I run this code, all the rows in the 'Value' column are positive numbers, while some of the rows should be negative. The Action column in the DataFrame has seven rows with the string 'Sell' and seven with the string 'Buy'.
for i in orders_df.Action:
    if i == 'Sell':
        orders_df['Value'] = orders_df.Prices*orders_df.Amount
    elif i == 'Buy':
        orders_df['Value'] = -orders_df.Prices*orders_df.Amount
Please let me know what I'm doing wrong!
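One way to see what the loop in the question does is to run it on a tiny frame. This is a minimal sketch; the three-row orders_df below is made-up sample data, not the asker's real data. Each pass through the loop assigns the entire 'Value' column, so whichever Action comes last decides the sign of every row.

```python
import pandas as pd

# Hypothetical sample data, small enough to check by hand.
orders_df = pd.DataFrame({
    "Prices": [3, 45, 6],
    "Amount": [57, 70, 43],
    "Action": ["Sell", "Buy", "Sell"],
})

for i in orders_df.Action:
    if i == "Sell":
        # This overwrites every row of 'Value', not just the current one.
        orders_df["Value"] = orders_df.Prices * orders_df.Amount
    elif i == "Buy":
        orders_df["Value"] = -orders_df.Prices * orders_df.Amount

# The last Action is 'Sell', so all rows end up positive,
# including the 'Buy' row that should be negative.
print(orders_df)
```

If the last row were a 'Buy', every row would come out negative instead.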
Accepted answer by Aman
If we're willing to sacrifice the succinctness of Hayden's solution, we could also do something like this:
In [22]: orders_df['C'] = orders_df.Action.apply(
             lambda x: (1 if x == 'Sell' else -1))
In [23]: orders_df # New column C represents the sign of the transaction
Out[23]:
   Prices  Amount  Action   C
0       3      57    Sell   1
1      89      42    Sell   1
2      45      70     Buy  -1
3       6      43    Sell   1
4      60      47    Sell   1
5      19      16     Buy  -1
6      56      89    Sell   1
7       3      28     Buy  -1
8      56      69    Sell   1
9      90      49     Buy  -1
Now we have eliminated the need for the if statement. By calling apply() on the Action column (Series.apply()), we also do away with the for loop. As Hayden noted, vectorized operations are generally faster than looping over rows in Python.
In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C
In [25]: orders_df # The resulting dataframe
Out[25]:
   Prices  Amount  Action   C  Value
0       3      57    Sell   1    171
1      89      42    Sell   1   3738
2      45      70     Buy  -1  -3150
3       6      43    Sell   1    258
4      60      47    Sell   1   2820
5      19      16     Buy  -1   -304
6      56      89    Sell   1   4984
7       3      28     Buy  -1    -84
8      56      69    Sell   1   3864
9      90      49     Buy  -1  -4410
This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.
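A variant of the same idea, sketched under the assumption that Action only ever contains 'Sell' or 'Buy': Series.map with a dictionary builds the sign without a lambda. The three sample rows are taken from the top of the output above; the name sign is just an illustrative choice.

```python
import pandas as pd

orders_df = pd.DataFrame({
    "Prices": [3, 89, 45],
    "Amount": [57, 42, 70],
    "Action": ["Sell", "Sell", "Buy"],
})

# Map each action to its sign. Any action missing from the dict
# would become NaN, which makes unexpected values easy to spot.
sign = orders_df["Action"].map({"Sell": 1, "Buy": -1})
orders_df["Value"] = orders_df["Prices"] * orders_df["Amount"] * sign
print(orders_df)
```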
Answer by Andy Hayden
Answer by bmu
I think an elegant solution is to use the where method (also see the API docs):
In [37]: values = df.Prices * df.Amount
In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)
In [39]: df
Out[39]:
   Prices  Amount  Action  Values
0       3      57    Sell     171
1      89      42    Sell    3738
2      45      70     Buy   -3150
3       6      43    Sell     258
4      60      47    Sell    2820
5      19      16     Buy    -304
6      56      89    Sell    4984
7       3      28     Buy     -84
8      56      69    Sell    3864
9      90      49     Buy   -4410
Furthermore, this should be the fastest solution.
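For anyone who wants to run this, here is a self-contained sketch. The frame df is rebuilt from the ten rows printed above; nothing else is assumed. Series.where keeps the original value wherever the condition is True and takes the value from other everywhere else.

```python
import pandas as pd

# Rebuild the ten sample rows shown in the output above.
df = pd.DataFrame({
    "Prices": [3, 89, 45, 6, 60, 19, 56, 3, 56, 90],
    "Amount": [57, 42, 70, 43, 47, 16, 89, 28, 69, 49],
    "Action": ["Sell", "Sell", "Buy", "Sell", "Sell",
               "Buy", "Sell", "Buy", "Sell", "Buy"],
})

values = df.Prices * df.Amount
# Keep the product for 'Sell' rows; use the negated product for the rest.
df["Values"] = values.where(df.Action == "Sell", other=-values)
print(df)
```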
Answer by Michael Silverstein
For me, this is the clearest and most intuitive:
orders_df['Values'] = 0
for action in ['Sell', 'Buy']:
    # Boolean mask selecting the rows for this action
    mask = orders_df['Action'] == action
    # Sells count as positive, buys as negative
    sign = 1 if action == 'Sell' else -1
    amounts = orders_df.loc[mask, 'Amount'].values
    prices = sign * orders_df.loc[mask, 'Prices'].values
    # Write the results back through the mask so each lands on its own row
    orders_df.loc[mask, 'Values'] = amounts * prices
The .values attribute returns a NumPy array, which makes the element-wise multiplication easy. Assigning through the boolean mask then writes each group's results back to its own rows, so the original row order is preserved.
Answer by FLab
Answer by Christopher Matthews
Good solution from bmu. I think it's more readable to put the values inside the parentheses vs outside.
import numpy as np

df['Values'] = np.where(df.Action == 'Sell',
                        df.Prices*df.Amount,
                        -df.Prices*df.Amount)
The same thing, using some of pandas' built-in methods:
df['Values'] = np.where(df.Action.eq('Sell'),
                        df.Prices.mul(df.Amount),
                        -df.Prices.mul(df.Amount))
Answer by Carson
To make things neat, I take Hayden's solution but make a small function out of it.
def create_value(row):
    if row['Action'] == 'Sell':
        return row['Prices'] * row['Amount']
    else:
        return -row['Prices']*row['Amount']
Then, when we want to apply the function to our DataFrame, we can do:
df['Value'] = df.apply(create_value, axis=1)
...and any modifications only need to occur in the small function itself.
Concise, Readable, and Neat!
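A quick way to check a row-wise function like this against a vectorized expression is sketched below. The three-row df is made-up sample data, and np.where (the approach from an earlier answer) is used only as a cross-check. Note that apply with axis=1 calls the Python function once per row, so on large frames it is typically slower than the vectorized answers.

```python
import numpy as np
import pandas as pd

def create_value(row):
    # Sells count as positive, everything else as negative.
    if row["Action"] == "Sell":
        return row["Prices"] * row["Amount"]
    else:
        return -row["Prices"] * row["Amount"]

# Hypothetical sample data for the comparison.
df = pd.DataFrame({
    "Prices": [3, 45, 6],
    "Amount": [57, 70, 43],
    "Action": ["Sell", "Buy", "Sell"],
})

# Row-wise version: one Python function call per row.
df["Value"] = df.apply(create_value, axis=1)

# Vectorized cross-check computed on whole columns at once.
vectorized = np.where(df["Action"] == "Sell",
                      df["Prices"] * df["Amount"],
                      -df["Prices"] * df["Amount"])

print(df)
print((df["Value"].to_numpy() == vectorized).all())
```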

