python pandas operations on columns
Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/18181973/
Question by Anthony Martin
Hi, I would like to know the best way to do operations on columns in Python using pandas.
I have a classical database which I have loaded as a DataFrame, and I often have to do operations such as: for each row, if the value in column 'A' is greater than x, replace that value with column 'C' minus column 'D'.
For now I do something like this:
# Row-by-row loop: one cell lookup and one assignment per row
for i in df.index:
    if df.loc[i, 'A'] > x:
        df.loc[i, 'A'] = df.loc[i, 'C'] - df.loc[i, 'D']
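A self-contained version of this loop, runnable on current pandas, is sketched below. The DataFrame contents and the threshold x = 2 are made-up assumptions for illustration, and .loc is used because .ix no longer exists in pandas 1.0 and later.

```python
import pandas as pd

# Made-up example data (an assumption, not from the question)
df = pd.DataFrame({
    "A": [1, 5, 3, 0],
    "C": [10, 20, 30, 40],
    "D": [1, 2, 3, 4],
})
x = 2  # hypothetical threshold

# Row-by-row version: one label lookup and one assignment per row
for i in df.index:
    if df.loc[i, "A"] > x:
        df.loc[i, "A"] = df.loc[i, "C"] - df.loc[i, "D"]

print(df["A"].tolist())
```

Rows 1 and 2 have A > 2, so they become 20 - 2 and 30 - 3; the others are left unchanged.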
I would like to know if there is a simpler way of doing these kinds of operations and, more importantly, the most efficient one, as I have large databases.
I had tried it without the for loop, as I would in R or Stata, but I was advised to use "a.any" or "a.all", and I did not find anything about that either here or in the pandas docs.
Thanks in advance.
Answer by Viktor Kerkez
You can just use a boolean mask with either the .loc or .ix attribute of the DataFrame. Note that .ix was deprecated and then removed in pandas 1.0, so .loc is the one to use today:
mask = df['A'] > 2
df.loc[mask, 'A'] = df.loc[mask, 'C'] - df.loc[mask, 'D']
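A minimal runnable sketch of the mask approach, using made-up data and the answer's threshold of 2 (both assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 5, 3, 0],
    "C": [10, 20, 30, 40],
    "D": [1, 2, 3, 4],
})

# Boolean Series: True where the row should be updated
mask = df["A"] > 2

# The right-hand side is aligned on the index, so only masked rows change
df.loc[mask, "A"] = df.loc[mask, "C"] - df.loc[mask, "D"]

print(df["A"].tolist())
```

Both the comparison and the subtraction are vectorized operations over whole columns, so there is no Python-level loop over rows.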
If you have a lot of branching logic, you can write a function and apply it row by row:
def func(row):
    # row is one DataFrame row, passed in as a Series
    if row['A'] > 0:
        return row['B'] + row['C']
    elif row['B'] < 0:
        return row['D'] + row['A']
    else:
        return row['A']

# axis=1 calls func once per row
df['A'] = df.apply(func, axis=1)
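apply is generally faster than an explicit for loop that reads and writes single cells with .ix/.loc, but it still calls func once per row in Python, so a vectorized boolean mask like the one above is usually faster still on large frames.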
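When the branches are simple column comparisons, numpy.select can often replace the row-wise apply entirely. This is a swapped-in technique, not part of the original answer; the data below are made up, and func is repeated only so the two results can be compared. Conditions are tested in order, so the first True one wins, mirroring the if/elif/else chain.

```python
import numpy as np
import pandas as pd

# Made-up example data (an assumption for illustration)
df = pd.DataFrame({
    "A": [1, -1, -2, 0],
    "B": [2, -3, 4, 5],
    "C": [3, 6, 7, 8],
    "D": [4, 10, 1, 9],
})

def func(row):
    # Same branching logic as the row-wise version
    if row["A"] > 0:
        return row["B"] + row["C"]
    elif row["B"] < 0:
        return row["D"] + row["A"]
    else:
        return row["A"]

expected = df.apply(func, axis=1)

# np.select picks, per row, the choice of the first condition that is True;
# rows matching no condition fall back to the default (the current A)
conditions = [df["A"] > 0, df["B"] < 0]
choices = [df["B"] + df["C"], df["D"] + df["A"]]
df["A"] = np.select(conditions, choices, default=df["A"])

print(df["A"].tolist())
```

One design caveat: every choice is computed for every row before selection, which is fine for arithmetic like this but not for expressions that could fail on some rows.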
Answer by Fergal
There are lots of ways of doing this, but here's the pattern I find easiest to read:
# Assume df is a pandas DataFrame and x is the threshold
idx = df.loc[:, 'A'] > x
df.loc[idx, 'A'] = df.loc[idx, 'C'] - df.loc[idx, 'D']
Setting the remaining elements (those where A is not greater than x) is just as easy; for example, to zero them: df.loc[~idx, 'A'] = 0
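If every row should get a new value (C - D where A > x, and 0 everywhere else), numpy.where can express both assignments in one step. This is a swapped-in alternative, not part of the original answer; the data and x = 2 are made-up assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 5, 3, 0],
    "C": [10, 20, 30, 40],
    "D": [1, 2, 3, 4],
})
x = 2  # hypothetical threshold

idx = df["A"] > x
# Where idx is True take C - D, otherwise 0 (the ~idx case)
df["A"] = np.where(idx, df["C"] - df["D"], 0)

print(df["A"].tolist())
```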
Answer by Amrita Sawant
Simplest, in my opinion:
from random import randrange
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': randrange(0, 10), 'b': randrange(10, 20), 'c': np.random.randn(10)})
# If c > 0.5, then c = b - a (assign through .loc; chained df['c'][...] = ... may only modify a copy)
df.loc[df['c'] > 0.5, 'c'] = df['b'] - df['a']
Tested, it works.
a b c
2 11 -0.576309
2 11 -0.578449
2 11 -1.085822
2 11 9.000000
2 11 9.000000
2 11 -1.081405
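The same replacement can also be written without chained indexing using Series.mask, which replaces values wherever the condition is True. The deterministic data below are made up so the result is reproducible; they are not the random values from the answer.

```python
import pandas as pd

# Made-up, deterministic example data (an assumption for illustration)
df = pd.DataFrame({
    "a": [2, 2, 2],
    "b": [11, 11, 11],
    "c": [-0.5, 0.8, 1.2],
})

# Where c > 0.5, replace c with b - a; other rows keep their value
df["c"] = df["c"].mask(df["c"] > 0.5, df["b"] - df["a"])

print(df["c"].tolist())
```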
Answer by SAH
Start with:
df = pd.DataFrame({'a':randrange(1,10),'b':randrange(10,20),'c':np.random.randn(10)})
a b c
0 7 12 0.475248
1 7 12 -1.090855
2 7 12 -1.227489
3 7 12 0.163929
End with:
# Where c < 1, set c to b - a (condition and columns inferred from the output below)
df.loc[df['c'] < 1, 'c'] = df['b'] - df['a']
df
a b c
0 7 12 5.000000
1 7 12 5.000000
2 7 12 5.000000
3 7 12 5.000000
4 7 12 1.813233