Python pandas: operations on columns

Question

Asked by Anthony Martin

Hi, I would like to know the best way to perform operations on columns in Python using pandas.

I have a classic database that I have loaded as a DataFrame, and I often have to do operations such as: for each row, if the value in column 'A' is greater than x, replace it with column 'C' minus column 'D'.

For now I do something like:

# Row-by-row loop (slow); .ix was removed in pandas 1.0, so .loc is used here
for i in df.index:
    if df.loc[i, 'A'] > x:
        df.loc[i, 'A'] = df.loc[i, 'C'] - df.loc[i, 'D']

I would like to know whether there is a simpler way of doing this kind of operation and, more importantly, which way is the most efficient, since I have large databases.

I tried doing it without the for loop, as in R or Stata. I was advised to use "a.any" or "a.all", but I did not find anything either here or in the pandas docs.

Thanks in advance.

Answer 1

Answered by Viktor Kerkez

You can just use a boolean mask with either the .loc or .ix attribute of the DataFrame. On current pandas versions use .loc: .ix was deprecated in 0.20 and removed in 1.0.

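# True for the rows where the condition holds
mask = df['A'] > 2
# Assign only to those rows (.loc replaces the removed .ix)
df.loc[mask, 'A'] = df.loc[mask, 'C'] - df.loc[mask, 'D']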
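
As a self-contained illustration of the mask approach, here is a minimal sketch on a small made-up frame. The column values and the threshold x = 2 are assumptions chosen for the example, not data from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'A': [1, 5, 3, 0],
    'C': [10, 20, 30, 40],
    'D': [1, 2, 3, 4],
})
x = 2  # assumed threshold

# Boolean Series: True where the condition holds.
mask = df['A'] > x

# Assign only to the masked rows; the right-hand side is aligned on the index.
df.loc[mask, 'A'] = df.loc[mask, 'C'] - df.loc[mask, 'D']

print(df['A'].tolist())
```

Rows where the mask is False are left untouched, so no explicit else branch is needed.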

If you have a lot of branching logic, you can do:

def func(row):
    if row['A'] > 0:
        return row['B'] + row['C']
    elif row['B'] < 0:
        return row['D'] + row['A']
    else:
        return row['A']

df['A'] = df.apply(func, axis=1)

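apply should generally be much faster than an explicit for loop, although it still calls func once per row, so the boolean-mask approach above remains the faster option when it fits.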
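
When every branch can be written as a column expression, the same logic can often be vectorized. The sketch below uses numpy.select, which is a swapped-in technique not mentioned in the answer, and checks it against the apply version. The sample values are made up so that each branch of func is hit once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per branch of func.
df = pd.DataFrame({
    'A': [1, -1, -2],
    'B': [10, -5, 3],
    'C': [100, 200, 300],
    'D': [7, 8, 9],
})

def func(row):
    if row['A'] > 0:
        return row['B'] + row['C']
    elif row['B'] < 0:
        return row['D'] + row['A']
    else:
        return row['A']

# Row-wise version from the answer, kept for comparison.
expected = df.apply(func, axis=1)

# Vectorized version: conditions are checked in order, first match wins.
conditions = [df['A'] > 0, df['B'] < 0]
choices = [df['B'] + df['C'], df['D'] + df['A']]
df['A'] = np.select(conditions, choices, default=df['A'])

print(df['A'].tolist())
```

The order of the conditions list matters: it plays the role of the if/elif chain, and default covers the final else.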

Answer 2

Answered by Fergal

There are lots of ways of doing this, but here's the pattern I find easiest to read.

# Assume df is a pandas DataFrame and x is the threshold
idx = df.loc[:, 'A'] > x
df.loc[idx, 'A'] = df.loc[idx, 'C'] - df.loc[idx, 'D']

Setting the remaining elements (those not greater than x) to 0 is as easy as df.loc[~idx, 'A'] = 0
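
If both branches are needed at once, the two assignments can also be folded into one expression. This is a sketch using numpy.where, which is not part of the original answer; the sample values and the threshold are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({'A': [1, 5, 3], 'C': [10, 20, 30], 'D': [1, 2, 3]})
x = 2  # assumed threshold

# Where A > x take C - D, otherwise take 0.
df['A'] = np.where(df['A'] > x, df['C'] - df['D'], 0)

print(df['A'].tolist())
```

The two-step .loc version reads more explicitly; the one-liner is handy when the else value is a simple constant or another column.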

Answer 3

Answered by Amrita Sawant

The simplest approach, in my opinion:

from random import randrange
import pandas as pd
import numpy as np

# The scalars for 'a' and 'b' are broadcast to the 10 rows of 'c'
df = pd.DataFrame({'a': randrange(0, 10), 'b': randrange(10, 20), 'c': np.random.randn(10)})

# If c > 0.5, then c = b - a (a single .loc call avoids chained assignment)
df.loc[df['c'] > 0.5, 'c'] = df['b'] - df['a']

Tested, it works.

a   b          c
2  11  -0.576309
2  11  -0.578449
2  11  -1.085822
2  11   9.000000
2  11   9.000000
2  11  -1.081405
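
For a reproducible variant of this example, here is a minimal sketch with a fixed seed that does the assignment in a single .loc step. The seed and the constants 2 and 11 (matching the output shown above) are assumptions. Indexing once with .loc avoids the chained pattern df['c'][mask] = ..., which can trigger SettingWithCopyWarning and may not modify df.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

# Scalars are broadcast to the length of the array column.
df = pd.DataFrame({'a': 2, 'b': 11, 'c': rng.standard_normal(10)})

# Compute the mask before modifying the column.
mask = df['c'] > 0.5

# One indexing operation: row selector and column label together.
df.loc[mask, 'c'] = df['b'] - df['a']

print(df)
```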

Answer 4

Answered by SAH

Start with:

df = pd.DataFrame({'a':randrange(1,10),'b':randrange(10,20),'c':np.random.randn(10)})
   a   b         c
0  7  12  0.475248
1  7  12 -1.090855
2  7  12 -1.227489
3  7  12  0.163929

End with:

df.loc[df['c'] < 1, 'c'] = df['b'] - df['a']; df
   a   b         c
0  7  12  5.000000
1  7  12  5.000000
2  7  12  5.000000
3  7  12  5.000000
4  7  12  1.813233
