Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18472634/

Time: 2020-09-13 21:06:57  Source: igfitidea

How to compute a new column based on the values of other columns in pandas - python

Tags: python, pandas, dataframe

Asked by HappyPy

Let's say my data frame contains these data:

>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
                       'b':['1','2','2','1','2','2']})
>>> df
    a       b
0  l1       1
1  l2       2
2  l1       2
3  l2       1
4  l1       2
5  l2       2

l1 should correspond to 1, whereas l2 should correspond to 2. I'd like to create a new column 'c' such that, for each row, c = 1 if a = l1 and b = 1 (or a = l2 and b = 2). If a = l1 and b = 2 (or a = l2 and b = 1), then c = 0.

The resulting data frame should look like this:

    a       b   c
0  l1       1   1
1  l2       2   1
2  l1       2   0
3  l2       1   0
4  l1       2   0
5  l2       2   1

My data frame is very large so I'm really looking for the most efficient way to do this using pandas.

Answered by chlunde

import numpy
import pandas as pd
df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                   'b': numpy.random.choice(['1', '2'], 1000000)})

A fast solution assuming only two distinct values:

%timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)

10 loops, best of 3: 178 ms per loop
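
To see why comparing the two boolean masks works, here is a minimal sketch on the six-row frame from the question. The intermediate names `is_l1` and `is_1` are only illustrative, and the trick relies on each column holding exactly two distinct values:

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

is_l1 = df['a'] == 'l1'   # True for l1, False for l2
is_1 = df['b'] == '1'     # True for '1', False for '2'

# The masks agree exactly when the row is (l1, 1) or (l2, 2).
df['c'] = (is_l1 == is_1).astype(int)
print(df['c'].tolist())   # [1, 1, 0, 0, 0, 1]
```

If a third label ever appears in either column, both masks are False for it and the row would be marked 1, so this shortcut should probably be avoided in that case.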

@Viktor Kerkez:

%timeit df['c'] = (df.a.str[-1] == df.b).astype(int)

1 loops, best of 3: 412 ms per loop

@user1470788:

%timeit df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)

1 loops, best of 3: 363 ms per loop

@herrfz:

%timeit df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)

1 loops, best of 3: 387 ms per loop
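
The `%timeit` lines above only run inside IPython. As a rough sketch, the same comparison could be reproduced in a plain script with the standard `timeit` module. The row count `n`, the seed, and the `candidates` dictionary are assumptions made for this illustration, and the timings will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # smaller than the 1,000,000 rows used above, to keep the run short
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.choice(['l1', 'l2'], n),
                   'b': rng.choice(['1', '2'], n)})

# The four expressions benchmarked in this answer, wrapped as callables.
candidates = {
    'bool equality': lambda: ((df['a'] == 'l1') == (df['b'] == '1')).astype(int),
    'str[-1]': lambda: (df['a'].str[-1] == df['b']).astype(int),
    'logical ops': lambda: (((df['a'] == 'l1') & (df['b'] == '1'))
                            | ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int),
    'apply': lambda: (df['a'].apply(lambda x: x[1:]) == df['b']).astype(int),
}

# Check that every approach produces the same column before timing it.
results = {name: fn() for name, fn in candidates.items()}
reference = results['bool equality']
for name, result in results.items():
    assert result.equals(reference), name

for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{name}: {best * 1000:.1f} ms per call')
```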

Answered by Viktor Kerkez

You can also use the string methods.

df['c'] = (df.a.str[-1] == df.b).astype(int)
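
As a small illustration on the question's sample data, `.str[-1]` takes the last character of each label, which is then compared with column `b` element by element. The name `last_char` is only illustrative, and this assumes every label ends in a single character that matches the values in `b`:

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

last_char = df['a'].str[-1]          # 'l1' -> '1', 'l2' -> '2'
df['c'] = (last_char == df['b']).astype(int)
print(df['c'].tolist())              # [1, 1, 0, 0, 0, 1]
```

With multi-digit labels such as a hypothetical 'l10', the last character alone would no longer identify the label.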

Answered by herrfz

df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)
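
Slicing with `x[1:]` drops the leading 'l' and keeps everything after it, so it should also cope with labels that have more than one digit. The sketch below uses made-up data (the 'l10' rows are not from the question) and compares the `apply` version with the equivalent `.str[1:]` slice:

```python
import pandas as pd

# Hypothetical data with a multi-digit label, for illustration only.
df = pd.DataFrame({'a': ['l1', 'l10', 'l2', 'l10'],
                   'b': ['1', '10', '1', '1']})

via_apply = (df['a'].apply(lambda x: x[1:]) == df['b']).astype(int)
via_str = (df['a'].str[1:] == df['b']).astype(int)   # same slice, string accessor

print(via_apply.tolist())   # [1, 1, 0, 0]
print(via_str.tolist())     # [1, 1, 0, 0]
```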

Answered by user1470788

You can just use logical operators. I'm not sure why you're using strings of 1 and 2 rather than ints, but here's a solution. The astype at the end converts it from boolean to 0's and 1's.

df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)
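
If more label/value pairs are added, chaining one `&` clause per pair gets long. One possible variation, not part of the original answer, is to put the pairs in a dictionary and use `Series.map`. The name `expected_b` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

# Hypothetical lookup table: which value of b each label in a should have.
expected_b = {'l1': '1', 'l2': '2'}

# Labels missing from the dictionary map to NaN, which never equals b, so c = 0.
df['c'] = (df['a'].map(expected_b) == df['b']).astype(int)
print(df['c'].tolist())   # [1, 1, 0, 0, 0, 1]
```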
