Python: comparing two columns using pandas

Question

Asked by Merlin

Using this as a starting point:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]:
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0

I want to use something like an if statement within pandas.

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, check each row via the if statement and create a new column.

The docs say to use .all, but there is no example...

Answer 1

Accepted answer by unutbu

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and B where cond is False.
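As a quick illustration of that rule on plain arrays (the values below are made up for this sketch and are not part of the question's data):

```python
import numpy as np

# Hypothetical inputs, chosen only to make the selection visible.
cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Take the element from A where cond is True, otherwise from B.
C = np.where(cond, A, B)
print(C)  # [ 1 20  3]
```

Applied to the DataFrame from the question, the same call looks like this: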

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(The assumption may not be true if df['one'] or df['two'] contain NaNs.)
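To make the NaN caveat concrete, here is a small sketch comparing the full and the simplified conditions. It assumes a numeric frame with hypothetical values (not the string data from the question), with a missing value in 'two' on the second row:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; row 1 has NaN in 'two'.
df = pd.DataFrame({'one': [10.0, 15.0, 3.0],
                   'two': [1.2, np.nan, 5.0],
                   'three': [12.0, 20.0, 0.0]})

lt = df['one'] < df['two']
ge = df['one'] >= df['two']

# Any comparison against NaN is False, so neither lt nor ge holds on row 1.
print((lt | ge).tolist())  # [True, False, True]

# Full version: the range check is guarded by one >= two.
full = np.select([ge & (df['one'] <= df['three']), lt],
                 [df['one'], df['two']], default=np.nan)

# Simplified version: relies on "not lt" implying ge, which NaN breaks.
simplified = np.select([lt, df['one'] <= df['three']],
                       [df['two'], df['one']], default=np.nan)

print(full)        # row 1 stays NaN
print(simplified)  # row 1 becomes 15.0
```

The two versions agree on rows without missing values and differ only on the row where 'two' is NaN.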

Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings compare character-by-character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False
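As a sketch of how much this matters for the question's data, the same np.where condition can be evaluated on both the string frame and the float-converted frame. The variable names mask_str and mask_num are introduced here only for illustration:

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# String comparison: '10' <= '4.2' is True because '1' sorts before '4'.
mask_str = (df['one'] >= df['two']) & (df['one'] <= df['three'])
print(mask_str.tolist())  # [True, False, False]

# Numeric comparison after converting every column to float.
df2 = df.astype(float)
mask_num = (df2['one'] >= df2['two']) & (df2['one'] <= df2['three'])
print(mask_num.tolist())  # [False, False, False]

df2['que'] = np.where(mask_num, df2['one'], np.nan)
print(df2)
```

With numeric values no row satisfies two <= one <= three, so the que column is NaN throughout.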

Answer 2

Answer by Bob Haffner

You could use apply() and do something like this

df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)

or if you prefer not to use a lambda

def que(x):
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    else:
        # Return an empty string for non-matching rows, as in the lambda version
        return ''

df['que'] = df.apply(que, axis=1)

Answer 3

Answer by Marius

Wrap each individual condition in parentheses, and then use the & operator to combine the conditions:

df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']

You can fill the non-matching rows by using ~ (the "not" operator) to invert the match:

df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''

You need to use & and ~ rather than the keywords "and" and "not", because the & and ~ operators work element-by-element.

The final result:

df
Out[8]: 
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03    
2   8    5     0
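To see the difference between the keywords and the element-wise operators in practice, here is a minimal sketch. It assumes numeric columns with hypothetical values rather than the string data from the question:

```python
import pandas as pd

# Hypothetical numeric data; only row 0 satisfies two <= one <= three.
df = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0],
                   'three': [12.0, 0.03, 0.0]})

# "and" asks for a single truth value of a whole Series, which pandas refuses.
try:
    if df['one'] >= df['two'] and df['one'] <= df['three']:
        pass
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# & combines the two boolean Series row by row; ~ inverts the result.
mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])
print(mask.tolist())     # [True, False, False]
print((~mask).tolist())  # [False, True, True]

# .all() and .any() reduce the mask to one boolean for the whole column.
print(bool(mask.all()), bool(mask.any()))  # False True
```

This is also where the .all mentioned in the question comes in: it collapses the mask to a single True or False, which is useful in a plain if statement but not for building a per-row column.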

Answer 4

Answer by Alex Riley

One way is to use a Boolean series to index the column df['one']. This gives you a new column where the True entries have the same value as the same row of df['one'] and the False entries are NaN.

The Boolean series is just given by your if statement (although it is necessary to use & instead of and):

>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you want the NaN values to be replaced by other values, you can use the fillna method on the new column que. I've used 0 instead of the empty string here:

>>> df['que'] = df['que'].fillna(0)
>>> df
    one two three   que
0   10  1.2   4.2    10
1   15   70  0.03     0
2    8    5     0     0
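The NaN entries appear because pandas aligns the filtered Series on its index when it is assigned back. The sketch below shows that step, assuming hypothetical numeric values, and also shows Series.where as an alternative spelling (a different call from the one used in this answer) that gives the same column:

```python
import pandas as pd

# Hypothetical numeric data; only row 0 matches the condition.
df = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0],
                   'three': [12.0, 0.03, 0.0]})

mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])

# Boolean indexing keeps only the matching rows (index label 0 here).
selected = df['one'][mask]
print(selected.index.tolist())  # [0]

# Assignment aligns on the index, so labels 1 and 2 are filled with NaN.
df['que'] = selected

# Series.where keeps values where mask is True and puts NaN elsewhere.
alt = df['one'].where(mask)

# Replace the NaN entries afterwards if a different filler is wanted.
df['que_filled'] = df['que'].fillna(0)
print(df)
```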

Answer 5

Answer by ccook5760

You can use .equals for columns or entire dataframes.

df['col1'].equals(df['col2'])

If the two columns are equal, that statement returns True; otherwise it returns False.
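Note that .equals answers a whole-column question with a single boolean, whereas == compares row by row. A short sketch of the difference, using made-up columns col1 and col2:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the columns differ only in the last row.
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [1, 2, 4]})

# One boolean for the whole comparison.
print(df['col1'].equals(df['col2']))  # False

# One boolean per row, which can be stored as a new column.
df['same'] = df['col1'] == df['col2']
print(df['same'].tolist())  # [True, True, False]

# .equals treats NaNs in the same position as equal; == does not.
s = pd.Series([1.0, np.nan])
print(s.equals(s.copy()))       # True
print((s == s.copy()).tolist()) # [True, False]
```

For the per-row result the question asks for, the element-wise form is the one that applies.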

Answer 6

Answer by Nic Scozzaro

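I think the closest to the OP's intuition is an inline if statement. Applied to whole columns it raises an error (the truth value of a Series is ambiguous), so it has to be evaluated row by row, for example in a list comprehension (this assumes numpy is imported as np):

df['que'] = [one if (one >= two and one <= three) else np.nan
             for one, two, three in zip(df['one'], df['two'], df['three'])]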

Answer 7

Answer by psn1997

Use np.select if you have multiple conditions to check against the dataframe and want to output a specific choice for each one in a different column:

# condition1, condition2 and default_value are placeholders
conditions = [(condition1), (condition2)]
choices = ["choice1", "choice2"]

df["new column"] = np.select(conditions, choices, default=default_value)

Note: The number of conditions and the number of choices must match. If two different conditions should produce the same choice, repeat that choice in the list.
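A concrete version of that template might look like the following sketch. The score column, the thresholds, and the labels are all hypothetical and chosen only to show a repeated choice and the default:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the NaN row matches none of the conditions.
df = pd.DataFrame({'score': [35, 60, np.nan, 100]})

# Three conditions, so three choices; 'outlier' is repeated on purpose.
conditions = [
    df['score'] < 40,
    df['score'] > 90,
    df['score'].between(40, 90),
]
choices = ['outlier', 'outlier', 'typical']

# Rows where no condition is True receive the default.
df['label'] = np.select(conditions, choices, default='unknown')
print(df['label'].tolist())  # ['outlier', 'typical', 'unknown', 'outlier']
```

Conditions are checked in order, and the first one that is True for a row decides its choice.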
