Python: Compare two columns using pandas
Original URL: http://stackoverflow.com/questions/27474921/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Merlin
Using this as a starting point:
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]:
one two three
0 10 1.2 4.2
1 15 70 0.03
2 8 5 0
I want to use something like an if statement within pandas:
if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']
Basically, I want to check each row with the if statement and create a new column from the result.
The docs say to use .all, but there is no example...
Accepted answer by unutbu
You could use np.where. If cond is a boolean array, and A and B are arrays, then
C = np.where(cond, A, B)
defines C to be equal to A where cond is True, and B where cond is False.
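A tiny standalone illustration of those semantics on plain NumPy arrays (the arrays are made up for illustration):

```python
import numpy as np

cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Element by element: take A where cond is True, otherwise take B
C = np.where(cond, A, B)
print(C.tolist())  # [1, 20, 3]
```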
import numpy as np
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                     df['one'], np.nan)
yields
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 NaN
2 8 5 0 NaN
If you have more than one condition, then you could use np.select instead.
For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then
conditions = [
(df['one'] >= df['two']) & (df['one'] <= df['three']),
df['one'] < df['two']]
choices = [df['one'], df['two']]
df['que'] = np.select(conditions, choices, default=np.nan)
yields
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 70
2 8 5 0 NaN
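np.select checks the conditions in order and, for each row, takes the choice belonging to the first condition that is True; rows where no condition matches get the default. A minimal sketch with a made-up array, where the first element satisfies both conditions:

```python
import numpy as np

x = np.array([3, 1, -1])
conditions = [x > 2, x > 0]      # 3 satisfies both; the first match wins
choices = ['big', 'positive']
result = np.select(conditions, choices, default='none')
print(result.tolist())  # ['big', 'positive', 'none']
```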
If we can assume that df['one'] >= df['two'] whenever df['one'] < df['two'] is
False, then the conditions and choices could be simplified to
conditions = [
df['one'] < df['two'],
df['one'] <= df['three']]
choices = [df['two'], df['one']]
(The assumption may not be true if df['one'] or df['two'] contain NaNs.)
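To see why NaNs break the simplification: any comparison involving NaN is False, so a row can fail both df['one'] < df['two'] and df['one'] >= df['two']. A sketch with a hypothetical one-row float DataFrame in which two is NaN; here full stays NaN while simple picks up 3.0:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'two' is NaN, so every comparison against it is False
df = pd.DataFrame({'one': [3.0], 'two': [np.nan], 'three': [5.0]})

# Full conditions from the answer
full = np.select(
    [(df['one'] >= df['two']) & (df['one'] <= df['three']),
     df['one'] < df['two']],
    [df['one'], df['two']], default=np.nan)

# Simplified conditions, which assume one >= two whenever one < two is False
simple = np.select(
    [df['one'] < df['two'], df['one'] <= df['three']],
    [df['two'], df['one']], default=np.nan)

print(full, simple)
```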
Note that
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:
df2 = df.astype(float)
This changes the results, however, since strings compare character-by-character, while floats are compared numerically.
In [61]: '10' <= '4.2'
Out[61]: True
In [62]: 10 <= 4.2
Out[62]: False
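Re-running the np.where example on df2 = df.astype(float) makes the difference concrete. Compared numerically, rows 0 and 2 fail one <= three (10 > 4.2 and 8 > 0) and row 1 fails one >= two (15 < 70), so every que value becomes NaN, whereas the string version kept '10' in row 0:

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df2 = df.astype(float)  # compare numerically instead of character by character

df2['que'] = np.where((df2['one'] >= df2['two']) & (df2['one'] <= df2['three']),
                      df2['one'], np.nan)
print(df2['que'].isna().all())  # True
```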
Answer by Bob Haffner
You could use apply() with axis=1 and do something like this:
df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)
Or, if you prefer not to use a lambda:
def que(x):
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    else:
        return ''
df['que'] = df.apply(que, axis=1)
Answer by Marius
Wrap each individual condition in parentheses, and then use the & operator to combine the conditions:
df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']
You can fill the non-matching rows by using ~ (the "not" operator) to invert the match:
df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''
You need to use & and ~ rather than and and not because the & and ~ operators work element by element.
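A quick check of why and fails: Python's and needs a single truth value, and pandas refuses to collapse a multi-element boolean Series into one, raising ValueError. A sketch using the question's data:

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

left = df['one'] >= df['two']
right = df['one'] <= df['three']

try:
    left and right          # calls bool(left), which is ambiguous for a Series
    raised = False
except ValueError:
    raised = True

mask = left & right         # element-wise: one bool per row
print(raised, mask.tolist())  # True [True, False, False]
```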
The final result:
df
Out[8]:
one two three que
0 10 1.2 4.2 10
1 15 70 0.03
2 8 5 0
Answer by Alex Riley
One way is to use a Boolean series to index the column df['one']. This gives you a new column in which rows where the condition is True take their value from df['one'] and rows where it is False are NaN.
The Boolean series is just given by your if statement (although you need to use & instead of and):
>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 NaN
2 8 5 0 NaN
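The False rows end up as NaN because of index alignment: indexing with the mask keeps only the matching rows, and assigning that shorter Series back to df aligns it on the index, filling the missing labels with NaN. A sketch on the question's data:

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])
subset = df['one'][mask]          # only the rows where mask is True
print(subset.index.tolist())      # [0]

df['que'] = subset                # aligned on index; rows 1 and 2 get NaN
print(df['que'].isna().tolist())  # [False, True, True]
```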
If you want the NaN values to be replaced by other values, you can use the fillna method on the new column que. I've used 0 instead of the empty string here:
>>> df['que'] = df['que'].fillna(0)
>>> df
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 0
2 8 5 0 0
Answer by ccook5760
You can use .equals for columns or entire dataframes:
df['col1'].equals(df['col2'])
If they're equal, that statement will return True, else False.
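.equals gives one True/False for the whole column and treats NaNs in the same position as equal. For a per-row comparison, == returns a boolean Series instead, and there NaN never equals NaN. A small sketch with made-up columns col1 and col2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0, np.nan, 3.0],
                   'col2': [1.0, np.nan, 4.0]})

whole = df['col1'].equals(df['col2'])   # single bool for the whole column
per_row = df['col1'] == df['col2']      # one bool per row
print(whole, per_row.tolist())          # False [True, False, False]

# NaNs in matching positions count as equal for .equals
first_two = df['col1'].iloc[:2].equals(df['col2'].iloc[:2])
print(first_two)                        # True
```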
Answer by Nic Scozzaro
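I think the closest to the OP's intuition is an inline if statement (a conditional expression). It has to be evaluated row by row, because if and and applied to whole Series raise a ValueError:
df['que'] = [one if one >= two and one <= three else ''
             for one, two, three in zip(df['one'], df['two'], df['three'])]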
Answer by psn1997
Use np.select if you have multiple conditions to check against the dataframe and want to output a specific choice in a different column:
conditions = [(df['one'] >= df['two']) & (df['one'] <= df['three']),  # condition1
              df['one'] < df['two']]                                   # condition2
choices = ["choice1", "choice2"]
df["new column"] = np.select(conditions, choices, default="")
Note: the number of conditions must match the number of choices. If two different conditions should produce the same choice, repeat that value in choices.
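A sketch of both points with a made-up array: repeating a choice for two conditions, and the ValueError NumPy raises when the two lists differ in length:

```python
import numpy as np

x = np.array([1, 5, 10])
conditions = [x < 3, x > 8]
choices = ['edge', 'edge']           # same output for both conditions, repeated
labels = np.select(conditions, choices, default='middle')
print(labels.tolist())               # ['edge', 'middle', 'edge']

try:
    np.select(conditions, ['edge'])  # 2 conditions but only 1 choice
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)               # True
```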

