Compare two columns using pandas
Original question: http://stackoverflow.com/questions/27474921/
Content from Stack Overflow, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Merlin
Using this as a starting point:
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]:
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0
I want to use something like an if statement within pandas.
if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']
Basically, check each row via the if statement and create a new column.
The docs say to use .all, but there is no example...
Accepted answer by unutbu
You could use np.where. If cond is a boolean array, and A and B are arrays, then
C = np.where(cond, A, B)
defines C to be equal to A where cond is True, and B where cond is False.
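As a minimal sketch of that rule, here is np.where on three small arrays. The values of cond, A and B are made up purely for illustration and are not part of the original answer.

```python
import numpy as np

# Hypothetical inputs, chosen only to illustrate the rule above.
cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Take the element from A where cond is True, otherwise from B.
C = np.where(cond, A, B)
print(C.tolist())  # [1, 20, 3]
```

Applied to the question's data: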
import numpy as np
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                     df['one'], np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then
conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']),
    df['one'] < df['two']]
choices = [df['one'], df['two']]
df['que'] = np.select(conditions, choices, default=np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN
If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to
conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]
choices = [df['two'], df['one']]
(The assumption may not be true if df['one'] or df['two'] contain NaNs.)
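To make the NaN caveat concrete, here is a small sketch that runs both versions of the conditions on made-up numeric data. The frame and the names full and simplified are illustrative assumptions, not part of the original answer. Any comparison against NaN is False, so a row with NaN in 'two' fails both original conditions but can still pass the second simplified one.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; the second row has NaN in column 'two'.
df = pd.DataFrame({'one': [10.0, 3.0],
                   'two': [1.2, np.nan],
                   'three': [40.0, 5.0]})

# Original conditions and choices.
full = np.select(
    [(df['one'] >= df['two']) & (df['one'] <= df['three']),
     df['one'] < df['two']],
    [df['one'], df['two']],
    default=np.nan)

# Simplified conditions and choices.
simplified = np.select(
    [df['one'] < df['two'], df['one'] <= df['three']],
    [df['two'], df['one']],
    default=np.nan)

print(full)        # second entry is NaN: neither original condition holds
print(simplified)  # second entry is 3.0: the NaN row passes 'one' <= 'three'
```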
Note that
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:
df2 = df.astype(float)
This changes the results, however, since strings compare character-by-character, while floats are compared numerically.
In [61]: '10' <= '4.2'
Out[61]: True
In [62]: 10 <= 4.2
Out[62]: False
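As a rough sketch of how much the conversion matters, the snippet below runs the same np.where expression on the string frame and on its float copy. The helper name add_que is hypothetical and exists only to avoid repeating the expression.

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df2 = df.astype(float)

def add_que(frame):
    # Hypothetical helper: same np.where expression, applied to a copy.
    out = frame.copy()
    cond = (out['one'] >= out['two']) & (out['one'] <= out['three'])
    out['que'] = np.where(cond, out['one'], np.nan)
    return out

print(add_que(df))   # string comparison: row 0 matches
print(add_que(df2))  # numeric comparison: no row matches, 'que' is all NaN
```

With floats, 10 <= 4.2 is False, so the only row that matched under string comparison no longer does.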
Answer by Bob Haffner
You could use apply() and do something like this:
df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)
or, if you prefer not to use a lambda:
def que(x):
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    else:
        return ''  # without return, non-matching rows would get None
df['que'] = df.apply(que, axis=1)
Answer by Marius
Wrap each individual condition in parentheses, and then use the & operator to combine the conditions:
df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']
You can fill the non-matching rows by just using ~ (the "not" operator) to invert the match:
df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''
You need to use & and ~ rather than and and not because the & and ~ operators work element-by-element.
The final result:
df
Out[8]:
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03
2   8    5     0
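The difference between and and & can be seen in a short sketch. The numeric frame below is a made-up variant of the example data (column 'three' is changed so that one row matches) and is not taken from the answer.

```python
import pandas as pd

# Hypothetical numeric version of the example data.
df = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0],
                   'three': [40.0, 0.03, 0.0]})

try:
    # `and` asks the first Series for a single truth value, which is ambiguous.
    mask = (df['one'] >= df['two']) and (df['one'] <= df['three'])
except ValueError as err:
    print('and failed:', err)

# `&` combines the two boolean Series row by row; `~` flips each entry.
mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])
print(mask.tolist())     # [True, False, False]
print((~mask).tolist())  # [False, True, True]
```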
Answer by Alex Riley
One way is to use a Boolean Series to index the column df['one']. This gives you a new column where the True entries have the same value as the corresponding row of df['one'] and the False entries are NaN.
The Boolean Series is just given by your if statement (although it is necessary to use & instead of and):
>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
If you want the NaN values to be replaced by other values, you can use the fillna method on the new column que. I've used 0 instead of the empty string here:
>>> df['que'] = df['que'].fillna(0)
>>> df
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03   0
2   8    5     0   0
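The NaN values appear because of index alignment on assignment, which a small sketch can show step by step. The numeric frame and the name selected are illustrative assumptions, not from the answer.

```python
import pandas as pd

# Hypothetical numeric version of the example data.
df = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0],
                   'three': [40.0, 0.03, 0.0]})

mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])

# Boolean indexing keeps only the rows where mask is True (index label 0 here).
selected = df['one'][mask]
print(selected.index.tolist())  # [0]

# Assigning the shorter Series aligns on the index; missing labels become NaN.
df['que'] = selected
print(df['que'].isna().tolist())  # [False, True, True]
```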
Answer by ccook5760
You can use .equals for columns or entire dataframes.
df['col1'].equals(df['col2'])
If they're equal, that statement will return True, else False.
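Note that .equals answers a different question from the row-wise comparisons above: it returns one bool for the whole column. A brief sketch, with a made-up frame and the illustrative column names col1, col2 and col3:

```python
import pandas as pd

# Hypothetical data; col3 differs from col1 in the last row only.
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [1, 2, 3], 'col3': [1, 2, 4]})

print(df['col1'].equals(df['col2']))  # True
print(df['col1'].equals(df['col3']))  # False

# By contrast, == compares row by row and returns a boolean Series.
print((df['col1'] == df['col3']).tolist())  # [True, True, False]
```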
Answer by Nic Scozzaro
I think the closest to the OP's intuition is an inline if statement:
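# A bare inline if on whole Series raises ValueError, so it is applied row by row here.
# The original gives no else value; NaN is an assumed placeholder (requires numpy imported as np).
df['que'] = df.apply(lambda row: row['one'] if (row['one'] >= row['two']) and (row['one'] <= row['three']) else np.nan, axis=1)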
Answer by psn1997
Use np.select if you have multiple conditions to check against the dataframe and want to output a specific choice for each in a different column:
conditions = [(condition1), (condition2)]
choices = ["choice1", "choice2"]
df["new column"] = np.select(conditions, choices, default=default_value)  # default_value: placeholder for the fallback left blank in the original
Note: the number of conditions must match the number of choices. If two different conditions should produce the same choice, repeat that value in choices.
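A runnable version of that template might look like the sketch below. The column names score and band, the thresholds, and the labels are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data and column name.
df = pd.DataFrame({'score': [95, 72, 40]})

conditions = [df['score'] >= 90, df['score'] >= 60]
choices = ['high', 'medium']

# Conditions are checked in order; the first one that is True wins.
# Rows matching no condition get the default.
df['band'] = np.select(conditions, choices, default='low')
print(df['band'].tolist())  # ['high', 'medium', 'low']
```

The default here is a string so that it has the same type as the choices.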