Replace an entry in a pandas DataFrame using a conditional statement
Original question: http://stackoverflow.com/questions/28975758/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by James Eaves
I'd like to change the value of an entry in a DataFrame given a condition. For instance:
d = pandas.read_csv('output.az.txt', names = varname)
d['uld'] = (d.trade - d.plg25)*(d.final - d.price25)
if d['uld'] > 0:
    d['uld'] = 1
else:
    d['uld'] = 0
I don't understand why the above doesn't work. Thank you for your help.
Answered by EdChum
Use np.where to set your data based on a simple boolean criterion:
In [3]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'uld': np.random.randn(10)})
df
Out[3]:
        uld
0  0.939662
1 -0.009132
2 -0.209096
3 -0.502926
4  0.587249
5  0.375806
6 -0.140995
7  0.002854
8 -0.875326
9  0.148876
In [4]:
df['uld'] = np.where(df['uld'] > 0, 1, 0)
df
Out[4]:
   uld
0    1
1    0
2    0
3    0
4    1
5    1
6    0
7    1
8    0
9    1
As for why what you did failed:
In [7]:
if df['uld'] > 0:
    df['uld'] = 1
else:
    df['uld'] = 0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-ec7d7aaa1c28> in <module>()
----> 1 if df['uld'] > 0:
      2     df['uld'] = 1
      3 else:
      4     df['uld'] = 0

C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    696         raise ValueError("The truth value of a {0} is ambiguous. "
    697                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 698                          .format(self.__class__.__name__))
    699
    700     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So the error is that you are trying to evaluate a whole array as a single True or False, which is ambiguous because there are multiple values to compare. In this situation you can't really use the recommended any, all, etc., because you want to mask your df and set values only where the condition is met. There is an explanation of this on the pandas site: http://pandas.pydata.org/pandas-docs/dev/gotchas.html and a related question here: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
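To make that distinction concrete, here is a minimal sketch on a small hand-made frame (the values and the variable name mask are illustrative assumptions, not taken from the question). It shows that any() and all() collapse the comparison to one boolean, while the boolean Series itself can be used with df.loc to assign row by row:

```python
import pandas as pd

# Illustrative data, not the asker's file
df = pd.DataFrame({'uld': [0.9, -0.1, 0.5, -0.7]})

# Comparing a Series yields a boolean Series, one value per row
mask = df['uld'] > 0

# any()/all() reduce the mask to a single bool, so they cannot
# drive a per-row update
any_positive = mask.any()
all_positive = mask.all()

# Use the mask itself to assign only where the condition holds
df.loc[mask, 'uld'] = 1
df.loc[~mask, 'uld'] = 0

result = df['uld'].tolist()
print(any_positive, all_positive, result)
```

Note that the column keeps its float dtype here, because values are written into the existing column rather than replacing it.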
np.where takes a boolean condition as the first param. For each element, where the condition is true it returns the second param, and where it is false it returns the third param, which is what you want.
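As a small sketch of that element-wise behaviour (the Series values and the names flags and clipped are made up for illustration), the second and third params can be scalars or array-likes of matching length:

```python
import numpy as np
import pandas as pd

# Illustrative values only
s = pd.Series([0.9, -0.1, 0.5, -0.7])

# Scalars for both branches: 1 where the condition holds, 0 elsewhere
flags = np.where(s > 0, 1, 0)

# An array-like also works: keep positive values, replace the rest with 0.0
clipped = np.where(s > 0, s, 0.0)

print(flags.tolist(), clipped.tolist())
```

np.where returns a plain NumPy array, so assigning it back to a DataFrame column relies on the array having the same length as the frame.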
UPDATE
Having looked at this again, you can also convert the boolean Series to int by casting with astype:
In [23]:
df['uld'] = (df['uld'] > 0).astype(int)
df
Out[23]:
   uld
0    1
1    0
2    0
3    0
4    1
5    1
6    0
7    1
8    0
9    1
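For comparison, a minimal sketch (with illustrative values, including an exact zero) suggesting that the astype cast and np.where give the same 0/1 flags for this condition:

```python
import numpy as np
import pandas as pd

# Illustrative data; 0.0 is included to show it maps to 0 under "> 0"
df = pd.DataFrame({'uld': [0.9, -0.1, 0.0, 0.5]})

via_where = np.where(df['uld'] > 0, 1, 0)    # NumPy array
via_astype = (df['uld'] > 0).astype(int)     # pandas Series, index preserved

# Compare the two results element by element
same = (via_where == via_astype.to_numpy()).all()
print(via_astype.tolist(), same)
```

The astype form stays within pandas and keeps the index, which can be convenient when the result is assigned back to the frame.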

