pandas: DataFrame.replace() with regex
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32201222/
Asked by Boosted_d16
I have a table which looks like this:
df_raw = pd.DataFrame(dict(A = pd.Series(['1.00','-1']), B = pd.Series(['1.0','-45.00','-'])))
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN       -
I would like to replace '-' with '0.00' using DataFrame.replace(), but it trips over the negative values '-1' and '-45.00'.
How can I ignore the negative values and replace only a lone '-' with '0.00'?
My code:
df_raw = df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True).astype(np.float64)
Error:
ValueError: invalid literal for float(): 0.0045.00
Answered by EdChum
Your regex matches every '-' character, including the minus signs of the negative numbers:
In [48]:
df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True)
Out[48]:
       A          B
0   1.00        1.0
1  0.001  0.0045.00
2    NaN       0.00
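To see why the output above looks the way it does, the same substitution can be tried on individual strings with Python's `re` module. This is only a sketch of the per-cell effect, not of how pandas applies the pattern internally; the names `cells`, `unanchored` and `bad` are made up for the illustration.

```python
import re

# The string values from the question, taken one cell at a time.
cells = ['1.0', '-45.00', '-1', '-']

# An unanchored '-' is replaced wherever it occurs, so the minus sign
# of a negative number is rewritten as well.
unanchored = [re.sub(r'-', '0.00', s) for s in cells]
print(unanchored)  # ['1.0', '0.0045.00', '0.001', '0.00']

# Collect the results that can no longer be parsed as a float.
bad = []
for s in unanchored:
    try:
        float(s)
    except ValueError:
        bad.append(s)
print(bad)  # ['0.0045.00']
```

Note that '-1' turns into '0.001', which still parses as a float, so that cell would silently get a wrong value rather than raise an error.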
If you add anchors so that the pattern only matches when that single character is the entire string, it works as expected:
In [47]:
df_raw.replace(['^-$'], ['0.00'], regex=True)
Out[47]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN    0.00
Here ^ means start of string and $ means end of string, so the pattern only matches a cell whose entire value is that single character.
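Putting the anchored pattern together with the float conversion from the question might look like the sketch below. It assumes a reasonably recent pandas; `cleaned` and `df_num` are illustrative names, and `astype(float)` stands in for the `astype(np.float64)` call in the question.

```python
import pandas as pd

# Same data as in the question; column A is shorter, so its last row is NaN.
df_raw = pd.DataFrame({
    'A': pd.Series(['1.00', '-1']),
    'B': pd.Series(['1.0', '-45.00', '-']),
})

# '^-$' matches only cells whose whole value is a single '-',
# so '-1' and '-45.00' are left untouched.
cleaned = df_raw.replace(r'^-$', '0.00', regex=True)

# Every remaining string is a valid number, so the cast succeeds.
df_num = cleaned.astype(float)
print(df_num)
```

The NaN in column A passes through both steps unchanged, since it is not a string and does not match the pattern.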
Or you can just use replace without regex=True, which only replaces exact matches:
In [29]:
df_raw.replace('-',0)
Out[29]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN       0
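If the end goal is a numeric frame, as in the question, the exact-match form can be followed by a cast. A minimal sketch, assuming the same data as above; here the replacement is the string '0.00' rather than the integer 0 so that the column stays uniformly string-typed until the cast, and `exact` and `df_num` are illustrative names.

```python
import pandas as pd

df_raw = pd.DataFrame({
    'A': pd.Series(['1.00', '-1']),
    'B': pd.Series(['1.0', '-45.00', '-']),
})

# Without regex=True, replace compares whole cell values,
# so only the cell that is exactly '-' is changed.
exact = df_raw.replace('-', '0.00')

# Convert the cleaned strings to floats.
df_num = exact.astype(float)
print(df_num.dtypes)
```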

