Original source: http://stackoverflow.com/questions/22588316/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
pandas applying regex to replace values
Asked by KillerSnail
I have read some pricing data into a pandas DataFrame; the values appear as:
$40,000*
$40000 conditions attached
I want to strip it down to just the numeric values. I know I can loop through and apply regex
[0-9]+
to each field then join the resulting list back together but is there a not loopy way?
Thanks
Accepted answer by Jerry
You could remove all the non-digits using re.sub():
value = re.sub(r"[^0-9]+", "", value)
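To apply that substitution to a whole column without writing an explicit loop, one option is Series.map. The following is only a sketch: the sample values and the column name "P" are assumed from the question, and map still calls Python once per element, so it is not truly vectorized.

```python
import re
import pandas as pd

# Sample values assumed from the question; the name "P" is illustrative.
prices = pd.Series(["$40,000*", "$40000 conditions attached"], name="P")

# Compile the pattern once, then strip every run of non-digits from each value.
non_digits = re.compile(r"[^0-9]+")
cleaned = prices.map(lambda value: non_digits.sub("", value))

print(cleaned.tolist())
```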
Answer by unutbu
You could use Series.str.replace:
import pandas as pd
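df = pd.DataFrame(['$40,000*', '$40000 conditions attached'], columns=['P'])
print(df)
# P
# 0 $40,000*
# 1 $40000 conditions attached
# regex=True is required in recent pandas versions, where str.replace defaults to literal matching
df['P'] = df['P'].str.replace(r'\D+', '', regex=True).astype('int')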
print(df)
yields
P
0 40000
1 40000
since \D matches any character that is not a decimal digit.
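One caveat worth illustrating: removing every non-digit also removes a decimal point, so a price with cents is silently scaled up. The sketch below uses assumed sample values and shows one possible variation, the character class [^\d.], which keeps the decimal point. That variation assumes the trailing text contains no periods of its own.

```python
import pandas as pd

# Assumed sample values; the first one carries a fractional part.
s = pd.Series(["$40,000.32*", "$40000 conditions attached"])

# Removing all non-digits merges the cents into the integer part.
digits_only = s.str.replace(r"\D+", "", regex=True)

# Keeping "." preserves the decimal point, so the result can be parsed as float.
keep_point = s.str.replace(r"[^\d.]+", "", regex=True)

print(digits_only.tolist())
print(keep_point.astype(float).tolist())
```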
Answer by samthebrand
You don't need regex for this. This should work:
df['col'] = df['col'].astype(str).convert_objects(convert_numeric=True)
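Note that convert_objects has been removed from recent pandas releases. A rough modern counterpart is pd.to_numeric with errors="coerce", sketched here on assumed sample data. It only parses strings that are already numeric, so a value with a currency symbol or trailing text becomes NaN rather than being cleaned.

```python
import pandas as pd

# Assumed sample data: one clean number, one decorated price, one decimal.
df = pd.DataFrame({"col": ["40000", "$40,000*", "12.5"]})

# errors="coerce" turns anything that cannot be parsed into NaN.
converted = pd.to_numeric(df["col"], errors="coerce")

print(converted)
```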
Answer by Pluto
You could use pandas' replace method; also you may want to keep the thousands separator ',' and the decimal place separator '.'
import pandas as pd
df = pd.DataFrame(['$40,000.32*', '$40000 conditions attached'], columns=['pricing'])
# Escape the literal dollar sign, capture the number, and keep only the captured group
df['pricing'] = df['pricing'].replace(to_replace=r"\$([0-9,.]+).*", value=r"\1", regex=True)
print(df)
pricing
0 40,000.32
1 40000
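If the goal is a numeric column rather than a cleaned string, a possible follow-up is to capture the amount with Series.str.extract, drop the thousands separators, and convert to float. This is a sketch on assumed sample values, and the "amount" column name is introduced here purely for illustration.

```python
import pandas as pd

# Assumed sample values matching the pattern of the question's data.
df = pd.DataFrame(
    ["$40,000.32*", "$40000 conditions attached"], columns=["pricing"]
)

# Capture the run of digits, commas and dots that follows the dollar sign.
amount = df["pricing"].str.extract(r"\$([0-9,.]+)", expand=False)

# Remove thousands separators literally, then convert to float.
df["amount"] = amount.str.replace(",", "", regex=False).astype(float)

print(df["amount"].tolist())
```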

