Python pandas: applying regex to replace values

Notice: This page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/22588316/

Date: 2020-08-19 01:14:13  Source: igfitidea

Tags: python, regex, pandas

Asked by KillerSnail

I have read some pricing data into a pandas DataFrame; the values appear as:

$40,000*
$40000 conditions attached

I want to strip it down to just the numeric values. I know I can loop through and apply the regex

[0-9]+

to each field and then join the resulting list back together, but is there a way to do it without looping?

Thanks

Accepted answer by Jerry

You could remove all the non-digits using re.sub():

import re

# value is a single string, e.g. one cell of the column
value = re.sub(r"[^0-9]+", "", value)

regex101 demo
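
To apply the same substitution to a whole column without writing an explicit loop, one option is Series.map with a precompiled pattern. This is only a minimal sketch: the column name P and the sample values are assumptions chosen for illustration.

```python
import re

import pandas as pd

# Sample data; the column name "P" is an assumption for illustration.
df = pd.DataFrame({"P": ["$40,000*", "$40000 conditions attached"]})

# Same character class as above, compiled once and reused for every cell.
non_digits = re.compile(r"[^0-9]+")

# map() calls the function once per cell, so no explicit loop is written.
df["P"] = df["P"].map(lambda value: non_digits.sub("", value)).astype(int)
print(df)
```

map still calls the Python function once per element internally, so this mainly tidies the code rather than making it faster.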

Answer by unutbu

You could use Series.str.replace:

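import pandas as pd

df = pd.DataFrame(['$40,000*', '$40000 conditions attached'], columns=['P'])
print(df)
#                             P
# 0                    $40,000*
# 1  $40000 conditions attached

# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['P'] = df['P'].str.replace(r'\D+', '', regex=True).astype('int')
print(df)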

yields

       P
0  40000
1  40000

since \D matches any character that is not a decimal digit.
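
One consequence worth checking before relying on \D+: it removes the decimal point as well, so a price with cents silently becomes a larger integer. A small sketch using the standard re module; the helper name digits_only and the sample strings are hypothetical, chosen only to illustrate the behaviour.

```python
import re


def digits_only(text):
    """Drop every character that is not a decimal digit."""
    return re.sub(r"\D+", "", text)


print(digits_only("$40,000*"))     # 40000
print(digits_only("$40,000.32*"))  # 4000032 (the decimal point is removed too)
```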

Answer by samthebrand

You don't need regex for this. This should work:

# convert_objects was removed in pandas 1.0; pd.to_numeric is the closest current equivalent
df['col'] = pd.to_numeric(df['col'].astype(str), errors='coerce')

Answer by Pluto

You could use pandas' replace method; you may also want to keep the thousands separator ',' and the decimal separator '.'

import pandas as pd

df = pd.DataFrame(['$40,000.32*', '$40000 conditions attached'], columns=['pricing'])
# Keep only the captured number (digits, commas, dots) that follows the dollar sign
df['pricing'] = df['pricing'].replace(to_replace=r"\$([0-9,.]+).*", value=r"\1", regex=True)
print(df)
#      pricing
# 0  40,000.32
# 1      40000
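
The values left by this replacement are still strings. If numbers are needed, one possible follow-up is to drop the thousands separator and parse with pd.to_numeric. This is a sketch under the assumption that ',' is only ever a thousands separator; the Series below is a stand-in for the pricing column.

```python
import pandas as pd

# Stand-in for the "pricing" column after the replacement (illustrative sample).
prices = pd.Series(["40,000.32", "40000"], name="pricing")

# Remove the thousands separator as a literal string, then parse.
# errors="coerce" turns anything that still fails to parse into NaN.
numeric = pd.to_numeric(prices.str.replace(",", "", regex=False), errors="coerce")
print(numeric)
```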