Python pandas: applying regex to replace values

Question

Asked by KillerSnail

I have read some pricing data into a pandas DataFrame. The values appear as:

$40,000*
$40000 conditions attached

I want to strip them down to just the numeric values. I know I can loop through and apply the regex

[0-9]+

to each field and then join the resulting list back together, but is there a way to do this without looping?

Thanks

Answer 1

Accepted answer by Jerry

You could remove all the non-digits using re.sub():

import re

value = re.sub(r"[^0-9]+", "", value)

regex101 demo
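As a rough sketch of how the re.sub() call above could be applied to several values, the snippet below wraps it in a small helper. The helper name digits_only, the constant NON_DIGITS, and the sample strings are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Precompiled pattern: one or more characters that are not ASCII digits
NON_DIGITS = re.compile(r"[^0-9]+")


def digits_only(value: str) -> str:
    """Return the input with every non-digit character removed."""
    return NON_DIGITS.sub("", value)


# Assumed sample values resembling the pricing strings in the question
raw_values = ["$40,000*", "$40000 conditions attached"]
cleaned = [digits_only(v) for v in raw_values]
print(cleaned)
```

This still iterates in Python, so it mainly suits small inputs or a pass before the data is loaded into a DataFrame.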

Answer 2

Answered by unutbu

You could use Series.str.replace:

import pandas as pd

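df = pd.DataFrame(['$40,000*', '$40000 conditions attached'], columns=['P'])
print(df)
#                             P
# 0                    $40,000*
# 1  $40000 conditions attached

# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df['P'] = df['P'].str.replace(r'\D+', '', regex=True).astype('int')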
print(df)

yields

       P
0  40000
1  40000

since \D matches any character that is not a decimal digit.
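One edge case worth keeping in mind: a row with no digits at all becomes an empty string after the replacement, and astype('int') cannot convert that. The sketch below is one possible way to tolerate such rows, using pd.to_numeric with errors="coerce". The third sample value and the column name P_clean are assumptions added for illustration.

```python
import pandas as pd

# Assumed sample data; the last row deliberately contains no digits
df = pd.DataFrame(
    {"P": ["$40,000*", "$40000 conditions attached", "call for price"]}
)

# Strip every run of non-digit characters (regex=True makes \D+ a pattern)
digits = df["P"].str.replace(r"\D+", "", regex=True)

# Empty strings cannot be parsed, so they are coerced to NaN instead of raising
df["P_clean"] = pd.to_numeric(digits, errors="coerce")
print(df)
```

Because of the NaN, the resulting column is floating point rather than integer.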

Answer 3

Answered by samthebrand

You don't need regex for this if the strings are already plain numbers. This should work:

# convert_objects() is gone from current pandas; pd.to_numeric is the replacement
df['col'] = pd.to_numeric(df['col'].astype(str), errors='coerce')
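Whether a numeric conversion alone is enough depends on the data. The sketch below uses pd.to_numeric on assumed sample values to show that clean numeric strings convert, while decorated strings such as the ones in the question turn into NaN when errors="coerce" is set.

```python
import pandas as pd

# Assumed sample values: two plain numeric strings and one decorated price
s = pd.Series(["40000", "40000.5", "$40,000*"])

# errors="coerce" replaces anything unparseable with NaN instead of raising
converted = pd.to_numeric(s, errors="coerce")
print(converted)
```

So for values like "$40,000*" some form of cleaning is still needed before the conversion.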

Answer 4

Answered by Pluto

You could use pandas' replace method. You may also want to keep the thousands separator ',' and the decimal separator '.':

import pandas as pd

df = pd.DataFrame(['$40,000.32*', '$40000 conditions attached'], columns=['pricing'])
# Keep the number that follows the dollar sign and drop everything after it
df['pricing'] = df['pricing'].replace(to_replace=r"\$([0-9,.]+).*", value=r"\1", regex=True)
print(df)
#      pricing
# 0  40,000.32
# 1      40000
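The values kept this way are still strings. If numbers are needed afterwards, one possible follow-up is to drop the thousands separator and cast to float, as in the sketch below. The Series name pricing and its values are assumed here to mirror the output shown above.

```python
import pandas as pd

# Assumed strings, as left after keeping ',' and '.' in the cleaned column
pricing = pd.Series(["40,000.32", "40000"])

# Remove the thousands separator literally (no regex), then cast to float
as_float = pricing.str.replace(",", "", regex=False).astype(float)
print(as_float)
```

This assumes ',' is always a thousands separator; data using ',' as the decimal mark would need different handling.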
