Python: best way to replace multiple characters in a string?

Question

Asked by prosseek

I need to replace some characters as follows: & -> \&, # -> \#, ...

I coded as follows, but I guess there should be some better way. Any hints?

strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...

Answer 1

Accepted answer by Hugo

Replacing two characters

I timed all the methods in the current answers along with one extra.

With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#').

Timings for each function:

a) 1000000 loops, best of 3: 1.47 μs per loop
b) 1000000 loops, best of 3: 1.51 μs per loop
c) 100000 loops, best of 3: 12.3 μs per loop
d) 100000 loops, best of 3: 12 μs per loop
e) 100000 loops, best of 3: 3.27 μs per loop
f) 1000000 loops, best of 3: 0.817 μs per loop
g) 100000 loops, best of 3: 3.64 μs per loop
h) 1000000 loops, best of 3: 0.927 μs per loop
i) 1000000 loops, best of 3: 0.814 μs per loop

Here are the functions:

# Each function discards its result on purpose: only the run time is measured.
def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    # Compiles the pattern on every call.
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


# Pattern compiled once, at import time.
RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)


# f and g use non-raw literals: '\&' is not a recognised escape, so the
# backslash is kept (recent Python versions warn about this spelling).
def f(text):
    text = text.replace('&', '\&').replace('#', '\#')


def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])


# h and i are the raw-string versions.
def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')


def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')

Timed like this:

python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"
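
The same kind of measurement can also be driven from Python with the standard timeit module instead of the command line. This is only a minimal sketch: chained_replace and best_time_us are names made up for this example (they are not part of the time_functions module above), the lambda adds a little call overhead, and the numbers printed will depend on the machine.

```python
import timeit


def chained_replace(text):
    # Hypothetical stand-in for function f above, but returning the result.
    return text.replace('&', '\\&').replace('#', '\\#')


def best_time_us(func, arg, number=100000, repeat=3):
    # Mirror "best of 3": run `repeat` batches and keep the fastest one.
    batches = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(batches) / number * 1e6  # microseconds per call


if __name__ == '__main__':
    print('%.3f us per loop' % best_time_us(chained_replace, 'abc&def#ghi'))
```

Taking the minimum of the batches follows the "best of 3" convention that the command-line tool reports.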

Replacing 17 characters

Here's similar code to do the same but with more characters to escape (\`*_{}>#+-.!$):

def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    # '[', ']' and '-' are escaped so they are read as literal members of the class.
    rx = re.compile(r'([\\`*_{}\[\]()>#+\-.!$])')
    text = rx.sub(r'\\\1', text)


RX = re.compile(r'([\\`*_{}\[\]()>#+\-.!$])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)


# The backslash is replaced first so that backslashes added later are not escaped again.
def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')


def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')


def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')

Here are the results for the same input string abc&def#ghi:

a) 100000 loops, best of 3: 6.72 μs per loop
b) 100000 loops, best of 3: 2.64 μs per loop
c) 100000 loops, best of 3: 11.9 μs per loop
d) 100000 loops, best of 3: 4.92 μs per loop
e) 100000 loops, best of 3: 2.96 μs per loop
f) 100000 loops, best of 3: 4.29 μs per loop
g) 100000 loops, best of 3: 4.68 μs per loop
h) 100000 loops, best of 3: 4.73 μs per loop
i) 100000 loops, best of 3: 4.24 μs per loop

And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):

a) 100000 loops, best of 3: 7.59 μs per loop
b) 100000 loops, best of 3: 6.54 μs per loop
c) 100000 loops, best of 3: 16.9 μs per loop
d) 100000 loops, best of 3: 7.29 μs per loop
e) 100000 loops, best of 3: 12.2 μs per loop
f) 100000 loops, best of 3: 5.38 μs per loop
g) 10000 loops, best of 3: 21.7 μs per loop
h) 100000 loops, best of 3: 5.7 μs per loop
i) 100000 loops, best of 3: 5.13 μs per loop

Adding a couple of variants:

def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)


def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)

With the shorter input:

ab) 100000 loops, best of 3: 7.05 μs per loop
ba) 100000 loops, best of 3: 2.4 μs per loop

With the longer input:

ab) 100000 loops, best of 3: 7.71 μs per loop
ba) 100000 loops, best of 3: 6.08 μs per loop

So I'm going to use ba for readability and speed.
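
The timing functions above throw their result away, so for real use the loop needs a return statement. A minimal sketch, assuming the 17 characters listed earlier; the name escape_chars and the ESCAPE_CHARS constant are made up for this example:

```python
# Backslash comes first, so backslashes added later are not escaped again.
ESCAPE_CHARS = "\\`*_{}[]()>#+-.!$"


def escape_chars(text, chars=ESCAPE_CHARS):
    for c in chars:
        if c in text:  # skip the replace call when the character is absent
            text = text.replace(c, "\\" + c)
    return text


if __name__ == '__main__':
    print(escape_chars("abc&def#ghi"))
```

Note that & is not in this character set, so only the # in the sample string gets a backslash.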

Addendum

Prompted by haccks in the comments, one difference between ab and ba is the if c in text: check. Let's test them against two more variants:

def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)

def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)

Times are in μs per loop on Python 2.7.14 and 3.6.3, measured on a different machine from the earlier set, so they cannot be compared directly with those results.

╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input  ║  ab  │ ab_with_check │  ba  │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │    4.22       │ 3.45 │    8.01          │
│ Py3, short ║ 5.54 │    1.34       │ 1.46 │    5.34          │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long  ║ 9.3  │    7.15       │ 6.85 │    8.55          │
│ Py3, long  ║ 7.43 │    4.38       │ 4.41 │    7.02          │
└────────────╨──────┴───────────────┴──────┴──────────────────┘

We can conclude that:

Those with the check are up to 4x faster than those without the check.
ab_with_check is slightly in the lead on Python 3, but ba (with check) has a greater lead on Python 2.
However, the biggest lesson here is that Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and the fastest on Python 2!

Answer 2

Answered by kennytm

Are you always going to prepend a backslash? If so, try

import re
rx = re.compile('([&#])')
#                  ^^ fill in the characters here.
# r'\\\1' is a literal backslash followed by the matched character.
strs = rx.sub(r'\\\1', strs)

It may not be the most efficient method but I think it is the easiest.
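
If the characters to fill in include regex metacharacters such as ] or -, the character class itself has to be escaped too. One way to sketch this, assuming the goal is still to prepend a backslash, is to build the class with re.escape; make_escaper is a hypothetical helper name, not part of the answer above:

```python
import re


def make_escaper(chars):
    # re.escape keeps characters such as ']' or '-' from being read as class syntax.
    rx = re.compile('([' + re.escape(chars) + '])')
    # r'\\\1' is a literal backslash followed by the matched character.
    return lambda s: rx.sub(r'\\\1', s)


if __name__ == '__main__':
    esc = make_escaper('&#[]-')
    print(esc('a&b#c[d]e-f'))
```

The pattern is compiled once when the escaper is created, so repeated calls do not pay the compilation cost again.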

Answer 3

Answered by ghostdog74

>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
...   if ch in string:
...      string=string.replace(ch,"\\"+ch)
...
>>> print(string)
abc\&def\#ghi

Answer 4

Answered by jonesy

>>> a = '&#'
>>> print(a.replace('&', r'\&'))
\&#
>>> print(a.replace('#', r'\#'))
&\#

You want to use a 'raw' string (denoted by the 'r' prefixing the replacement string), since raw strings do not treat the backslash specially.
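
A quick check of what that means in practice. The sketch below only relies on built-in string behaviour, and the variable names are chosen for the example: r'\&' and '\\&' are the same two-character string, while an escape that Python does recognise, such as '\n', differs from its raw form.

```python
raw = r'\&'        # raw literal: backslash followed by '&'
escaped = '\\&'    # regular literal with the backslash written explicitly

if __name__ == '__main__':
    print(raw == escaped)           # both spell the same two characters
    print(len(raw))                 # 2
    print(len('\n'), len(r'\n'))    # 1 2: a newline versus backslash + 'n'
    print('&#'.replace('&', raw))
```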

Answer 5

Answered by Victor Olex

You may consider writing a generic escape function:

def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])

>>> esc = mk_esc('&#')
>>> print(esc('Learn & be #1'))
Learn \& be \#1

This way you can make your function configurable with a list of characters that should be escaped.

Answer 6

Answered by thefourtheye

Simply chain the replace functions like this:

strs = "abc&def#ghi"
print(strs.replace('&', r'\&').replace('#', r'\#'))
# abc\&def\#ghi

If there are going to be more replacements, you can do it in this generic way:

strs, replacements = "abc&def#ghi", {"&": r"\&", "#": r"\#"}
print("".join([replacements.get(c, c) for c in strs]))
# abc\&def\#ghi

Answer 7

Answered by parity3

FYI, this is of little or no use to the OP but it may be of use to other readers (please do not downvote, I'm aware of this).

As a somewhat ridiculous but interesting exercise, I wanted to see if I could use Python functional programming to replace multiple chars. I'm pretty sure this does NOT beat just calling replace() twice. And if performance were an issue, you could easily beat this in Rust, C, Julia, Perl, Java, JavaScript and maybe even awk. It uses an external 'helpers' package called pytoolz, accelerated via Cython (cytoolz, it's a PyPI package).

from functools import partial
from itertools import chain, starmap
from operator import itemgetter, contains
from cytoolz.functoolz import compose
from cytoolz.itertoolz import sliding_window
text = '&hello#hi&yo&'
# Yields the index of every character of the input that is '#' or '&'.
char_index_iter = compose(partial(map, itemgetter(0)), partial(filter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
# Slice the text between those indices and rejoin the pieces with a backslash.
print('\\'.join(map(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),)))))))

I'm not even going to explain this because no one would bother using this to accomplish multiple replace. Nevertheless, I felt somewhat accomplished in doing this and thought it might inspire other readers or win a code obfuscation contest.

Answer 8

Answered by CasualCoder3

Using reduce, which is available in python2.7 and python3.*, you can easily replace multiple substrings in a clean and pythonic way.

from functools import reduce  # a builtin on Python 2.7; this import works on both 2.7 and 3

# Let's define a helper function to make it easy to use
def replacer(text, replacements):
    return reduce(
        lambda text, ptuple: text.replace(ptuple[0], ptuple[1]),
        replacements, text
    )

if __name__ == '__main__':
    uncleaned_str = "abc&def#ghi"
    cleaned_str = replacer(uncleaned_str, [("&", r"\&"), ("#", r"\#")])
    print(cleaned_str) # abc\&def\#ghi

In python2.7 you don't have to import reduce but in python3.* you have to import it from the functools module.

Answer 9

Answered by tommy.carstensen

Here is a python3 method using str.translate and str.maketrans:

s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': r'\&', '#': r'\#'})))

The printed string is abc\&def\#ghi.
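
For a longer list of characters, the translation table can be built once with a dict comprehension and reused. A sketch under that assumption; ESCAPE_TABLE and escape_markdown are names invented for this example, and the character list is the 17-character set from the accepted answer:

```python
# Map each character to itself with a backslash prepended.
ESCAPE_TABLE = str.maketrans({c: '\\' + c for c in "\\`*_{}[]()>#+-.!$"})


def escape_markdown(text):
    # translate() handles every character in a single pass over the input,
    # so a backslash added for one character is never escaped again.
    return text.translate(ESCAPE_TABLE)


if __name__ == '__main__':
    print(escape_markdown("## *Something* and [another] thing"))
```

Because the table is applied in one pass, the order of the characters in the set does not matter here, unlike with chained replace calls.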

Answer 10

Answered by Sebastialonso

Late to the party, but I lost a lot of time with this issue until I found my answer.

Short and sweet: translate is superior to replace. If you're more interested in functionality than in time optimization, do not use replace.

Also use translate if you don't know whether the set of characters to be replaced overlaps the set of characters used to replace.

Case in point:

Using replace, you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4") to return "2344", but in fact it will return "4444".

Translation seems to perform what OP originally desired.
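
The difference is easy to reproduce. In the sketch below (Python 3, with variable names chosen for the example), the chained replace calls feed on each other's output, while translate maps every original character exactly once:

```python
chained = "1234".replace("1", "2").replace("2", "3").replace("3", "4")

# One-pass mapping: '1' -> '2', '2' -> '3', '3' -> '4'.
table = str.maketrans({"1": "2", "2": "3", "3": "4"})
translated = "1234".translate(table)

if __name__ == '__main__':
    print(chained)     # 4444
    print(translated)  # 2344
```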
