Python: replace punctuation with whitespace

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/34860982/

Date: 2020-08-19 15:38:14  Source: igfitidea

replace the punctuation with whitespace

Tags: python, string, python-3.x

Asked by oceano22

I have a problem with my code and cannot figure out how to move forward.

tweet = "I am tired! I like fruit...and milk"
clean_words = tweet.translate(None, ",.;@#?!&$")
words = clean_words.split()

print tweet
print words

Output:

['I', 'am', 'tired', 'I', 'like', 'fruitand', 'milk']

What I would like is to replace the punctuation with whitespace, but I do not know which function or loop to use. Can anyone help me, please?

Answered by Bryan

There are a few ways to approach this problem. I have one that works, but believe it is suboptimal. Hopefully someone who knows regex better will come along and improve the answer or offer a better one.

Your question is tagged python-3.x, but your code is Python 2.x, so my code is 2.x as well. I have included a line that works in 3.x.

#!/usr/bin/env python

import re

tweet = "I am tired! I like fruit...and milk"
# print tweet

clean_words = tweet.translate(None, ",.;@#?!&$")  # Python 2
# clean_words = tweet.translate(str.maketrans("", "", ",.;@#?!&$"))  # Python 3
print(clean_words)  # Does not handle fruit...and (prints "fruitand")

regex_sub = re.sub(r"[,.;@#?!&$]+", ' ', tweet)  # + means match one or more
print(regex_sub)  # extra space between tired and I

regex_sub = re.sub(r"\s+", ' ', regex_sub)  # Replaces any number of spaces with one space
print(regex_sub)  # looks good
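
The two substitutions above can also be folded into a small helper that returns the word list the question asks for. This is only a sketch for Python 3; the names `PUNCTUATION` and `split_words` are made up for illustration and are not part of the original answer.

```python
import re

PUNCTUATION = ",.;@#?!&$"  # same characters as in the question

def split_words(text):
    """Replace each run of the listed punctuation with a space, then split."""
    spaced = re.sub("[" + re.escape(PUNCTUATION) + "]+", " ", text)
    # str.split() with no arguments splits on any run of whitespace,
    # so a second re.sub(r"\s+", ...) pass is not needed here.
    return spaced.split()

print(split_words("I am tired! I like fruit...and milk"))
# ['I', 'am', 'tired', 'I', 'like', 'fruit', 'and', 'milk']
```

Because `split()` already collapses repeated whitespace, the extra space left between "tired" and "I" does not show up in the result.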

Answered by Thomas Baruchel

I am not sure I fully understand your requirements, but have you considered adding just one more line to your current code, like this:

>>> a=['I', 'am', 'tired', 'I', 'like', 'fruitand', 'milk']
>>> " ".join(a)
'I am tired I like fruitand milk'

Is this what you are asking for, or do you need something more specific? Regards.

Answered by pivanchy

If you're using Python 2.x you could try:

import string

tweet = "I am tired! I like fruit...and milk"
clean_words = tweet.translate(string.maketrans("",""), string.punctuation)

print clean_words

For Python 3.x, this works:

import string

tweet = "I am tired! I like fruit...and milk"
transtable = str.maketrans('', '', string.punctuation)
clean_words = tweet.translate(transtable)

print(clean_words)

Both snippets remove all punctuation symbols from the string. Note that they delete the punctuation rather than replace it with whitespace, so "fruit...and" still becomes "fruitand".
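
To make the difference between deleting and replacing concrete, here is a minimal Python 3 sketch that builds both kinds of translation table side by side. The variable names `delete_table` and `space_table` are illustrative only.

```python
import string

tweet = "I am tired! I like fruit...and milk"

# Three-argument form: every character in the third argument is deleted.
delete_table = str.maketrans("", "", string.punctuation)

# Dict form: every punctuation character is mapped to a single space.
space_table = str.maketrans({ch: " " for ch in string.punctuation})

deleted = tweet.translate(delete_table)
spaced = tweet.translate(space_table)

print(deleted)  # I am tired I like fruitand milk
print(spaced)   # I am tired  I like fruit   and milk
```

The second table keeps the words apart, at the cost of leaving one space per punctuation character in the output.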

Answered by Jonathan

Here is a regex-based solution that has been tested under Python 3.5.1. I think it is both simple and succinct.

import re

tweet = "I am tired! I like fruit...and milk"
clean = re.sub(r"""
               [,.;@#?!&$]+  # Accept one or more copies of punctuation
               \ *           # plus zero or more copies of a space,
               """,
               " ",          # and replace it with a single space
               tweet, flags=re.VERBOSE)
print(tweet + "\n" + clean)

Results:

I am tired! I like fruit...and milk
I am tired I like fruit and milk

Compact version:

import re

tweet = "I am tired! I like fruit...and milk"
clean = re.sub(r"[,.;@#?!&$]+\ *", " ", tweet)
print(tweet + "\n" + clean)
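
If the same cleanup is applied to many strings, the pattern could be compiled once and extended to all ASCII punctuation. The sketch below assumes that "punctuation" means everything in `string.punctuation`, which is broader than the nine characters used above; `PUNCT_RUN` and `clean_text` are hypothetical names.

```python
import re
import string

# Assumption: "punctuation" means every character in string.punctuation,
# not only the nine characters listed in the answer above.
PUNCT_RUN = re.compile("[" + re.escape(string.punctuation) + "]+ *")

def clean_text(text):
    """Replace punctuation runs (and any spaces after them) with one space."""
    return PUNCT_RUN.sub(" ", text).strip()

print(clean_text("I am tired! I like fruit...and milk"))
# I am tired I like fruit and milk
```

One consequence of the broader character set is that apostrophes and hyphens are also replaced, so a word like "don't" would come out as "don t".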

Answered by YuanzhiKe

It is easy to achieve by changing your "maketrans" like this:

import string
tweet = "I am tired! I like fruit...and milk"
# Map every punctuation character to a space (Python 3)
translator = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
# Python 2: translator = string.maketrans(string.punctuation, ' ' * len(string.punctuation))
print(tweet.translate(translator))

With str.maketrans this works on my machine running Python 3.5.2; on Python 2.x, use string.maketrans instead. I hope it works on yours too.
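
Since each punctuation character becomes its own space, the translated string contains runs of spaces. A possible follow-up, sketched here for Python 3 with illustrative variable names, is to split the result and rejoin it:

```python
import string

tweet = "I am tired! I like fruit...and milk"
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

words = tweet.translate(table).split()  # split() drops the repeated spaces
print(words)
# ['I', 'am', 'tired', 'I', 'like', 'fruit', 'and', 'milk']
print(" ".join(words))
# I am tired I like fruit and milk
```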