Removing punctuation with regular expressions - python

Question

Asked by user2696287

I need to strip punctuation at the start and end of a word, and a regex seems like the best option for this. I don't want punctuation removed from inside words like 'you're', which is why I'm not using .replace().

Answer 1

Accepted answer by falsetru

You don't need a regular expression for this task. Use str.strip with string.punctuation, which removes the listed characters only from the two ends of the string:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> '!Hello.'.strip(string.punctuation)
'Hello'

>>> ' '.join(word.strip(string.punctuation) for word in "Hello, world. I'm a boy, you're a girl.".split())
"Hello world I'm a boy you're a girl"
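If a regular expression is still preferred, as the question asks, the same edge-only stripping can be sketched with re.sub. This is a minimal sketch, not part of the accepted answer: the names _PUNCT_CLASS, _EDGE_PUNCT_RE and strip_edge_punct are made up for illustration, and "punctuation" is assumed to mean the ASCII characters in string.punctuation.

```python
import re
import string

# Character class containing every ASCII punctuation character.
# re.escape keeps characters such as ], ^, - and \ literal inside the class.
_PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

# Match a run of punctuation at the very start OR the very end of a word.
_EDGE_PUNCT_RE = re.compile("^" + _PUNCT_CLASS + "+|" + _PUNCT_CLASS + "+$")


def strip_edge_punct(word):
    """Remove leading and trailing punctuation; leave inner punctuation alone."""
    return _EDGE_PUNCT_RE.sub("", word)


sentence = "Hello, world. I'm a boy, you're a girl."
print(" ".join(strip_edge_punct(w) for w in sentence.split()))
```

For this input the result matches the str.strip version above, which is shorter and is probably the better choice unless the pattern needs to be customized.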

Answer 2

Answer by rahul ranjan

You can remove punctuation from a text file or from a particular string using a regular expression, as follows:

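import re

# Characters treated as punctuation; add or remove characters as needed.
# A raw string keeps the backslash literal.
punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
punct_re = re.compile('[' + re.escape(punctuations) + ']')

new_data = []
with open('/home/rahul/align.txt', 'r') as f:
    all_words = f.read().split()

# To clean a string instead of a file, split that string into all_words,
# e.g. new_str = "My name$#@ is . rahul -~"; all_words = new_str.split()

for word in all_words:
    # Delete every punctuation character from the word. Note that this also
    # removes punctuation inside a word, such as the apostrophe in "you're".
    cleaned = punct_re.sub('', word)
    # Skip tokens that consisted of punctuation only.
    if cleaned:
        new_data.append(cleaned)

print(new_data)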

P.S. Adjust the indentation as required. Hope this helps!
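The comments in the code above mention applying the same idea to a plain string rather than a file. A minimal sketch of that case, using the sample string from the comment; the names punct_re and cleaned are illustrative, and the punctuation set is the one listed in the answer:

```python
import re

# Punctuation set from the answer, as a raw string so the backslash stays literal.
punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

# re.escape makes every character literal inside the [...] class.
punct_re = re.compile("[" + re.escape(punctuations) + "]")

new_str = "My name$#@ is . rahul -~"

# Delete all punctuation characters, then normalize the leftover whitespace.
cleaned = " ".join(punct_re.sub("", new_str).split())
print(cleaned)  # My name is rahul
```

Because every listed character is deleted wherever it occurs, this approach would also turn "you're" into "youre"; it does not limit itself to the start and end of a word.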

Answer 3

Answer by Shalini Baranwal

I think this function is a helpful and concise way to remove punctuation from a list of words:

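import re

def remove_punct(text):
    # `text` is expected to be an iterable of words (for example a list),
    # not a single string; a string would be processed one character at a time.
    new_words = []
    for word in text:
        # Remove everything except word characters (\w) and whitespace (\s).
        w = re.sub(r'[^\w\s]', '', word)
        # \w also matches the underscore, so remove it in a second step.
        w = re.sub(r'_', '', w)
        new_words.append(w)
    return new_words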
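The two substitutions can also be folded into one pattern. The following is a sketch under that assumption, with the hypothetical names _PUNCT_RE and remove_punct_single_pass; it is a variation on the function above, not the answerer's code:

```python
import re

# Anything that is not a word character or whitespace, or an underscore
# (the underscore is listed separately because \w matches it).
_PUNCT_RE = re.compile(r"[^\w\s]|_")


def remove_punct_single_pass(words):
    """Return a new list with punctuation deleted from each word."""
    return [_PUNCT_RE.sub("", w) for w in words]


print(remove_punct_single_pass(["Hello,", "you're", "snake_case", "!?"]))
# ['Hello', 'youre', 'snakecase', '']
```

As the output shows, this removes punctuation everywhere in a word, including the apostrophe in "you're", and leaves an empty string for tokens made only of punctuation. For the original question, which wants inner punctuation kept, stripping only the ends of each word is the better fit.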
