Regex punctuation split [Python]

Question

Asked by dantdj

Can anyone help me a bit with regexes? I currently have this: re.split(" +", line.rstrip()), which splits on runs of spaces.

How could I expand this to cover punctuation, too?

Answer 1

Accepted answer by Mister_Tom

The official Python documentation has a good example for this. It splits on runs of non-alphanumeric characters (whitespace and punctuation): \W is the character class that matches any non-word character. Note: the underscore "_" counts as a "word" character, so the string will not be split on it here.

re.split(r'\W+', 'Words, words, words.')

See https://docs.python.org/3/library/re.html for more examples; search the page for "re.split".
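
As a rough illustration of the \W behaviour described in this answer, the sketch below splits a sample sentence and then shows one possible way to treat the underscore as a separator too. The sample text and the [\W_]+ variant are illustrative assumptions, not part of the original answer.

```python
import re

# Sample input chosen for illustration; it contains an underscore on purpose.
text = "Words, words, words. snake_case stays whole"

# \W+ matches runs of non-word characters. The raw string keeps the
# backslash literal, so Python does not treat \W as a string escape.
tokens = re.split(r"\W+", text)

# Assumption: if underscores should also separate words, add "_" to the class.
tokens_no_underscore = re.split(r"[\W_]+", text)

print(tokens)
print(tokens_no_underscore)
```

With \W+ alone, "snake_case" stays in one piece; with [\W_]+ it is split into "snake" and "case".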

Answer 2

Answered by Ashwini Chaudhary

Using string.punctuation and a character class:

>>> import re
>>> from string import punctuation
>>> r = re.compile(r'[\s{}]+'.format(re.escape(punctuation)))
>>> r.split('dss!dfs^  #$% jjj^')
['dss', 'dfs', 'jjj', '']
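
Note the trailing '' in the result above: re.split returns an empty string whenever the input starts or ends with a separator. A minimal sketch of one way to drop those empties follows; the helper name split_words is hypothetical and not from the original answer.

```python
import re
from string import punctuation

# Same pattern as in the answer: whitespace plus every ASCII punctuation character.
splitter = re.compile(r"[\s{}]+".format(re.escape(punctuation)))


def split_words(line):
    """Split on whitespace/punctuation and discard empty strings (hypothetical helper)."""
    return [part for part in splitter.split(line) if part]


print(split_words("dss!dfs^  #$% jjj^"))
```

Because string.punctuation includes the underscore, this pattern also splits on "_", unlike the \W+ approach.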

Answer 3

Answered by dawg

import re
st = 'one two,three; four-five,    six'

# Split on runs of whitespace, or on one of , ; . - followed by optional whitespace.
print(re.split(r'\s+|[,;.-]\s*', st))
# ['one', 'two', 'three', 'four', 'five', 'six']
