Python regex to get everything until the first dot in a string
Original question: http://stackoverflow.com/questions/19142042/
This page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by ealeon
import re

# lines is assumed to be an iterable of strings, one address per item
find = re.compile(r"^(.*)\..*")
for l in lines:
    m = re.match(find, l)
    print(m.group(1))
I want to match everything in a string up to the first dot.
In [email protected], I want a@b
In [email protected], I want a@b
In [email protected], I want a@b
What my code gives me:
[email protected] prints a@b
[email protected] prints [email protected]
[email protected] prints [email protected]
What should the pattern find be so that it captures only a@b?
Accepted answer by Rohit Jain
By default, all quantifiers are greedy: they try to consume as much of the string as they can. You can make them reluctant (lazy) by appending a ? after them:
find = re.compile(r"^(.*?)\..*")
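To make the greedy versus reluctant difference concrete, here is a small sketch. The sample addresses are hypothetical stand-ins (the original inputs are not shown on this page), and the helper name first_group is an assumption made for illustration.

```python
import re

GREEDY = re.compile(r"^(.*)\..*")   # the pattern from the question
LAZY = re.compile(r"^(.*?)\..*")    # the reluctant variant

def first_group(pattern, text):
    """Return group 1 of the match, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.group(1) if m else None

# Hypothetical sample inputs, not taken from the original question.
samples = ["a@b.c", "a@b.c.d", "a@b.c.d.e"]
for s in samples:
    print(s, "| greedy:", first_group(GREEDY, s), "| lazy:", first_group(LAZY, s))
```

The greedy group backtracks only as far as the last dot, while the reluctant group stops at the first one. Neither pattern matches a string that has no dot at all.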
As noted in a comment, this approach fails if there is no period in your string, so it depends on how you want it to behave. If you want to get the complete string in that case, you can use a negated character class:
find = re.compile(r"^([^.]*).*")
It stops automatically at the first period, or at the end of the string if there is none.
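A minimal sketch of the negated character class in use, including the no-period case. The function name before_first_dot and the sample strings are assumptions for illustration only.

```python
import re

find = re.compile(r"^([^.]*).*")

def before_first_dot(text):
    # [^.]* can match zero characters, so this pattern matches any string
    # and match() never returns None here.
    return find.match(text).group(1)

# Hypothetical inputs: with dots, without a dot, and starting with a dot.
for s in ["a@b.c.d", "a@b", ".hidden"]:
    print(repr(s), "->", repr(before_first_dot(s)))
```

A string that starts with a period yields an empty string rather than an error, which may or may not be what you want.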
Also, you don't want to use re.match() there; re.search() should be just fine. You can modify your code to:
find = re.compile(r"^[^.]*")
for l in lines:
    print(re.search(find, l).group(0))
Answer by Jerry
You can use .find() instead of a regex in this situation:
>>> s = "[email protected]"
>>> print(s[0:s.find('.')])
a@b
Considering the comments, here is a modification using .index() (it is similar to .find(), except that it raises ValueError when there is no match instead of returning -1):
>>> s = "[email protected]"
>>> try:
...     index = s.index('.')
... except ValueError:
...     index = len(s)
...
>>> print(s[:index])
a@b
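The reason the -1 return value matters: used directly as a slice bound, it silently drops the last character instead of failing. The sketch below shows that pitfall and one way to guard against it while still using .find(). The function name prefix_with_find and the sample strings are hypothetical.

```python
def prefix_with_find(s):
    """Return the part of s before the first '.', or all of s if there is none."""
    i = s.find(".")
    # find() returns -1 when there is no dot; slicing with -1 would cut
    # off the last character, so handle that case explicitly.
    return s if i == -1 else s[:i]

no_dot = "a@b"  # hypothetical input without a period
print(no_dot[0:no_dot.find(".")])   # unguarded slice loses the final character
print(prefix_with_find(no_dot))     # guarded version keeps the whole string
print(prefix_with_find("a@b.c.d"))
```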
Answer by kindall
I recommend partition or split in this case; they work well even when there is no dot.
text = "[email protected]"
print(text.partition(".")[0])
print(text.split(".", 1)[0])
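As a quick check of the no-dot behaviour, this sketch prints what both methods return for a string with and without a period. The sample strings and the helper name head are hypothetical.

```python
def head(text):
    """Return everything before the first '.', or the whole string if none."""
    return text.partition(".")[0]

for text in ["a@b.c.d", "a@b"]:  # hypothetical inputs
    # partition always returns a 3-tuple: (before, separator, after)
    print(text.partition("."))
    # split with maxsplit=1 returns a list of one or two items
    print(text.split(".", 1))
    print(head(text))
```

Because partition always returns three items and split always returns at least one, indexing element 0 is safe in both cases.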
Answer by Escualo
You can use the split method: split the string at the . character one time, and you will get a list of [before the first period, after the first period]. The notation would be:
mystring.split(".", 1)
Then you can simply create a generator that yields the part you are interested in and ignores the one you are not (the _ notation). It works as follows:
entries = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
]
for token, _ in (entry.split(".", 1) for entry in entries):
    print(token)
Output:
a@b
a@b
a@b
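One caveat with the two-name unpacking above: if an entry contains no period, split returns a single-item list and the unpacking raises ValueError. The sketch below shows that failure and a variant based on partition, which always returns three items. This is a swapped-in alternative rather than the answer's own method, and the entries are hypothetical.

```python
entries = ["a@b.c", "a@b.c.d", "a@b"]  # hypothetical; the last has no period

# Unpacking two names from split() fails for the entry without a period.
try:
    for token, _ in (entry.split(".", 1) for entry in entries):
        pass
    unpack_failed = False
except ValueError:
    unpack_failed = True
print("split unpacking failed:", unpack_failed)

# partition() always yields (before, separator, after), so unpacking is safe.
tokens = [token for token, _sep, _rest in (e.partition(".") for e in entries)]
print(tokens)
```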
The documentation for the split method can be found online:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
Answer by Srinivasreddy Jakkireddy
import re
data = '[email protected]'
# Remove the first period and everything after it
print(re.sub(r'\..*', '', data))
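A short sketch of the substitution approach wrapped in a function, with one assumption added: the re.DOTALL flag, so that a newline after the first period does not end the match early. The function name strip_after_dot and the inputs are hypothetical.

```python
import re

def strip_after_dot(text):
    """Delete the first '.' and everything after it; leave dot-free text as is."""
    # Without re.DOTALL, '.' in the pattern does not match a newline, so text
    # following a newline would survive the substitution.
    return re.sub(r"\..*", "", text, flags=re.DOTALL)

for s in ["a@b.c.d", "a@b", "a@b.c\nd.e"]:  # hypothetical inputs
    print(repr(strip_after_dot(s)))
```

When the string has no period, the pattern never matches and the input is returned unchanged.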