Python regex to get everything until the first dot in a string

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/19142042/

Time: 2020-08-19 12:59:17  Source: igfitidea

Tags: python, regex

Asked by ealeon

import re

find = re.compile(r"^(.*)\..*")
for l in lines:
    m = re.match(find, l)
    print(m.group(1))

I want a regex that captures everything in a string up to the first dot.

in [email protected], I want a@b
in [email protected], I want a@b
in [email protected], I want a@b

What my code is giving me...

What should find be so that it only captures a@b?

Accepted answer by Rohit Jain

By default, all quantifiers are greedy: they try to consume as much of the string as they can. You can make them reluctant by appending a ? after them:

find = re.compile(r"^(.*?)\..*")
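
To see the difference, here is a small sketch comparing the greedy and reluctant forms. The sample strings are made up for illustration (they are not taken from the question), and the code assumes Python 3.

```python
import re

greedy = re.compile(r"^(.*)\..*")   # .* grabs as much as it can, then backtracks to the LAST dot
lazy = re.compile(r"^(.*?)\..*")    # .*? grabs as little as it can, stopping at the FIRST dot

samples = ["a@b.c", "a@b.c.d", "a@b.c.d.e"]  # hypothetical inputs

greedy_results = [greedy.match(s).group(1) for s in samples]
lazy_results = [lazy.match(s).group(1) for s in samples]

print(greedy_results)  # ['a@b', 'a@b.c', 'a@b.c.d']
print(lazy_results)    # ['a@b', 'a@b', 'a@b']
```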

As noted in a comment, this approach would fail if there is no period in your string. So, it depends upon how you want it to behave. But if you want to get the complete string in that case, then you can use a negated character class:

find = re.compile(r"^([^.]*).*")

It will automatically stop at the first period, or at the end of the string if there is none.
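
A minimal sketch of that behaviour, using made-up inputs and assuming Python 3: the negated character class still matches when the string has no period, while the reluctant pattern returns no match at all.

```python
import re

find = re.compile(r"^([^.]*).*")

with_dot = find.match("a@b.c.d").group(1)        # stops before the first period
without_dot = find.match("no-dot-here").group(1)  # no period: the whole string is captured

# For comparison, the reluctant pattern requires a literal period to match.
lazy_miss = re.match(r"^(.*?)\..*", "no-dot-here")

print(with_dot)     # a@b
print(without_dot)  # no-dot-here
print(lazy_miss)    # None
```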

Also, you don't want to use re.match() there. re.search() should be just fine. You can modify your code to:

import re

find = re.compile(r"^[^.]*")

for l in lines:
    # group(0) is the whole match: everything before the first period
    print(re.search(find, l).group(0))
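
As a side note on the two calls: because this pattern starts with ^, re.match() and re.search() should give the same result here, assuming the re.MULTILINE flag is not set. A minimal sketch with made-up input lines (Python 3):

```python
import re

find = re.compile(r"^[^.]*")
lines = ["a@b.c", "a@b.c.d", "nodot", ".leading"]  # hypothetical inputs

# The pattern can match an empty string, so neither call returns None here.
via_search = [find.search(l).group(0) for l in lines]
via_match = [find.match(l).group(0) for l in lines]

print(via_search)               # ['a@b', 'a@b', 'nodot', '']
print(via_match == via_search)  # True
```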

Demo on ideone

Answer by Jerry

You can use .find() instead of regex in this situation:

>>> s = "[email protected]"
>>> print(s[0:s.find('.')])
a@b


Considering the comments, here's a modification using .index() (it's similar to .find(), except that it raises a ValueError when the substring is not found instead of returning -1):

>>> s = "[email protected]"
>>> try:
...     index = s.index('.')
... except ValueError:
...     index = len(s)
...
>>> print(s[:index])
a@b
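
The reason the plain .find() slice needs care is that -1 is a valid slice bound. The sketch below shows the pitfall and one possible guard; the helper name before_first_dot and the inputs are hypothetical, chosen only for illustration.

```python
s = "nodot"  # hypothetical input with no period

# s.find('.') returns -1 here, so the slice becomes s[0:-1] and drops the last character.
cut_with_find = s[0:s.find('.')]
print(cut_with_find)  # nodo


def before_first_dot(text):
    """Return text up to the first '.', or all of text if there is none."""
    index = text.find('.')
    return text if index == -1 else text[:index]


print(before_first_dot("nodot"))    # nodot
print(before_first_dot("a@b.c.d"))  # a@b
```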

Answer by kindall

I recommend partition or split in this case; they work well when there is no dot.

text = "[email protected]"

print(text.partition(".")[0])
print(text.split(".", 1)[0])
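
To illustrate the no-dot case, here is a short sketch of what both methods return in full. The inputs are made up, and the code assumes Python 3.

```python
samples = ["a@b.c.d", "nodot"]  # hypothetical inputs

# partition always returns a 3-tuple: (head, separator, tail)
partitioned = [text.partition(".") for text in samples]

# split with maxsplit=1 returns a list of one or two items
split_once = [text.split(".", 1) for text in samples]

print(partitioned)  # [('a@b', '.', 'c.d'), ('nodot', '', '')]
print(split_once)   # [['a@b', 'c.d'], ['nodot']]
```

In both cases element [0] is the part before the first dot, or the whole string when there is no dot.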

Answer by Escualo

You can use the split method: split the string at the . character one time, and you will get a list of [before the first period, after the first period]. The notation would be:

mystring.split(".", 1)

Then you can loop over a generator expression that splits each entry, unpacking the part you are interested in and discarding the one you are not (the _ convention). It works as follows:

entries = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    ]

for token, _ in (entry.split(".", 1) for entry in entries):
    print(token)

Output:

a@b
a@b
a@b
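
One caveat with the two-name unpacking: it relies on every entry containing at least one period. The sketch below, using made-up entries, shows what happens otherwise and one possible alternative based on partition (Python 3).

```python
entries = ["a@b.c.d", "nodot"]  # hypothetical inputs; the second has no period

try:
    for token, _ in (entry.split(".", 1) for entry in entries):
        print(token)
except ValueError as exc:
    # "nodot".split(".", 1) yields a one-item list, which cannot be unpacked into two names
    print("unpacking failed:", exc)

# partition always returns three items, so indexing [0] is safe either way
tokens = [entry.partition(".")[0] for entry in entries]
print(tokens)  # ['a@b', 'nodot']
```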

The documentation for the split method can be found online:

str.split([sep[, maxsplit]])

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
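
A quick sketch of how maxsplit changes the result, using a made-up input string (Python 3):

```python
text = "a@b.c.d.e"  # hypothetical input

no_limit = text.split(".")       # maxsplit omitted: split at every period
one_split = text.split(".", 1)   # at most one split, so at most two elements
two_splits = text.split(".", 2)  # at most two splits, so at most three elements

print(no_limit)    # ['a@b', 'c', 'd', 'e']
print(one_split)   # ['a@b', 'c.d.e']
print(two_splits)  # ['a@b', 'c', 'd.e']
```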

Answer by Srinivasreddy Jakkireddy

import re
data = '[email protected]'
# Delete everything from the first period to the end of the string
print(re.sub(r'\..*', '', data))
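
A minimal sketch of this substitution wrapped in a function, to show that a string without a period comes back unchanged. The function name strip_after_dot and the inputs are hypothetical (Python 3).

```python
import re


def strip_after_dot(data):
    """Remove the first period and everything after it (on a single-line string)."""
    return re.sub(r"\..*", "", data)


print(strip_after_dot("a@b.c.d"))  # a@b
print(strip_after_dot("nodot"))    # nodot
```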