Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3340081/

Time: 2020-08-18 10:32:03  Source: igfitidea

Product code looks like abcd2343, what to split by letters and numbers

python split

Asked by Blankman

I have a list of product codes in a text file; each line holds a product code that looks like:

abcd2343 abw34324 abc3243-23A

So it is letters followed by numbers and other characters.

I want to split on the first occurrence of a number.

Accepted answer by unutbu

In [32]: import re

In [33]: s='abcd2343 abw34324 abc3243-23A'

In [34]: re.split('(\d+)',s)
Out[34]: ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']

Or, if you want to split on the first occurrence of a digit:

In [43]: re.findall('\d*\D+',s)
Out[43]: ['abcd', '2343 abw', '34324 abc', '3243-', '23A']


  • \d+ matches 1-or-more digits.
  • \d*\D+ matches 0-or-more digits followed by 1-or-more non-digits.
  • \d+|\D+ matches 1-or-more digits or 1-or-more non-digits.

Consult the docs for more about Python's regex syntax.



re.split(pat, s) will split the string s using pat as the delimiter. If pat begins and ends with parentheses (so as to be a "capturing group"), then re.split will return the substrings matched by pat as well. For instance, compare:

In [113]: re.split('\d+', s)
Out[113]: ['abcd', ' abw', ' abc', '-', 'A']   # <-- just the non-matching parts

In [114]: re.split('(\d+)', s)
Out[114]: ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']  # <-- both the non-matching parts and the captured groups

In contrast, re.findall(pat, s) returns only the parts of s that match pat:

In [115]: re.findall('\d+', s)
Out[115]: ['2343', '34324', '3243', '23']

Thus, if s ends with a digit, you could avoid ending with an empty string by using re.findall('\d+|\D+', s) instead of re.split('(\d+)', s):

In [118]: s='abcd2343 abw34324 abc3243-23A 123'

In [119]: re.split('(\d+)', s)
Out[119]: ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123', '']

In [120]: re.findall('\d+|\D+', s)
Out[120]: ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123']
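
For the original question, where each code should be cut once at its first run of digits, the same capturing-group split can be limited with the maxsplit argument of re.split. A minimal sketch, assuming the codes are whitespace-separated as in s above (the helper name split_once is made up for illustration):

```python
import re

def split_once(code):
    # Hypothetical helper: split at the first run of digits only.
    # With a capturing group and maxsplit=1, re.split returns
    # [text before, the digits, everything after].
    return re.split(r'(\d+)', code, maxsplit=1)

s = 'abcd2343 abw34324 abc3243-23A'
for code in s.split():
    print(split_once(code))
```

A code that contains no digit at all comes back as a one-element list, so callers may want to check the length before unpacking.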

Answer by Mike

import re

def firstIntIndex(string):
    # Return the index of the first digit in string, or -1 if there is none.
    result = -1
    for k in range(len(string)):
        if re.match(r'\d', string[k]):
            result = k
            break
    return result
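
The index returned by firstIntIndex can then be used to slice a code into its two parts. Below is a sketch of the same idea using re.search, which finds the first digit without a Python-level loop. The name split_at_first_digit is an assumption for illustration, not part of the answer:

```python
import re

def split_at_first_digit(code):
    # re.search scans for the first digit anywhere in the string.
    m = re.search(r'\d', code)
    if m is None:
        # No digit found: treat the whole code as the letter part.
        return code, ''
    k = m.start()  # index of the first digit
    return code[:k], code[k:]

print(split_at_first_digit('abcd2343'))
print(split_at_first_digit('abc3243-23A'))
```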

Answer by jwsample

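import re

input = "abc3243-23A"  # one product code; note this name shadows the built-in input()

m = re.match(r"(?P<letters>[a-zA-Z]+)(?P<the_rest>.+)$", input)

m.group('letters')   # 'abc'
m.group('the_rest')  # '3243-23A'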

This covers your corner case of abc3243-23A and will output abc for the letters group and 3243-23A for the_rest.

Since you said they are all on individual lines, you'll obviously need to put one line at a time in input.
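
Feeding one line at a time could look like the sketch below. It assumes the codes sit one per line in a text file; io.StringIO stands in for the opened file so the example is self-contained, and the file name in the comment is hypothetical. Lines that do not match the pattern are skipped.

```python
import io
import re

pattern = re.compile(r"(?P<letters>[a-zA-Z]+)(?P<the_rest>.+)$")

# Stand-in for open("codes.txt"); the file name is hypothetical.
f = io.StringIO("abcd2343\nabw34324\nabc3243-23A\n")

pairs = []
for line in f:
    m = pattern.match(line.strip())  # strip the trailing newline
    if m:  # skip blank or non-matching lines
        pairs.append((m.group('letters'), m.group('the_rest')))

print(pairs)
```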

Answer by Muhammad Alkarouri

To partition on the first digit:

parts = re.split('(\d.*)','abcd2343')      # => ['abcd', '2343', '']
parts = re.split('(\d.*)','abc3243-23A')   # => ['abc', '3243-23A', '']

So the two parts are always parts[0] and parts[1].

Of course, you can apply this to multiple codes:

>>> s = "abcd2343 abw34324 abc3243-23A"
>>> results = [re.split('(\d.*)', pcode) for pcode in s.split(' ')]
>>> results
[['abcd', '2343', ''], ['abw', '34324', ''], ['abc', '3243-23A', '']]

If each code is on its own line, use s.splitlines() instead of s.split(' ').
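
With one code per line, that variant might look like this sketch. The sample text is made up from the codes in the question, and every code is assumed to contain at least one digit:

```python
import re

text = "abcd2343\nabw34324\nabc3243-23A"

# For each line, keep parts[0] (the letter prefix) and parts[1]
# (everything from the first digit on); the trailing '' is dropped.
results = [tuple(re.split(r'(\d.*)', pcode)[:2]) for pcode in text.splitlines()]

print(results)
```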

Answer by Basant Rules

Try this code; it will work fine:

import re
text = "MARIA APARECIDA 99223-2000 / 98450-8026"
parts = re.split(r' (?=\d)',text, 1)
print(parts)

Output:

['MARIA APARECIDA', '99223-2000 / 98450-8026']
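
The pattern in this answer needs a space before the first digit, so it leaves codes like abcd2343 in one piece. A variant for the codes in the question, sketched under the assumption of Python 3.7 or later (where re.split accepts patterns that can match an empty string), splits at the zero-width position between a non-digit and a digit. The name boundary is chosen here for illustration:

```python
import re

# Zero-width split point: a non-digit behind, a digit ahead.
boundary = re.compile(r'(?<=\D)(?=\d)')

for code in ['abcd2343', 'abw34324', 'abc3243-23A']:
    # maxsplit=1 stops after the first boundary, so later digits stay together.
    print(boundary.split(code, maxsplit=1))
```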