javascript: Splitting a string into an array of words using regular expressions

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3548527/

Time: 2020-10-25 01:29:15  Source: igfitidea

Splitting string into array of words using Regular Expressions

javascript regex

Asked by Mike Christensen

I'm trying to split a string into an array of words, but I want to keep the spaces after each word. Here's what I'm trying:

var re = /[a-z]+[$\s+]/gi;
var test = "test   one two     three   four ";
var results = test.match(re);

The results I expect are:

[0]: "test   "
[1]: "one "
[2]: "two     "
[3]: "three   "
[4]: "four "

However, it only matches up to one space after each word:

[0]: "test "
[1]: "one "
[2]: "two "
[3]: "three "
[4]: "four "

What am I doing wrong?

Answered by Kobi

Consider:

var results = test.match(/\S+\s*/g);

That guarantees you don't miss any characters (apart from any whitespace at the very beginning of the string, but \S*\s* can take care of that).

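As a rough illustration of the difference between the two patterns, here is a minimal sketch. The input string `padded` is a made-up example (it is not from the question), and filtering out empty matches is one possible clean-up, not something the answer prescribes.

```javascript
// Hypothetical sample input with leading spaces (not from the question).
const padded = "  test one";

// \S+\s* needs at least one non-space character, so the leading spaces are skipped.
const withPlus = padded.match(/\S+\s*/g);
console.log(withPlus); // [ 'test ', 'one' ]

// \S*\s* can match whitespace on its own, so the leading spaces are kept,
// but it can also match the empty string at the end of the input.
const withStar = padded.match(/\S*\s*/g);
console.log(withStar); // [ '  ', 'test ', 'one', '' ]

// Dropping empty matches leaves pieces that rejoin to the original string.
const pieces = withStar.filter((s) => s.length > 0);
console.log(pieces.join("") === padded); // true
```

Note the trailing empty string: a pattern that can match nothing will do so once at the end of the input, so the result may need filtering.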

Your original regex reads:

  • [a-z]+ - matches one or more letters
  • [$\s+] - matches a single character: $, + or whitespace. With no quantifier after this character class, you only match a single space.
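
The breakdown above can be checked with a short sketch. The input `sample` is a hypothetical string chosen to contain a literal "+" and "$" after words; it does not appear in the question.

```javascript
// The regex from the question, unchanged.
const questionRe = /[a-z]+[$\s+]/gi;

// Hypothetical input: "+" and "$" directly after words, several spaces
// after "c", and a final word "d" with nothing after it.
const sample = "a+ b$ c   d";

const matches = sample.match(questionRe);
console.log(matches); // [ 'a+', 'b$', 'c ' ]
```

Inside the class, "+" and "$" are consumed as ordinary characters, only one whitespace character is taken after "c", and "d" is not matched at all because "$" inside a class does not anchor to the end of the input.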

Answered by Motti

Try the following:

test.match(/\w+\s+/g); // \w = word characters (letters, digits, underscore), \s = whitespace
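
One thing worth checking with this pattern is what happens when the last word has no trailing whitespace. The sketch below uses the question's test string plus a hypothetical second input, "one two", that is not from the question.

```javascript
// The test string from the question (it ends with a space).
const test = "test   one two     three   four ";
const fromQuestion = test.match(/\w+\s+/g);
console.log(fromQuestion);
// [ 'test   ', 'one ', 'two     ', 'three   ', 'four ' ]

// Hypothetical input without trailing whitespace: \s+ requires at least
// one whitespace character, so the last word is dropped.
const strict = "one two".match(/\w+\s+/g);
console.log(strict); // [ 'one ' ]

// With \s* the trailing whitespace becomes optional and the last word is kept.
const lenient = "one two".match(/\w+\s*/g);
console.log(lenient); // [ 'one ', 'two' ]
```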

Answered by codaddict

You are using + inside the char class. Try using * outside the char class instead.

/[a-z]+\s*/gi;

A + inside the char class is treated as a literal + and not as a meta char. Using * will capture zero or more spaces that might follow any word.

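To see the effect of moving the quantifier outside the class, here is a small side-by-side sketch using the test string from the question. The variable names `original` and `fixed` are only for illustration.

```javascript
// The test string from the question.
const test = "test   one two     three   four ";

// Original pattern: the class consumes exactly one character after each word.
const original = test.match(/[a-z]+[$\s+]/gi);
console.log(original); // [ 'test ', 'one ', 'two ', 'three ', 'four ' ]

// Quantifier moved outside: all whitespace after each word is kept.
const fixed = test.match(/[a-z]+\s*/gi);
console.log(fixed);
// [ 'test   ', 'one ', 'two     ', 'three   ', 'four ' ]

// For this input the pieces rejoin to the original string.
console.log(fixed.join("") === test); // true
```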

Answered by Felix Kling

The + is taken literally inside the character class. You have to move it outside: [\s]+ or just \s+ ($ has no special meaning inside the class either).

Answered by palswim

The essential bit of your RegEx that needs changing is the part matching the whitespace or end-of-line.

Try:

var re = /[a-z]+($|\s+)/gi

or, with a non-capturing group (I don't know if you need this with the /g flag):

var re = /[a-z]+(?:$|\s+)/gi
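
On the open question about the /g flag: String.prototype.match with the g flag returns only the full matches, so the two variants should give the same array. The sketch below checks that on a hypothetical variation of the question's string with the trailing space removed, so that the $ branch is actually used.

```javascript
// Hypothetical variation of the question's string: no trailing space.
const input = "test   one two     three   four";

// With the g flag, match() returns full matches only; groups are not included.
const capturing = input.match(/[a-z]+($|\s+)/gi);
const nonCapturing = input.match(/[a-z]+(?:$|\s+)/gi);
console.log(capturing);
// [ 'test   ', 'one ', 'two     ', 'three   ', 'four' ]
console.log(JSON.stringify(capturing) === JSON.stringify(nonCapturing)); // true

// Without the g flag, match() returns the first match plus its groups,
// which is where the capturing group becomes visible.
const first = input.match(/[a-z]+($|\s+)/i);
console.log(JSON.stringify(first[0])); // "test   "
console.log(JSON.stringify(first[1])); // "   "
```

So for this use the choice between the two is mostly a matter of intent; the non-capturing form simply states that the group is there for alternation only.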