JavaScript: Splitting string into array of words using Regular Expressions
Original question: http://stackoverflow.com/questions/3548527/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by Mike Christensen
I'm trying to split a string into an array of words, but I want to keep the spaces after each word. Here's what I'm trying:
var re = /[a-z]+[$\s+]/gi;
var test = "test one two three four ";
var results = test.match(re);
The results I expect are:
[0]: "test "
[1]: "one "
[2]: "two "
[3]: "three "
[4]: "four "
However, it only matches up to one space after each word:
[0]: "test "
[1]: "one "
[2]: "two "
[3]: "three "
[4]: "four "
What am I doing wrong?
Answered by Kobi
Consider:
var results = test.match(/\S+\s*/g);
That guarantees you don't miss any characters (apart from any spaces at the beginning of the string, but \S*\s* can take care of that).
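To make the difference between the two patterns concrete, here is a small sketch. The input string is a made-up example with leading spaces and runs of several spaces; it is not the string from the question.

```javascript
// Hypothetical input: two leading spaces and a run of three spaces after "test".
const input = "  test   one two ";

// \S+\s* : each word plus its trailing whitespace; the leading spaces are skipped.
const words = input.match(/\S+\s*/g);
console.log(words); // [ 'test   ', 'one ', 'two ' ]

// \S*\s* : also picks up the leading spaces, but adds an empty match at the end.
const pieces = input.match(/\S*\s*/g);
console.log(pieces); // [ '  ', 'test   ', 'one ', 'two ', '' ]

// Joining the pieces gives back the original string, so no characters are lost.
console.log(pieces.join("") === input); // true
```

Note the trailing empty string in the second result. If that matters, filtering with `pieces.filter(Boolean)` is one way to drop it.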
Your original regex reads:
[a-z]+ - match any number of letters (at least one)
[$\s+] - match a single character: $, + or whitespace. With no quantifier after this group, you only match a single space.
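The reading of the character class above can be checked directly. This is only an illustrative sketch: the `^` and `$` anchors around the class are added here so that each sample has to match as a whole, and the sample strings are made up.

```javascript
// The character class from the question, anchored so it must match the whole sample.
const singleChar = /^[$\s+]$/;

// Made-up samples: a dollar sign, a plus, a space, a tab, a letter, two spaces.
const samples = ["$", "+", " ", "\t", "a", "  "];
const results = samples.map((s) => singleChar.test(s));

console.log(results); // [ true, true, true, true, false, false ]
```

The last sample (two spaces) fails because the class consumes exactly one character when no quantifier follows it.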
Answered by Motti
Try the following:
test.match(/\w+\s+/g); // \w = word characters (letters, digits, underscore), \s = whitespace
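Two properties of this pattern are worth seeing in action: \s+ requires at least one whitespace character after the word, and \w also matches digits and underscores. A minimal sketch with made-up inputs:

```javascript
const pattern = /\w+\s+/g;

// The last word has no trailing whitespace, so \s+ cannot match and it is dropped.
const noTrailingSpace = "one two".match(pattern);
console.log(noTrailingSpace); // [ 'one ' ]

// \w accepts digits and underscores as well as letters.
const mixed = "a_1 b2 ".match(pattern);
console.log(mixed); // [ 'a_1 ', 'b2 ' ]
```

This works for the string in the question because it ends with a space. For input that may end on a word, \s* would be the more forgiving quantifier.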
Answered by codaddict
You are using + inside the char class. Try using * outside the char class instead.
/[a-z]+\s*/gi;
A + inside the char class is treated as a literal + and not as a meta char.
Using * will capture zero or more spaces that might follow any word.
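A side-by-side run of the original and the corrected pattern shows both effects. The sample strings below are assumptions made up for illustration:

```javascript
// Hypothetical input with a run of three spaces after "test".
const sample = "test   one two ";

const original = /[a-z]+[$\s+]/gi; // pattern from the question
const fixed = /[a-z]+\s*/gi;       // pattern suggested in this answer

const before = sample.match(original);
console.log(before); // [ 'test ', 'one ', 'two ' ]

const after = sample.match(fixed);
console.log(after); // [ 'test   ', 'one ', 'two ' ]

// Inside the class, + is a literal character, so the original also accepts "a+".
const literalPlus = "a+b".match(original);
console.log(literalPlus); // [ 'a+' ]
```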
Answered by Felix Kling
The + is taken literally inside the character class. You have to move it outside: [\s]+ or just \s+ ($ has no special meaning inside the class either).
Answered by palswim
The essential bit of your RegEx that needs changing is the part matching the whitespace or end-of-line.
Try:
var re = /[a-z]+($|\s+)/gi
or, with a non-capturing group (I don't know if you need this with the /g flag):
var re = /[a-z]+(?:$|\s+)/gi
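On the question of whether the group type matters with /g: String.prototype.match with a global regex returns only the full matches, so the capturing and non-capturing versions give the same array. The sketch below uses a made-up input whose last word has no trailing whitespace, which is the case the $ alternative is there for.

```javascript
// Hypothetical input: two spaces between the words, nothing after the last word.
const text = "one  two";

// With /g, match() returns full matches only, so the group type makes no difference.
const capturing = text.match(/[a-z]+($|\s+)/gi);
const nonCapturing = text.match(/[a-z]+(?:$|\s+)/gi);
console.log(capturing);    // [ 'one  ', 'two' ]
console.log(nonCapturing); // [ 'one  ', 'two' ]

// Without /g, match() returns the first match followed by its captured groups.
const single = text.match(/[a-z]+($|\s+)/i);
console.log(single[0]); // 'one  '
console.log(single[1]); // '  '
```

So the non-capturing form only changes the result when the regex is used without /g, or with an API that exposes groups, such as exec().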

