JavaScript: extract words from a string using a regular expression

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/4755972/

Time: 2020-10-25 14:47:22  Source: igfitidea

Extract word from string using regex

javascript, regex

Question by Jinbom Heo

In JavaScript, I want to extract a list of the words that end with 'y'.

The code is as follows:

var str = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";

str.match(/(\w+)y\W/g);

The result is an array:

["simply ", "dummy ", "industry.", "industry'", "dummy ", "galley ", "only ", "essentially ", "recently "]

So my question is: can I use a regex to get the word list without the 'y' character? The resulting word list should look like this:

["simpl ", "dumm ", "industr.", "industr'", "dumm ", "galle ", "onl ", "essentiall ", "recentl "]

/(\w+)y\W/g doesn't work.

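A likely reason the question's regex "doesn't work": with the g flag, String.prototype.match returns only the whole matches and discards capture groups. A minimal sketch, assuming an engine that supports String.prototype.matchAll (ES2020), that reads capture group 1 of each match instead; the helper name wordsBeforeY and the shortened sample text are illustrative, not from the question:

```javascript
// Hypothetical helper: returns capture group 1 of every match of /(\w+)y\W/g.
function wordsBeforeY(text) {
  // matchAll requires the g flag and yields one match array per match,
  // including its capture groups (m[1] here).
  return Array.from(text.matchAll(/(\w+)y\W/g), (m) => m[1]);
}

const sample = "Lorem Ipsum is simply dummy text of the printing and typesetting industry.";

// match() with g: whole matches only, trailing non-word character included.
console.log(sample.match(/(\w+)y\W/g)); // ["simply ", "dummy ", "industry."]

// matchAll: only the captured part before the "y".
console.log(wordsBeforeY(sample)); // ["simpl", "dumm", "industr"]
```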

Answer by Walf

You need what's called a look-ahead assertion: (?=x) means the characters immediately after this match must match x, but they are not captured (they are not part of the match).

var trimmedWords = str.match(/\b\w+(?=y\b)/g);
// ["simpl", "dumm", "industr", "industr", "dumm", "galle", "onl", "essentiall", "recentl"]
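A small sketch of the look-ahead at work on a shortened piece of the question's text. Because (?=y\b) is zero-width, the y is tested but never becomes part of the match. The helper name trimY and the `|| []` fallback are illustrative additions, not part of the answer:

```javascript
// Hypothetical helper wrapping the look-ahead regex from this answer.
function trimY(text) {
  // \b\w+    : a whole word, starting at a word boundary
  // (?=y\b)  : followed by "y" and a word boundary; checked, not consumed
  // match() returns null when nothing matches, so fall back to [].
  return text.match(/\b\w+(?=y\b)/g) || [];
}

const text = "the printing and typesetting industry. the industry's standard dummy text";
console.log(trimY(text)); // ["industr", "industr", "dumm"]
console.log(trimY("no matches here")); // []
```

Note that "typesetting" is skipped: its inner "y" is followed by another word character, so y\b fails there.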

Answer by dheerosaur

Here is a way to do it:

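var re = /(\w+)y\W/g; // create the regex once so exec() can advance re.lastIndex
var a = [], x;
while ((x = re.exec(str)) !== null) {
    a.push(x[1]); // capture group 1: the word without its trailing "y"
}

console.log(a);
// logs ["simpl", "dumm", "industr", "industr", "dumm", "galle", "onl", "essentiall", "recentl"]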
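This exec() loop relies on the regex's lastIndex advancing after each match, which only happens if the same RegExp object is reused on every iteration. A generic sketch of that pattern; the helper name collectGroup, its g-flag check and its zero-length-match guard are illustrative additions, not part of the answer:

```javascript
// Hypothetical helper: collect capture group `group` from every match of a
// global regex. One RegExp object is reused so that exec() can advance
// re.lastIndex; a fresh regex literal in the loop condition would start
// again at index 0 on every iteration.
function collectGroup(re, text, group = 1) {
  if (!re.global) {
    throw new TypeError("collectGroup needs a regex with the g flag");
  }
  re.lastIndex = 0; // start from the beginning even if re was used before
  const out = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    out.push(m[group]);
    // Guard against an infinite loop on zero-length matches.
    if (m[0] === "") re.lastIndex++;
  }
  return out;
}

console.log(collectGroup(/(\w+)y\W/g, "simply dummy text, only galley proofs"));
// ["simpl", "dumm", "onl", "galle"]
```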

Answer by Brad Christie

I think you're looking for \b(\w*)y\b. \b is a word boundary, \w* matches any run of word characters, and the y specifies the ending character. Then you take the captured \w* group, which excludes the y.

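Where the quantifier sits matters for what the group captures. With (\w)*, the group itself is repeated, and a repeated group keeps only its last iteration, so it captures a single character. With (\w*), the group captures the whole stem. A quick comparison sketch; the helper name firstGroup is illustrative:

```javascript
// Hypothetical helper: capture group 1 of the first match of `re`, or null.
function firstGroup(re, text) {
  const m = re.exec(text);
  return m === null ? null : m[1];
}

// Quantifier outside the group: the group is repeated and keeps only its
// last iteration, i.e. the single character just before the "y".
console.log(firstGroup(/\b(\w)*y\b/, "dummy")); // "m"

// Quantifier inside the group: the group captures the whole stem.
console.log(firstGroup(/\b(\w*)y\b/, "dummy")); // "dumm"
```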

*EDIT: I semi-take that back. If you're looking for "industr." (with the period included), this will not work, but I'll play around and see what I can come up with.

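One way to keep the trailing character that the question's expected list includes (the "." in "industr.") is to capture the character after the y as a second group and join the two groups. A sketch assuming String.prototype.matchAll (ES2020); the helper name stripTrailingY and the sample sentence are illustrative:

```javascript
// Hypothetical helper: for every word ending in "y" and followed by a
// non-word character, return the word minus its "y" plus that character.
function stripTrailingY(text) {
  return Array.from(text.matchAll(/(\w+)y(\W)/g), (m) => m[1] + m[2]);
}

const sentence = "the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text";
console.log(stripTrailingY(sentence));
// ["industr.", "industr'", "dumm "]
```

Like the question's original regex, this requires a non-word character after the y, so a word ending in "y" at the very end of the string is not matched.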