Original question: http://stackoverflow.com/questions/6162600/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do you split a JavaScript string by spaces and punctuation?
Asked by chromedude
I have some random string, for example:
Hello, my name is john.
I want that string split into an array like this:
Hello, ,, , my, name, is, john, .,
I tried str.split(/[^\w\s]|_/g), but it does not seem to work. Any ideas?
Accepted answer by pepkin88
Try this (I'm not sure if this is what you wanted):
str.replace(/[^\w\s]|_/g, function (match) { return ' ' + match + ' '; }).replace(/[ ]+/g, ' ').split(' ');
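To see what this approach returns, here is a minimal sketch run on the sentence from the question. It assumes the callback is meant to return the matched character padded with spaces; the function name splitWithPunctuation and the parameter name match are chosen here for illustration.

```javascript
// Pad every punctuation mark (or underscore) with spaces, collapse runs
// of spaces into one, then split on single spaces.
function splitWithPunctuation(text) {
  return text
    .replace(/[^\w\s]|_/g, function (match) { return ' ' + match + ' '; })
    .replace(/[ ]+/g, ' ')
    .split(' ');
}

var parts = splitWithPunctuation('Hello, my name is john.');
console.log(parts);
// [ 'Hello', ',', 'my', 'name', 'is', 'john', '.', '' ]
```

Note that the spaces themselves are not kept as elements, and that the padding added after the final period leaves a trailing empty string, which can be dropped with parts.filter(Boolean) if it is unwanted.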
Answer by Rob Raisch
To split str on any run of non-word characters, i.e. anything other than A-Z, a-z, 0-9, and underscore:
var words=str.split(/\W+/); // assumes str does not begin nor end with whitespace
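For illustration, here is what that split returns for the sentence from the question. Because that sentence ends with a period (a non-word character, not only whitespace), a trailing empty string appears, so this sketch also filters empty strings out. The helper name splitWords is an assumption, not part of the answer.

```javascript
// Split on runs of non-word characters and drop any empty strings that
// appear when the input starts or ends with such a run.
function splitWords(text) {
  return text.split(/\W+/).filter(function (word) {
    return word.length > 0;
  });
}

var rawWords = 'Hello, my name is john.'.split(/\W+/);
console.log(rawWords);
// [ 'Hello', 'my', 'name', 'is', 'john', '' ]

var cleanWords = splitWords('Hello, my name is john.');
console.log(cleanWords);
// [ 'Hello', 'my', 'name', 'is', 'john' ]
```

The punctuation is discarded entirely here, so this variant fits only when the words alone are needed.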
Or, assuming your target language is English, you can extract all semantically useful values from a string (i.e. "tokenize" the string) using:
var str='Here\'s a (good, bad, indifferent, ...) '+
        'example sentence to be used in this test '+
        'of English language "token-extraction".',
    // backslashes are doubled because these string literals are passed to new RegExp()
    punct='\\['+ '\\!'+ '\\"'+ '\\#'+ '\\$'+   // since javascript does not
          '\\%'+ '\\&'+ '\\\''+ '\\('+ '\\)'+  // support POSIX character
          '\\*'+ '\\+'+ '\\,'+ '\\\\'+ '\\-'+  // classes, we'll need our
          '\\.'+ '\\/'+ '\\:'+ '\\;'+ '\\<'+   // own version of [:punct:]
          '\\='+ '\\>'+ '\\?'+ '\\@'+ '\\['+
          '\\]'+ '\\^'+ '\\_'+ '\\`'+ '\\{'+
          '\\|'+ '\\}'+ '\\~'+ '\\]',
    re=new RegExp(          // tokenizer
      '\\s*'+               // discard possible leading whitespace
      '('+                  // start capture group
        '\\.{3}'+           // ellipsis (must appear before punct)
      '|'+                  // alternator
        '\\w+\\-\\w+'+      // hyphenated words (must appear before punct)
      '|'+                  // alternator
        '\\w+\'(?:\\w+)?'+  // compound words (must appear before punct)
      '|'+                  // alternator
        '\\w+'+             // other words
      '|'+                  // alternator
        '['+punct+']'+      // punct
      ')'                   // end capture group
    );
// grep(ary[,filt]) - filters an array
// note: could use jQuery.grep() instead
// @param {Array} ary array of members to filter
// @param {Function} filt function to test truthiness of member,
// if omitted, "function(member){ if(member) return member; }" is assumed
// @returns {Array} all members of ary where result of filter is truthy
function grep(ary,filt) {
  var result=[];
  for(var i=0,len=ary.length;i<len;i++) {
    var member=ary[i]||'';
    if(filt && (typeof filt === 'function') ? filt(member) : member) {
      result.push(member);
    }
  }
  return result;
}
var tokens=grep( str.split(re) ); // note: filter function omitted
// since all we need to test
// for is truthiness
which produces:
tokens=[
'Here\'s',
'a',
'(',
'good',
',',
'bad',
',',
'indifferent',
',',
'...',
')',
'example',
'sentence',
'to',
'be',
'used',
'in',
'this',
'test',
'of',
'English',
'language',
'"',
'token-extraction',
'"',
'.'
]
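As a variation on the tokenizer above, the same alternation can be used with String.prototype.match and the g flag, which returns the tokens directly so that no empty strings need to be filtered out. This is a sketch under stated assumptions, not the answer's own code: the name tokenize is made up here, and [^\w\s] stands in for the explicit punctuation list, so it also matches non-ASCII symbols and treats the underscore as a word character.

```javascript
// Return an array of tokens: ellipses, hyphenated words, words with an
// apostrophe, plain words, and single punctuation characters.
// The order of the alternatives matters: longer forms must come first.
function tokenize(text) {
  var tokenPattern = /\.{3}|\w+-\w+|\w+'(?:\w+)?|\w+|[^\w\s]/g;
  return text.match(tokenPattern) || []; // match() returns null when nothing matches
}

var sample = 'Here\'s a (good, bad, ...) "token-extraction" test.';
console.log(tokenize(sample));
// [ "Here's", 'a', '(', 'good', ',', 'bad', ',', '...', ')',
//   '"', 'token-extraction', '"', 'test', '.' ]
```

Whitespace is never matched by any alternative, so it is skipped without needing the leading \s* used in the split-based version.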
EDIT
Also available as a GitHub Gist.
Answer by Reid
Try:
str.split(/([_\W])/)
This will split on any non-alphanumeric character (\W) and any underscore. It uses capturing parentheses so that the separators themselves are included in the final result.
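A short sketch of the capturing-group behaviour on the sentence from the question. When two separators are adjacent (the comma followed by a space), split emits an empty string between them, so this example filters those out; the helper name splitKeepingSeparators is an assumption made here.

```javascript
// Split on every non-word character or underscore, keeping each separator
// in the result because it is wrapped in a capturing group.
function splitKeepingSeparators(text) {
  return text.split(/([_\W])/).filter(function (piece) {
    return piece !== '';
  });
}

var rawPieces = 'Hello, my name is john.'.split(/([_\W])/);
console.log(rawPieces);
// [ 'Hello', ',', '', ' ', 'my', ' ', 'name', ' ', 'is', ' ', 'john', '.', '' ]

var pieces = splitKeepingSeparators('Hello, my name is john.');
console.log(pieces);
// [ 'Hello', ',', ' ', 'my', ' ', 'name', ' ', 'is', ' ', 'john', '.' ]
```

After filtering, words, punctuation marks, and spaces each appear as their own element.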
Answer by MikeyB
This solution caused a problem with spaces for me (I still needed them), so I gave str.split(/\b/) a shot and all is well. Spaces are output in the array, where they are easy to ignore, and the ones left after punctuation can be trimmed out.
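A minimal sketch of the word-boundary split, followed by one possible cleanup that trims each piece and drops the ones that were only whitespace. The cleanup step and the name splitOnBoundaries are assumptions about what "trimmed out" could look like, not code from the answer.

```javascript
// Split at every word boundary, then trim each piece and discard the
// pieces that contained nothing but whitespace.
function splitOnBoundaries(text) {
  return text.split(/\b/)
    .map(function (piece) { return piece.trim(); })
    .filter(function (piece) { return piece.length > 0; });
}

var boundaryPieces = 'Hello, my name is john.'.split(/\b/);
console.log(boundaryPieces);
// [ 'Hello', ', ', 'my', ' ', 'name', ' ', 'is', ' ', 'john', '.' ]

var trimmedPieces = splitOnBoundaries('Hello, my name is john.');
console.log(trimmedPieces);
// [ 'Hello', ',', 'my', 'name', 'is', 'john', '.' ]
```

One caveat: \b only matches between a word character and a non-word character, so consecutive punctuation marks stay together in a single element.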