Scala: How to split this string by regex?

Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5286885/

Date: 2020-10-22 02:54:06  Source: igfitidea

Tags: regex, scala, split

Asked by Freewind

I have some strings that look like this:

div#title.title.top
#main.main
a.bold#empty.red

They are similar to Haml, and I want to split them with a regex, but I don't know how to define it.

val r = """???""".r // HELP
val items = "a.bold#empty.red".split(r)
items // -> "a", ".bold", "#empty", ".red"

How can I do this?



UPDATE

Sorry, everyone, but I need to make this question harder. I'm very interested in this regex:

val r = """(?<=\w)\b"""

But it fails to parse more complex strings such as:

div#question-title.title-1.h-222_333

I would like it to be parsed into:

div
#question-title
.title-1
.h-222_333 

How can I improve that regex?

Accepted answer by Josh M.

I'm not completely sure what you need here, but this should help:

(?:\.|#)?\w+

It means a "term" is defined as an optional dot or hash followed by one or more word characters.

You will end up with:

div
#title
.title
.top
#main
.main
a
.bold
#empty
.red

Answer by Daniel C. Sobral

val r = """(?<=\w)\b(?!-)"""

Note that split takes a String representing a regular expression, not a Regex, so you must not convert r from String to Regex.
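As a rough illustration of that point, the sketch below splits with the pattern kept as a plain String and, as an alternative, through the split method that scala.util.matching.Regex itself provides. The names SplitDemo, viaString, and viaRegex are invented for this example.

```scala
object SplitDemo {
  // The pattern stays a plain String, because String.split expects one.
  val pattern: String = """(?<=\w)\b(?!-)"""

  // Java's String.split, called directly with the pattern String.
  def viaString(s: String): List[String] = s.split(pattern).toList

  // If a Regex is already at hand, its own split method can be used instead.
  def viaRegex(s: String): List[String] = pattern.r.split(s).toList

  def main(args: Array[String]): Unit = {
    println(viaString("a.bold#empty.red")) // List(a, .bold, #empty, .red)
    println(viaRegex("#main.main"))        // List(#main, .main)
  }
}
```

Both routes should give the same pieces, since Regex.split delegates to the same underlying Java pattern.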

A brief explanation of the regex:

  • (?<=...) is a look-behind. It states that the match must be preceded by the pattern ..., which in your case is \w, meaning the split point must follow a digit, letter, or underscore.

  • \b means word boundary. It is a zero-length match that happens between a word character (digit, letter, or underscore) and a non-word character, or vice versa. Because it is zero-length, split won't remove any characters when splitting.

  • (?!...) is a negative look-ahead. Here I use it to say that I'm not interested in word boundaries from a word character to a dash.
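To see what the negative look-ahead changes, here is a small sketch that runs both the regex from the question and the improved one on the hyphenated input. The object name BoundaryDemo and the value names are made up for illustration.

```scala
object BoundaryDemo {
  val input = "div#question-title.title-1.h-222_333"

  // Regex from the question: splits at every boundary that follows a word
  // character, including the boundaries directly in front of a dash.
  val naive: List[String] = input.split("""(?<=\w)\b""").toList

  // With the negative look-ahead, boundaries directly before a dash are skipped.
  val improved: List[String] = input.split("""(?<=\w)\b(?!-)""").toList

  def main(args: Array[String]): Unit = {
    println(naive)    // List(div, #question, -title, .title, -1, .h, -222_333)
    println(improved) // List(div, #question-title, .title-1, .h-222_333)
  }
}
```

The match at the very end of the input produces only a trailing empty string, which split drops.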

Answer by Ken Bloom

Starting with Josh M.'s answer: he has a good regular expression, but since split takes a regular expression matching the "delimiter", you need to use findAllIn instead, as follows:

val r = """(?:\.|#)?\w+""".r
// findAllIn returns an iterator over the matches; toList collects them
val items = r.findAllIn("a.bold#empty.red").toList

Then you get these results:

div#title.title.top    -> List(div, #title, .title, .top)
#main.main             -> List(#main, .main)
a.bold#empty.red       -> List(a, .bold, #empty, .red)
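The pattern used with findAllIn stops at hyphens, so it would not cover the harder input from the update. One possible variation, which is not part of the original answers, is to allow the hyphen inside the name. The sketch below assumes that a hyphen always belongs to the name it appears in; TermDemo and terms are hypothetical names.

```scala
object TermDemo {
  // Assumption: a hyphen counts as part of a name, so it joins \w in the class.
  // [.#]? is an optional leading dot or hash, as in the pattern above.
  private val term = """[.#]?[\w-]+""".r

  // Collect every match instead of splitting on a delimiter.
  def terms(s: String): List[String] = term.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(terms("div#question-title.title-1.h-222_333"))
    // List(div, #question-title, .title-1, .h-222_333)
    println(terms("a.bold#empty.red"))
    // List(a, .bold, #empty, .red)
  }
}
```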