Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/13380914/

How to split a string into a list of words in TCL, ignoring multiple spaces?

Tags: string, split, tcl

Asked by Jerry

Basically, I have a string that consists of multiple space-separated words. However, the words may be separated by several spaces instead of just one. This is why [split] does not do what I want:

split "a    b"

gives me this:

{a {} {} {} b}

instead of this:

{a b}

Searching Google, I found a page on the Tcler's wiki, where a user asked more or less the same question.

One proposed solution would look like this:

split [regsub -all {\s+} "a    b" " "]

which seems to work for simple strings. But a test string such as [string repeat " " 4] (I used string repeat because StackOverflow strips multiple spaces) will result in regsub returning " ", which split would again split up into {{} {}} instead of an empty list.
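
To make the failure mode concrete, here is a minimal sketch of the regsub-then-split approach. The proc name collapseAndSplit is made up for illustration and does not come from the wiki page:

```tcl
# Hypothetical helper: collapse each whitespace run to one space, then split.
proc collapseAndSplit {text} {
    return [split [regsub -all {\s+} $text " "]]
}

puts [collapseAndSplit "a    b"]                            ;# a b
# An all-whitespace string collapses to " ", which splits into two empty elements.
puts [llength [collapseAndSplit [string repeat " " 4]]]     ;# 2
# Leading and trailing whitespace also leave empty elements at both ends.
puts [llength [collapseAndSplit "  a b  "]]                 ;# 4
```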

Another proposed solution was this one, to force a reinterpretation of the given string as a list:

lreplace "a   list   with many   spaces" 0 -1

But if there's one thing I've learned about TCL, it is that you should never use list functions (starting with l) on strings. And indeed, this one will choke on strings containing special characters (namely { and }):

lreplace "test    \{a b\}" 0 -1

returns test {a b} instead of test \{a b\} (which would be what I want: every space-separated word split into its own element of the resulting list).
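
The following sketch illustrates why list commands are risky on arbitrary strings. The variable names text and bad are chosen here for illustration only:

```tcl
# Double quotes apply backslash substitution, so this string holds
# literal braces:  test    {a b}
set text "test    \{a b\}"

# List parsing treats the braced part as ONE element, not as two words.
puts [llength $text]     ;# 2
puts [lindex $text 1]    ;# a b

# With an unbalanced brace the string is not a valid list at all,
# so any list command raises an error.
set bad "test    \{a b"
if {[catch {llength $bad} msg]} {
    puts "error: $msg"
}
```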

Yet another solution was to use a 'filter':

proc filter {cond list} {
    set res {}
    foreach element $list {if [$cond $element] {lappend res $element}}
    set res
}

You'd then use it like this:

filter llength [split "a   list   with many   spaces"]

Again, same problem. This would call llength on a string, which might contain special characters (again, { and }). Passing it "\{a b\}" would result in TCL complaining about an "unmatched open brace in list".

I managed to get it to work by modifying the given filter function: I added a {*} in front of $cond in the if, so that I could use it with string length instead of llength. This seems to work for every input I have tried so far.
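
The modified version is not shown in the question, so the following is only a reconstruction of what it might look like. It assumes Tcl 8.5 or later, where the {*} expansion syntax is available:

```tcl
# Keep only the elements for which the command prefix $cond returns true.
proc filter {cond list} {
    set res {}
    foreach element $list {
        # {*} expands $cond into separate words, so a multi-word command
        # prefix such as "string length" can be used as the condition.
        if {[{*}$cond $element]} {
            lappend res $element
        }
    }
    return $res
}

# split yields empty elements between adjacent spaces; string length
# returns 0 for those, so the filter drops them.
set words [filter {string length} [split "a   list   with \{many   spaces"]]
puts [llength $words]    ;# 5
```

Because string length treats its argument as a plain string, the unbalanced brace in the sample input is never parsed as list syntax.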

Is this solution safe to use as it is now? Would it choke on some special input I haven't tested so far? Or is it possible to do this right in a simpler way?

Answered by Donal Fellows

The easiest way is to use regexp -all -inline to select and return all words. For example:

# The RE matches any non-empty sequence of non-whitespace characters
set theWords [regexp -all -inline {\S+} $theString]

If you instead define words as sequences of alphanumerics, use this as the regular expression: {\w+}
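
As a quick check against the problem inputs from the question, the approach can be wrapped in a small proc. The name words is an illustrative choice, not part of the answer:

```tcl
# Hypothetical wrapper: return every run of non-whitespace characters.
proc words {text} {
    return [regexp -all -inline {\S+} $text]
}

puts [words "a    b"]                            ;# a b
# No match at all yields an empty list rather than empty elements.
puts [llength [words [string repeat " " 4]]]     ;# 0
# Braces are ordinary characters to the regular expression.
puts [llength [words "test    \{a b\}"]]         ;# 3
```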