How to split a string into a list of words in TCL, ignoring multiple spaces?
Note: this page reproduces a popular Stack Overflow question and its answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/13380914/
Asked by Jerry
Basically, I have a string that consists of multiple space-separated words. The catch is that the words can be separated by several spaces instead of just one. This is why [split] does not do what I want:
split "a    b"
gives me this:
{a {} {} {} b}
instead of this:
{a b}
Searching Google, I found a page on the Tcler's wiki, where a user asked more or less the same question.
One proposed solution would look like this:
split [regsub -all {\s+} "a    b" " "]
which seems to work for simple strings. But a test string such as [string repeat " " 4] (I used string repeat because StackOverflow strips multiple spaces) makes regsub return " ", which split then splits into {{} {}} instead of an empty list.
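The failure case is easy to reproduce. The following is a minimal sketch, assuming a Tcl 8.x interpreter; the variable names blank and collapsed are chosen only for illustration:

```tcl
# Four spaces, built with string repeat so the spaces cannot be stripped
set blank [string repeat " " 4]

# Collapsing each whitespace run still leaves a single space behind
set collapsed [regsub -all {\s+} $blank " "]

# split sees one separator with nothing on either side,
# so it returns two empty elements rather than an empty list
puts [llength [split $collapsed]]   ;# prints 2
```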
Another proposed solution was this one, to force a reinterpretation of the given string as a list:
lreplace "a list with many spaces" 0 -1
But if there's one thing I've learned about TCL, it is that you should never use list functions (those starting with l) on strings. And indeed, this one chokes on strings containing special characters (namely { and }):
lreplace "test \{a b\}" 0 -1
returns test {a b} instead of test \{a b\} (which is what I want: every space-separated word split into its own element of the resulting list).
Yet another solution was to use a 'filter':
proc filter {cond list} {
    set res {}
    foreach element $list {
        if {[$cond $element]} {lappend res $element}
    }
    return $res
}
You'd then use it like this:
filter llength [split "a list with many spaces"]
Again, same problem. This calls llength on a string, which might contain special characters (again, { and }): passing it "\{a b\}" results in TCL complaining about an "unmatched open brace in list".
I managed to get it to work by modifying the given filter function, adding a {*} in front of $cond in the if, so I could use it with string length instead of llength, which seems to work for every input I've tried so far.
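A sketch of the modification described above, reconstructed from the description rather than copied from the original post. It assumes Tcl 8.5 or later, since that is where the {*} expansion syntax was introduced, and the sample strings are made up for illustration:

```tcl
proc filter {cond list} {
    set res {}
    foreach element $list {
        # {*} expands $cond into separate words, so a two-word
        # command prefix such as {string length} can be used
        if {[{*}$cond $element]} {lappend res $element}
    }
    return $res
}

# Empty strings have length 0, so the empty elements from split are dropped
set words [filter {string length} [split "a    list  with many   spaces"]]
puts $words   ;# prints: a list with many spaces

# Braces are treated as ordinary characters: two elements, no error
set braced [filter {string length} [split "\{a b\}"]]
puts [llength $braced]   ;# prints 2
```

Because string length treats its argument as a plain string, the elements are never reparsed as lists, which is what made llength fail.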
Is this solution safe to use as it is now? Would it choke on some special input I haven't tested so far? Or is it possible to do this right in a simpler way?
Answered by Donal Fellows
The easiest way is to use regexp -all -inline to select and return all words. For example:
# The RE matches any non-empty sequence of non-whitespace characters
set theWords [regexp -all -inline {\S+} $theString]
If you instead define words to be sequences of alphanumerics, use this as the regular expression: {\w+}
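To check the {\S+} approach against the inputs from the question, here is a small sketch. The proc name words is a hypothetical helper that only wraps the regexp call, and the multi-space test string is made up for illustration:

```tcl
# Hypothetical helper: returns every run of non-whitespace characters
proc words {str} {
    return [regexp -all -inline {\S+} $str]
}

puts [words "a    list  with many   spaces"]   ;# prints: a list with many spaces
puts [llength [words [string repeat " " 4]]]  ;# prints 0 (empty list)
puts [llength [words "test \{a b\}"]]         ;# prints 3 (test, {a, b})
```

Since regexp works on the string directly and builds the result list itself, braces in the input are just characters inside the elements.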