Extract substring in Bash
Note: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/428109/
Asked by Berek Bryan
Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.
So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.
I am very interested in the number of different ways that this can be accomplished.
Accepted answer by FerranB
Answer by JB.
If x is constant, the following parameter expansion performs substring extraction:
b=${a:12:5}
where 12 is the offset (zero-based) and 5 is the length.
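As a rough sketch of the fixed-offset case, the offset can be derived from the prefix length instead of being hard-coded. The variable name prefix is an assumption made for illustration and is not part of the original answer.

```bash
# Sample filename from the question
a="someletters_12345_moreleters.ext"

# Hypothetical: the part before the digits is known and has a fixed length
prefix="someletters_"

# ${#prefix} is the prefix length (12), used as the zero-based offset
b=${a:${#prefix}:5}

echo "$b"
```

This only works when the prefix length really is constant across all filenames.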
If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:
tmp=${a#*_} # remove prefix ending in "_"
b=${tmp%_*} # remove suffix starting with "_"
If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.
Both solutions presented are pure bash, with no process spawning involved, hence very fast.
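One possible way to tolerate extra underscores after the digits is to strip the shortest prefix and then the longest suffix. The function name extract_between_underscores is hypothetical, and the sketch assumes the digits always sit between the first and second underscore.

```bash
# Hypothetical helper: prints the text between the first and second "_"
extract_between_underscores() {
    local tmp=${1#*_}            # remove the shortest prefix ending in "_"
    printf '%s\n' "${tmp%%_*}"   # remove the longest suffix starting with "_"
}

extract_between_underscores "someletters_12345_moreleters.ext"
extract_between_underscores "a_12345_b_c.ext"
```

It does not help when underscores appear before the digits, so that case still needs a different approach such as a regular expression.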
Answer by Johannes Schaub - litb
Generic solution where the number can be anywhere in the filename, using the first of such sequences:
number=$(echo "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)
Another solution to extract exactly a part of a variable:
number=${filename:offset:length}
If your filename always has the format stuff_digits_... you can use awk:
number=$(echo "$filename" | awk -F _ '{ print $2 }')
Yet another solution, to remove everything except digits:
number=$(echo "$filename" | tr -cd '[:digit:]')
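To make the difference between the awk and tr variants concrete, here is a small sketch run against the sample filename from the question. The variable names by_awk and by_tr are only illustrative.

```bash
filename="someletters_12345_moreleters.ext"

# awk: split on "_" and take the second field
by_awk=$(printf '%s\n' "$filename" | awk -F _ '{ print $2 }')

# tr: delete every character that is not a digit
by_tr=$(printf '%s' "$filename" | tr -cd '[:digit:]')

echo "$by_awk $by_tr"
```

Both give the same result here, but tr would concatenate all digits if the filename contained more than one run of digits, while awk depends on the digits being the second underscore-separated field.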
Answer by brown.2179
Just try to use cut -c startIndx-stopIndx
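A minimal sketch of the cut approach on the question's sample filename, assuming the digits always occupy the same character positions. Note that cut counts characters from 1, so the five digits that start at zero-based offset 12 are characters 13 to 17.

```bash
filename="someletters_12345_moreleters.ext"

# Characters 13 through 17 (1-based, inclusive) hold the five digits
number=$(printf '%s\n' "$filename" | cut -c 13-17)

echo "$number"
```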
Answer by jperelli
In case someone wants more rigorous information, you can also search for it in man bash like this:
$ man bash [press return key]
/substring [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]
Result:
${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
If offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. Arithmetic expressions starting with a - must be separated by whitespace from the preceding : to be distinguished from the Use Default Values expansion. If length evaluates to a number less than zero, and parameter is not @ and not an indexed or associative array, it is interpreted as an offset from the end of the value of parameter rather than a number of characters, and the expansion is the characters between the two offsets.
If parameter is @, the result is length positional parameters beginning at offset. If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. Substring expansion applied to an associative array produces undefined results.
Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion. Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.
Answer by nicerobot
Here's how I'd do it:
FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}
Explanation:
Bash-specific:
[[ ]] indicates a conditional expression
=~ indicates the condition is a regular expression
&& chains the commands if the prior command was successful
Regular Expressions (RE): _([[:digit:]]{5})_
_ are literals to demarcate/anchor matching boundaries for the string being matched
() create a capture group
[[:digit:]] is a character class, I think it speaks for itself
{5} means exactly five of the prior character, class (as in this example), or group must match
In English, you can think of it behaving like this: the FN string is iterated character by character until we see an _, at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the next NUM= statement can execute. If any part of the matching fails, saved details are disposed of and character by character processing continues after the _. E.g. if FN were _1 _12 _123 _1234 _12345_, there would be four false starts before it found a match.
Answer by user1338062
I'm surprised this pure bash solution didn't come up:
a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345
You probably want to reset IFS to the value it had before, or unset IFS afterwards!
Answer by PEZ
Building on jor's answer (which doesn't work for me):
substring=$(expr "$filename" : '.*_\([^_]*\)_.*')
Answer by fedorqui 'SO stop harming'
Following the requirements
I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.
I found some grep ways that may be useful:
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+"
12345
or better
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}"
12345
And then with -Po syntax:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+'
12345
Or if you want to make it fit exactly 5 characters:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}'
12345
Finally, to store the result in a variable, you just need to use the var=$(command) syntax.
Answer by fedorqui 'SO stop harming'
If we focus on the concept of:
"A run of (one or several) digits"
We could use several external tools to extract the numbers.
We could quite easily erase all other characters, with either sed or tr:
name='someletters_12345_moreleters.ext'
echo $name | sed 's/[^0-9]*//g' # 12345
echo $name | tr -c -d 0-9 # 12345
But if $name contains several runs of numbers, the above will fail:
If "name=someletters_12345_moreleters_323_end.ext", then:
echo $name | sed 's/[^0-9]*//g' # 12345323
echo $name | tr -c -d 0-9 # 12345323
We need to use regular expressions (regex).
To select only the first run (12345 not 323) in sed and perl:
echo $name | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my ($num) = $ARGV[0] =~ /(\d+)/; print "$num\n";' "$name"
But we could as well do it directly in bash (1):
regex='[^0-9]*([0-9]{1,}).*$'
[[ $name =~ $regex ]] && echo "${BASH_REMATCH[1]}"
This allows us to extract the FIRST run of digits of any length surrounded by any other text/characters.
Note: regex='[^0-9]*([0-9]{5,5}).*$' captures exactly five consecutive digits. :-)
(1): faster than calling an external tool for each short text. Not faster than doing all processing inside sed or awk for large files.
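The pure-bash regex approach could be wrapped in a small function so the result is easy to store in a variable. This is only a sketch: the function name first_digits and the variable re are assumptions for illustration, and the pattern is kept in a variable so bash treats it as a regular expression rather than a quoted literal.

```bash
# Hypothetical helper: prints the first run of digits found in its argument
first_digits() {
    local re='([0-9]+)'
    # =~ finds the leftmost match; the capture group lands in BASH_REMATCH[1]
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

num=$(first_digits "someletters_12345_moreleters_323_end.ext")
echo "$num"
```

The function returns a non-zero status when the argument contains no digits, so callers can test for that case.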