Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/1405611/

Time: 2020-09-09 18:30:45  Source: igfitidea

How to extract the first two characters of a string in shell scripting?

bash, shell, grep, sh, gnu-coreutils

Asked by Greg

For example, given:

USCAGoleta9311734.5021-120.1287855805

I want to extract just:

US

Answered by paxdiablo

Probably the most efficient method, if you're using the bash shell (and you appear to be, based on your comments), is to use the substring variant of parameter expansion:

pax> long="USCAGol.blah.blah.blah"
pax> short="${long:0:2}" ; echo "${short}"
US

This will set short to be the first two characters of long. If long is shorter than two characters, short will be identical to it.
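
A minimal sketch of the same expansion wrapped in a function, to show the short-input behaviour just described; `first_two` is a hypothetical name, not something from the answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first two characters of $1
# using bash substring expansion, ${parameter:offset:length}.
first_two() {
    local s=$1
    printf '%s\n' "${s:0:2}"
}

first_two "USCAGoleta9311734.5021-120.1287855805"   # US
first_two "A"                                       # A (shorter than two: returned as-is)
first_two ""                                        # prints an empty line, no error
```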

This in-shell method is usually better if you're going to be doing it a lot (like 50,000 times per report as you mention) since there's no process creation overhead. All solutions which use external programs will suffer from that overhead.
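
To illustrate the no-fork point, here is a sketch of a loop that takes the two-character prefix of every record using only shell built-ins, so no extra process is started per line. The `print_prefixes` name and the sample records are made up for illustration; a real report would read its own input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two-character prefix of every input line.
# Everything inside the loop is a shell built-in, so nothing is forked per line.
print_prefixes() {
    local line
    # '|| [ -n "$line" ]' also handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line:0:2}"
    done
}

# Three made-up records; prints US, US and CA on separate lines.
printf '%s\n' USCAGoleta9311734 USNYAlbany123 CAONToronto456 | print_prefixes
```

Feeding the whole input to one loop like this replaces one `$(echo ... | cut ...)` pipeline per record with a single shell process; how much that saves depends on the system.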

If you also wanted to ensure a minimum length, you could pad it out beforehand with something like:

pax> long="A"
pax> tmpstr="${long}.."
pax> short="${tmpstr:0:2}" ; echo "${short}"
A.

This would ensure that anything less than two characters in length was padded on the right with periods (or something else, just by changing the character used when creating tmpstr). It's not clear that you need this but I thought I'd put it in for completeness.
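
The padding trick generalises to any width and pad character. A sketch, with `first_n_padded` as a hypothetical name: it appends `n` copies of the pad character before taking the substring, so the result is always exactly `n` characters long.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first_n_padded STRING N [PADCHAR]
# Prints exactly N characters: the start of STRING, right-padded with PADCHAR (default ".").
first_n_padded() {
    local s=$1 n=$2 pad=${3:-.}
    local filler="" i
    for (( i = 0; i < n; i++ )); do
        filler+=$pad              # build N copies of the pad character
    done
    local tmp="$s$filler"         # same idea as tmpstr="${long}.."
    printf '%s\n' "${tmp:0:n}"    # offset and length are arithmetic, so n works here
}

first_n_padded "A" 2              # A.
first_n_padded "USCAGoleta" 2     # US
first_n_padded "x" 4 '_'          # x___
```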

Having said that, there are any number of ways to do this with external programs (such as if you don't have bash available to you), some of which are:

short=$(echo "${long}" | cut -c1-2)
short=$(echo "${long}" | head -c2)
short=$(echo "${long}" | awk '{print substr($0, 1, 2)}')
short=$(echo "${long}" | sed 's/^\(..\).*/\1/')

The first two (cut and head) are identical for a single-line string: they both just give you back the first two characters. They differ in that cut will give you the first two characters of each line, while head will give you the first two characters of the entire input.
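
A quick way to see the difference is to feed both commands two lines. This sketch assumes GNU or BSD `head`, since `-c` is not in POSIX `head`; note also that `head -c` counts bytes, which only matches characters for single-byte text.

```shell
#!/usr/bin/env bash
# Two-line input to show where cut and head diverge.
input=$'abcd\nefgh'

# cut -c1-2 works per line: prints "ab" and "ef" on separate lines.
printf '%s\n' "$input" | cut -c1-2

# head -c2 works on the whole stream: prints just "ab" (no trailing newline).
printf '%s\n' "$input" | head -c2
echo
```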

The third one uses the awk substr function to extract the first two characters, and the fourth uses sed capture groups (using \( \) and \1) to capture the first two characters and replace the entire line with them. They're both similar to cut: they deliver the first two characters of each line in the input.

None of that matters if you are sure your input is a single line; they all have an identical effect.
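
A sketch that runs all four commands on the question's single-line sample, plus the in-shell expansion, so the results can be compared side by side (the variable names are arbitrary):

```shell
#!/usr/bin/env bash
long="USCAGoleta9311734.5021-120.1287855805"

via_cut=$(echo "${long}" | cut -c1-2)
via_head=$(echo "${long}" | head -c2)
via_awk=$(echo "${long}" | awk '{print substr($0, 1, 2)}')   # awk positions start at 1
via_sed=$(echo "${long}" | sed 's/^\(..\).*/\1/')
via_bash=${long:0:2}                                          # in-shell, no process created

printf '%s %s %s %s %s\n' "$via_cut" "$via_head" "$via_awk" "$via_sed" "$via_bash"   # US US US US US
```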

Answered by ennuikiller

The easiest way is:

${string:position:length}

This extracts a substring of $length characters from $string, starting at $position (positions count from 0).

This is a bash built-in, so awk or sed is not required.

Answered by Paused until further notice.

You've gotten several good answers and I'd go with the Bash built-in myself, but since you asked about sed and awk and (almost) no one else offered solutions based on them, I offer you these:

echo "USCAGoleta9311734.5021-120.1287855805" | awk '{print substr($0,1,2)}'

and

echo "USCAGoleta9311734.5021-120.1287855805" | sed 's/\(^..\).*/\1/'

The awk one ought to be fairly obvious, but here's an explanation of the sed one:

  • substitute "s/"
  • the group "()" of two of any characters ".." starting at the beginning of the line "^" and followed by any character "." repeated zero or more times "*" (the backslashes are needed to escape some of the special characters)
  • by "/" the contents of the first (and only, in this case) group "\1" (here the backslash is a special escape referring to a matching sub-expression)
  • done "/"
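
For comparison, the same substitution in extended-regex syntax, where the group parentheses need no backslashes. This `sed -E` variant is not from the original answer; `-E` is accepted by GNU and BSD sed, but check your platform.

```shell
#!/usr/bin/env bash
s="USCAGoleta9311734.5021-120.1287855805"

# Basic regex (as in the answer): \( \) delimit the group, \1 refers to it.
echo "$s" | sed 's/\(^..\).*/\1/'      # US

# Extended regex: ( ) delimit the group without backslashes; \1 is unchanged.
echo "$s" | sed -E 's/^(..).*/\1/'     # US

# A line shorter than two characters does not match, so it passes through unchanged.
echo "A" | sed -E 's/^(..).*/\1/'      # A
```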

Answered by Dominic Mitchell

If you're in bash, you can say:

bash-3.2$ var=abcd
bash-3.2$ echo ${var:0:2}
ab

This may be just what you need…

Answered by Amir Mehler

Just grep:

echo 'abcdef' | grep -Po "^.."        # ab
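
`-P` (Perl-compatible regex) is a GNU grep option and isn't needed for this pattern. A sketch assuming GNU grep, using a basic regex with `-o` (itself a common extension rather than POSIX), and showing one caveat: input shorter than two characters produces no output at all.

```shell
#!/usr/bin/env bash
# A basic regex is enough; -o prints only the matched part of the line.
echo 'abcdef' | grep -o '^..'                # ab

# Caveat: a line shorter than two characters does not match, so grep prints
# nothing and exits with status 1 (unlike ${var:0:2}, which returns the input).
echo 'a' | grep -o '^..' || echo "no match"
```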

Answered by bschlueter

You can use printf:

$ original='USCAGoleta9311734.5021-120.1287855805'
$ printf '%-.2s' "$original"
US
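
This works because a precision on `%s` limits how much of the argument is printed (the `-` flag only left-justifies, so it has no effect without a width). A sketch using bash's `printf -v` to store the result in a variable without a command substitution; `-v` is a bash extension, not part of POSIX printf.

```shell
#!/usr/bin/env bash
original='USCAGoleta9311734.5021-120.1287855805'

# Precision .2 truncates the string argument to two characters.
printf '%.2s\n' "$original"            # US

# bash-only: -v assigns to a variable instead of printing, so no subshell is needed.
printf -v short '%.2s' "$original"
echo "$short"                          # US

# Width 2 plus precision 2 also right-pads short input with spaces.
printf -v padded '%-2.2s' "A"
echo "[$padded]"                       # [A ]
```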

Answered by Ian Yang

colrm: remove columns from a file.

To keep the first two characters, just remove the columns starting from column 3:

cat file | colrm 3

Answered by Steven Penny

Quite late indeed, but here it is:

sed 's/.//3g'

Or

awk NF=1 FPAT=..

Or

perl -pe '$_=unpack a2'

Answered by Juan

If you want to use shell scripting and not rely on non-POSIX extensions (such as so-called bashisms), you can use techniques that do not require forking external tools such as grep, sed, cut or awk, which would make your script less efficient. Maybe efficiency and POSIX portability are not important in your use case. But in case they are (or just as a good habit), you can use the following parameter expansion method to extract the first two characters of a shell variable:

$ sh -c 'var=abcde; echo "${var%${var#??}}"'
ab

This uses "smallest prefix" parameter expansion to remove the first two characters (this is the ${var#??} part), then "smallest suffix" parameter expansion (the ${var% part) to remove that all-but-the-first-two-characters string from the original value.
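
Two edge cases are worth guarding against when using this in a script. If the value has fewer than two characters, `${var#??}` removes nothing, so the suffix removal strips the whole value and leaves an empty string. If the remainder contains `*`, `?` or `[`, the unquoted inner expansion is treated as a pattern. A sketch of a POSIX helper (`first_two` is a hypothetical name) that handles both:

```shell
#!/bin/sh
# Hypothetical POSIX helper built from the two expansions described above.
# Plain sh has no "local", so s and rest are ordinary global variables.
first_two() {
    s=$1
    rest=${s#??}                  # drop the first two characters, if there are two
    if [ "$rest" = "$s" ]; then   # nothing was removed: fewer than two characters
        printf '%s\n' "$s"
    else
        # Quoting "$rest" makes it match literally, even if it contains * ? or [.
        printf '%s\n' "${s%"$rest"}"
    fi
}

first_two 'abcde'    # ab
first_two 'ab*c'     # ab  (the unquoted form ${s%$rest} would print ab* here)
first_two 'a'        # a   (the unguarded form would print an empty line)
```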

This method was previously described in an answer to the "Shell = Check if variable begins with #" question. That answer also describes a couple of similar parameter expansion methods that can be used in a slightly different context than the one that applies to the original question here.

Answered by palswim

If your system is using a different shell (not bash), but your system has bash, then you can still use the inherent string manipulation of bash by invoking bash with a variable:

strEcho='echo ${str:0:2}' # '${str:2}' if you want to skip the first two characters and keep the rest
bash -c "str=\"$strFull\";$strEcho;"
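
Splicing a value such as `$strFull` into the `bash -c` command string means bash re-parses it as code, so a value containing `"`, `$` or backticks can break the command or run something unintended. A sketch of a safer variant (not from the original answer) passes the string as a positional parameter instead; the made-up `strFull` value contains a quote and a `$( )` to show that it survives intact.

```shell
#!/bin/sh
# Can be run from any POSIX shell, as long as bash is installed.
strFull='a"b$(echo hi)'    # made-up value containing a quote and a $( )

# The bash script is single-quoted, so the calling shell expands nothing in it.
# The word "bash" becomes $0 (used in error messages); "$strFull" becomes $1.
short=$(bash -c 'printf "%s\n" "${1:0:2}"' bash "$strFull")
echo "$short"              # a"

# Same pattern with ${1:2} to skip the first two characters and keep the rest.
rest=$(bash -c 'printf "%s\n" "${1:2}"' bash "$strFull")
echo "$rest"               # b$(echo hi)
```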