How to parse multi line records (with awk?)
Original source: http://stackoverflow.com/questions/29357919/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Six
I'm trying to figure out how to extract particular fields from multi-line records separated by \n\n.
In this instance, it happens to be output from apt-cache akin to DEBIAN control files. See output of apt-cache show "$package"
Package: caffeine
Priority: optional
Section: misc
Installed-Size: 641
Maintainer: Reuben Thomas <[email protected]>
Architecture: all
Version: 2.8.3
Depends: python3:any (>= 3.3.2-2~), python3, gir1.2-gtk-3.0, gir1.2-appindicator3-0.1, python3-xlib, python3-pkg-resources, libnet-dbus-perl
Filename: pool/main/c/caffeine/caffeine_2.8.3_all.deb
Size: 58774
MD5sum: 4438db3f6d1cf43a4f4b49cc7f24cda0
SHA1: e748370ac5ccd7de6fc9466ce0451d2e90d179d4
SHA256: ae303b4e32949cc1e1af80df7217e3406291679e3f18fa8f78a5bbb97504c4f6
Description-en: Prevent the desktop becoming idle in full-screen mode
 Caffeine stops the desktop becoming idle when an application
 is running full-screen. A desktop indicator ‘caffeine-indicator'
 supplies a manual toggle, and the command ‘caffeinate' can be used
 to prevent idleness for the duration of any command.
Description-md5: 7c14f8adc007b10f6ecafed36260bedb

Package: caffeine
Priority: optional
Section: misc
Installed-Size: 655
Maintainer: Reuben Thomas <[email protected]>
Architecture: all
Version: 2.6+555~ubuntu14.04.1
Depends: python:any (<< 2.8), python:any (>= 2.7.5-5~), python, gir1.2-gtk-2.0, gir1.2-appindicator3-0.1, x11-utils, python-dbus
Filename: pool/main/c/caffeine/caffeine_2.6+555~ubuntu14.04.1_all.deb
Size: 58604
MD5sum: 1051c3f7d40d344f986bb632d7436849
SHA1: 5e5f622595e8cbba8fb7468b3cffe2914b0ba110
SHA256: 11c5bbf2d28dcda6a7b82872195f740f1f79521b60d3c9acea3037bf0ab3a60e
Description: Prevent the desktop becoming idle
 Caffeine allows the user to prevent the desktop becoming idle,
 either manually or when certain applications are run. This
 prevents screen-blanking, locking, suspending, and so on.
Description-md5: 738866350e5086e77408d7a9c7ffa59b

Package: caffeine
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 794
Maintainer: Isaiah Heyer <[email protected]>
Architecture: all
Version: 2.4.1+478~raring1
Depends: dconf-gsettings-backend | gsettings-backend, python (>= 2.6), python-central (>= 0.6.11), python-xlib, python-appindicator, python-xdg, python-notify, python-kaa-metadata
Description: Caffeine
 A status bar application able to temporarily prevent the activation
 of both the screensaver and the "sleep" powersaving mode.
Description-md5: 1c29acf1ab0f2e6636db29fbde1d14a3
Homepage: https://launchpad.net/caffeine
Python-Version: >= 2.6
My desired output is one line per record in the format apt-get download $pkg=$ver -a=$arch. Basically, a list of the installation commands for the available packages.
So far what I've got is apt-cache show "$package" | awk '/^Package: / { print $2 } /^Version: / { print $2 } /^Architecture: / { print $2 }' | xargs -n3 | awk '{printf "apt-get download %s=%s -a=%s\n", $1, $3, $2}'
This is the actual output:
apt-get download caffeine=2.8.3 -a=all
apt-get download caffeine=2.6+555~ubuntu14.04.1 -a=all
apt-get download caffeine=2.4.1+478~raring1 -a=all
This is as desired, but it appears to be a fluke: it works only because the order of the fields is consistent in this example. It would break if the fields appeared in a different order.
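To make the fragility concrete, here is a small sketch that wraps the pipeline from the question in a function (the name `fragile_pipeline` and the sample record are made up for illustration) and feeds it a record where Version comes before Architecture:

```shell
# Hypothetical wrapper around the pipeline from the question; reads records on stdin.
fragile_pipeline() {
    awk '/^Package: / { print $2 } /^Version: / { print $2 } /^Architecture: / { print $2 }' |
        xargs -n3 |
        awk '{printf "apt-get download %s=%s -a=%s\n", $1, $3, $2}'
}

# Made-up record with Version listed before Architecture.
printf 'Package: foo\nVersion: 1.0\nArchitecture: all\n\n' | fragile_pipeline
# prints: apt-get download foo=all -a=1.0
```

The version and architecture end up swapped, because the second awk only knows the position of each value, not which field it came from.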
I can do parsing like this using object orientation in Python, but I'm having difficulty getting it done in one awk command. The only way I can see to do this correctly would be to split each record into individual tmp files (using split or something along those lines) and then parse each file individually (which is straightforward). Obviously I'd really like to avoid unnecessary I/O, as this seems like something awk is well equipped for. Do any awk pros know how to solve this? I'd even be open to a Perl one-liner or to using bash, but I'm really interested in learning how to better leverage awk.
Accepted answer by John1024
$ package=sed
$ apt-cache show "$package" | awk '/^Package: /{p=$2} /^Version: /{v=$2} /^Architecture: /{a=$2} /^$/{print "apt-get download "p"="v" -a="a}'
apt-get download sed=4.2.1-10 -a=amd64
How it works
/^Package: /{p=$2}
Save the package information in variable p.

/^Version: /{v=$2}
Save the version information in variable v.

/^Architecture: /{a=$2}
Save the architecture information in variable a.

/^$/{print "apt-get download "p"="v" -a="a}
When we reach a blank line, print out the information in the desired form.

My version of apt-cache always outputs a blank line after each package. Your sample output is missing the last blank line. If your apt-cache genuinely does not produce that last blank line, then we will need to add a little bit more code to compensate.

As a matter of style, some may prefer printf to print. In which case, replace the last rule above with:
/^$/{printf "apt-get download %s=%s -a=%s\n",p,v,a}
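The answer does not show the extra code for a missing final blank line. One possible way to compensate, sketched here as an assumption rather than as the answerer's own solution, is to flush the last record in an END block and to reset the variables after each print so that nothing is emitted twice. The function name `emit_downloads` is hypothetical:

```shell
# Hypothetical helper: reads apt-cache-style records on stdin.
emit_downloads() {
    awk '
        /^Package: /      { p = $2 }
        /^Version: /      { v = $2 }
        /^Architecture: / { a = $2 }
        # A blank line ends a record: print it, then reset the saved fields.
        /^$/ {
            if (p != "") print "apt-get download " p "=" v " -a=" a
            p = v = a = ""
        }
        # Flush the final record when the input does not end with a blank line.
        END { if (p != "") print "apt-get download " p "=" v " -a=" a }
    '
}

# Usage: apt-cache show "$package" | emit_downloads
```

Because the fields are stored by name, this should also cope with records whose fields appear in a different order.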
Answer by Ed Morton
I find the best way to deal with data that contains name to value pairings is to create an array of those pairs and then just access the values by their names:
$ cat tst.awk
BEGIN { RS=""; FS="\n" }
{
    delete n2v
    for (i=1;i<=NF;i++) {
        if ($i !~ /^ /) {
            name = gensub(/:.*/,"","",$i)
            value = gensub(/[^:]+:\s+/,"","",$i)
            n2v[name] = value
        }
    }
    printf "apt-get download %s=%s -a=%s\n",
        n2v["Package"], n2v["Version"], n2v["Architecture"]
}
$ awk -f tst.awk file
apt-get download caffeine=2.8.3 -a=all
apt-get download caffeine=2.6+555~ubuntu14.04.1 -a=all
apt-get download caffeine=2.4.1+478~raring1 -a=all
The above uses a couple of gawk extensions but is easily adapted to any awk if necessary.
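As a rough sketch of what such an adaptation might look like (not the answerer's code), the version below swaps gensub() and \s for index(), substr() and sub(), and clears the array with split("", n2v) instead of delete n2v. The function name `parse_records` is made up for illustration, and the sketch assumes continuation lines start with a space, as in Debian control files:

```shell
# Hypothetical wrapper: reads records from the named files, or from stdin if none are given.
parse_records() {
    awk '
        BEGIN { RS = ""; FS = "\n" }      # paragraph mode: one record per blank-line-separated block
        {
            split("", n2v)                # portable way to clear the array
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ /) continue   # skip continuation lines
                pos = index($i, ":")
                if (pos == 0) continue    # skip lines that are not "name: value"
                name  = substr($i, 1, pos - 1)
                value = substr($i, pos + 1)
                sub(/^[ \t]+/, "", value) # trim the whitespace after the colon
                n2v[name] = value
            }
            printf "apt-get download %s=%s -a=%s\n",
                n2v["Package"], n2v["Version"], n2v["Architecture"]
        }
    ' "$@"
}

# Usage: apt-cache show "$package" | parse_records
```

Splitting at the first colon keeps values that contain colons themselves, such as versions with an epoch, intact.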