Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/21265504/

Date: 2020-09-18 09:20:00  Source: igfitidea

BASH Script to iterate through a list of IDs in an XML file and print/output the name to shell or output file?

Tags: linux, bash, shell, awk, grep

Asked by Mike J

I'm looking to iterate through a list of ID numbers that match ID numbers in an XML file. For each match, I want to use BASH (and AWK) to print the line below it, either to the shell or redirected to a third, output file (output.txt).

Here is the breakdown:

ID_list.txt (shortened for this example - it has 100 IDs)

4414
4561
2132
999
1231
34
489
3213
7941

XML_example.txt (thousands of entries)

<book>
  <ID>4414</ID>
  <name>Name of first book</name>
</book>
<book>
  <ID>4561</ID>
  <name>Name of second book</name>
</book>

I'd like the output of the script to be the names of the 100 IDs from the first file:

Name of first book
Name of second book
etc

I believe it's possible to do this using BASH and AWK with a for loop (for each ID in file 1, find the corresponding name in file 2). I think you can recursively grep for the ID number and then print the line below it using AWK. Even if the output looked like this, I can remove the XML tags afterwards:

<name>Name of first book</name>
<name>Name of second book</name>
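The plan described above can be sketched with awk and sed alone: find each ID, print the line right after it, then strip the tags. This is a minimal sketch that is not taken from any answer. `names_after_ids` is a hypothetical function name, and the sketch assumes each `<name>` line directly follows its `<ID>` line, as in XML_example.txt.

```shell
#!/bin/bash
# Hypothetical helper: $1 = file with one ID per line, $2 = XML file laid out
# like XML_example.txt (each <name> line right after its <ID> line).
names_after_ids() {
  local id
  while IFS= read -r id; do
    # index() is a literal substring test, so the tag needs no regex escaping;
    # getline replaces $0 with the next line, which is then printed
    awk -v tag="<ID>$id</ID>" 'index($0, tag) { getline; print }' "$2"
  done < "$1" | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//'   # drop tags and indentation
}
```

For example, `names_after_ids ID_list.txt XML_example.txt > output.txt` would write the bare names to output.txt. Names come out in ID_list.txt order. The XML file is re-read once per ID, which is simple but slow for large files.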

It's on a Linux server but I can port it over to PowerShell on Windows. I think BASH/GREP and AWK are the way to go.

Can someone help me script this?

Accepted answer by dogbane

Here's one way:

while IFS= read -r id
do
    grep -A1 "<ID>$id</ID>" XML_example.txt | grep "<name>"
done < ID_list.txt
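The question also asks for the names on the shell or in output.txt. One way to get both with the loop above is to pipe the whole loop into `tee`. The sketch below wraps the loop in a hypothetical `write_names` function. Its three arguments (ID list, XML file, output file) are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper around the loop above:
# $1 = ID list, $2 = XML file, $3 = output file.
write_names() {
  local id
  while IFS= read -r id; do
    grep -A1 "<ID>$id</ID>" "$2" | grep "<name>"
  done < "$1" | tee "$3"   # tee copies every line to the terminal and to $3
}
```

For example, run `write_names ID_list.txt XML_example.txt output.txt`. Replacing `| tee "$3"` with `> "$3"` would write to the file only.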

Here's another way (a one-liner). It is more efficient because it uses a single grep to extract all the IDs instead of looping:

grep -E -A1 "$(sed -e 's/^/<ID>/' -e 's/$/<\/ID>/' ID_list.txt | sed -e :a -e '$!N;s/\n/|/;ta')" XML_example.txt | grep "<name>"

Output:

<name>Name of first book</name>
<name>Name of second book</name>
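A variant of the one-liner avoids building a regular expression on the command line. `grep -F -f` reads one fixed-string pattern per line from a file, supplied here through bash process substitution, so each `<ID>n</ID>` tag is matched literally. This is a sketch under that assumption, and `names_by_id_list` is a hypothetical function name.

```shell
#!/bin/bash
# Hypothetical function: $1 = ID list, $2 = XML file.
names_by_id_list() {
  # sed turns each ID into a literal pattern such as <ID>4414</ID>;
  # grep -F -f reads those patterns, so one grep handles every ID at once
  grep -F -A1 -f <(sed 's|.*|<ID>&</ID>|' "$1") "$2" | grep '<name>'
}
```

Like the one-liner, this prints matches in XML file order rather than ID_list.txt order, and keeps the indentation and `<name>` tags.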

Answer by larsks

Given an ID, you can get the name using XPath expressions and the xmllint command, like this:

id=4414
name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml)

So with this, you could write something like:

while IFS= read -r id; do
    name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml)
    echo "$name"
done < id_list.txt

Unlike solutions involving awk, grep, and friends, this uses an actual XML parsing tool. This means that while most other solutions might break if they encountered:

<book><ID>4561</ID><name>Name of second book</name></book>

...this would work just fine.

xmllint is part of the libxml2 package and is available on most distributions.
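There is one caveat when pointing xmllint at the question's data. XML_example.txt, as shown, has several top-level `<book>` elements and no single root element, so it is not a well-formed XML document and xmllint would report a parse error on it. The minimal sketch below assumes the file looks exactly like the sample. It wraps the fragments in a made-up `<books>` root on the fly, and `name_for_id` is a hypothetical function name.

```shell
#!/bin/bash
# Hypothetical function: $1 = ID, $2 = XML file holding only <book> fragments.
name_for_id() {
  local name
  # Wrap the fragments in an assumed <books> root; "-" makes xmllint read stdin
  name=$( { echo '<books>'; cat "$2"; echo '</books>'; } |
          xmllint --xpath "string(//book[ID='$1']/name)" - )
  printf '%s\n' "$name"   # normalise to exactly one trailing newline
}
```

For example, `name_for_id 4414 XML_example.txt` should print the first book's name. If the real file already has a root element, the wrapper is unnecessary and the file can be passed to xmllint directly, as in the answer.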

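Note also that GNU awk can parse XML through the XML extension from the gawkextlib project (formerly xgawk), although this is an add-on rather than part of standard awk.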

Answer by Ed Morton

$ awk '
NR==FNR{ ids["<ID>" $0 "</ID>"]; next }
found { gsub(/^.*<name>|<[/]name>.*$/,""); print; found=0 }
$1 in ids { found=1 }
' ID_list.txt XML_example.txt
Name of first book
Name of second book

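The awk program above relies on the two-file `NR==FNR` idiom. Below, the same program is laid out with comments and wrapped in a hypothetical `awk_names` function that takes the ID list and the XML file as arguments.

```shell
#!/bin/bash
# Hypothetical function: $1 = ID list, $2 = XML file.
awk_names() {
  awk '
    # NR==FNR holds only while reading the first file: store "<ID>n</ID>" keys
    NR == FNR { ids["<ID>" $0 "</ID>"]; next }

    # Line right after a wanted <ID>: keep only the text inside <name>, print it
    found { gsub(/^.*<name>|<[/]name>.*$/, ""); print; found = 0 }

    # First field is a wanted <ID> tag (leading spaces are not part of $1)
    $1 in ids { found = 1 }
  ' "$1" "$2"
}
```

One known pitfall of `NR==FNR`: if the ID list is empty, the condition stays true while reading the XML file as well. Every XML line is then stored as a key, and nothing is printed.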
Answer by Reinstate Monica Please

I would go the BASH_REMATCH route if I had to do it in bash:

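#!/bin/bash

# Read the XML file line by line; read without IFS= strips the indentation
while read -r line; do
  # If the previous line was an <ID> line, print the text inside <name>...</name>
  [[ $print ]] && [[ $line =~ "<name>"(.*)"</name>" ]] && echo "${BASH_REMATCH[1]}"

  # Set the flag after any <ID>...</ID> line (IDs are not checked against
  # ID_list.txt here), clear it on every other line
  if [[ $line == "<ID>"*"</ID>" ]]; then
    print=:
  else
    print=
  fi
done < "XML_example.txt"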

Example output:

> abovescript
Name of first book
Name of second book
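The script above prints the name after every `<ID>` line, whatever the ID. To also check each ID against ID_list.txt, one option is to load the list into a bash associative array first (bash 4 or later). `listed_names` is a hypothetical name, and the regex assumes purely numeric IDs, as in the sample data.

```shell
#!/bin/bash
# Hypothetical function: $1 = ID list, $2 = XML file. Needs bash 4+ (declare -A).
listed_names() {
  local id line print=
  declare -A wanted                  # declare inside a function makes it local
  while IFS= read -r id; do
    [[ $id ]] && wanted[$id]=1       # skip blank lines in the ID list
  done < "$1"

  while read -r line; do             # no IFS= here, so indentation is trimmed
    [[ $print ]] && [[ $line =~ "<name>"(.*)"</name>" ]] && echo "${BASH_REMATCH[1]}"
    # Arm the flag only for an <ID>n</ID> line whose numeric n was listed in $1
    if [[ $line =~ ^"<ID>"([0-9]+)"</ID>"$ ]] && [[ ${wanted[${BASH_REMATCH[1]}]} ]]; then
      print=:
    else
      print=
    fi
  done < "$2"
}
```

For example, `listed_names ID_list.txt XML_example.txt > output.txt`. Like the awk answer, this reads each file only once, and the names follow the order of the XML file.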