Creating a bash script to parse an xml file to csv

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/21507796/

Time: 2020-09-18 09:25:58  Source: igfitidea

Creating bash script to parse xml file to csv

xml linux bash csv

Asked by user3259914

I'm trying to create a bash script to parse an xml file and save it to a csv file.

For example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <List>
    <Job id="1" name="John/>
    <Job id="2" name="Zack"/>
    <Job id="3" name="Bob"/>
</List>

I would like the script to save information into a csv file as such:

John | 1
Zack | 2
Bob  | 3

The name and id will be in a different cell.

Is there any way I can do this?

Answered by devnull

You've posted a query similar to your previous one. I'd again suggest using an XML parser. You could say:

xmlstarlet sel -t -m //List/Job -v @name -o "|" -v @id -n file.xml

It would return

John|1
Zack|2
Bob|3

for your sample data.

Pipe the output to sed "s/|/\t| /" if you want it to appear as in your example.
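
The sed step above only aligns the columns for display. If the goal is a file whose name and id land in separate spreadsheet cells, a comma delimiter may be closer to what is wanted. A minimal sketch, assuming the names contain no commas, quotes, or pipes; pipe_to_csv is a hypothetical helper name, and the printf input stands in for the xmlstarlet output:

```shell
#!/bin/bash
# Hypothetical helper: turn "name|id" lines into "name,id" rows.
# Assumes no field contains a comma, a double quote, or a pipe.
pipe_to_csv() {
  local name id
  while IFS='|' read -r name id; do
    printf '%s,%s\n' "$name" "$id"
  done
}

# Sample input stands in for the xmlstarlet output shown above.
printf 'John|1\nZack|2\nBob|3\n' | pipe_to_csv
```

Redirecting the result (for example with > out.csv, a placeholder file name) would save it to a file.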

Answered by Reinstate Monica Please

Try something like this

#!/bin/bash
while read -r line; do
  [[ $line =~ "name=\""(.*)"\"" ]] && name="${BASH_REMATCH[1]}" && [[ $line =~ "Job id=\""([^\"]+) ]] &&  echo "$name | ${BASH_REMATCH[1]}"
done < file 

The line with John is malformed (the closing quote after the name is missing). With that fixed, example output:

John | 1
Zack | 2
Bob | 3

Answered by BMW

Using sed

sed -nr 's/.*id=\"([0-9]*)\"[^\"]*\"(\w*).*/\2 | \1/p' file

Additionally, based on BroSlow's script, I merged the conditions into a single test.

#!/bin/bash

while read -r line; do
  [[ $line =~ id=\"([0-9]+).*name=\"([^\"|/]*) ]] && echo "${BASH_REMATCH[2]} | ${BASH_REMATCH[1]}"
done < file
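
The merged pattern expects id to appear before name on each line. If the attribute order can vary, one option is to match each attribute separately. A minimal sketch under that assumption; job_to_row is a hypothetical helper name, and the here-document is sample input rather than the asker's file:

```shell
#!/bin/bash
# Hypothetical helper: print "name | id" for one <Job .../> line,
# whatever order the two attributes appear in.
job_to_row() {
  local line=$1 id name
  # Patterns are kept in variables so the quotes need no escaping in [[ =~ ]].
  local id_re='[[:space:]]id="([0-9]+)"'
  local name_re='[[:space:]]name="([^"]*)"'
  [[ $line =~ $id_re ]] || return 1
  id=${BASH_REMATCH[1]}
  [[ $line =~ $name_re ]] || return 1
  name=${BASH_REMATCH[1]}
  printf '%s | %s\n' "$name" "$id"
}

while read -r line; do
  job_to_row "$line" || continue   # skip lines where either attribute is not matched
done <<'EOF'
<List>
<Job id="1" name="John"/>
<Job name="Zack" id="2"/>
</List>
EOF
```

Like the loops above, this is line-based regex matching rather than real XML parsing, so it only suits input with one Job element per line.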

Answered by Vanuan

Extending the xmlstarlet approach:

Given this xml file as input:

<DATA>
  <RECORD>
    <NAME>John</NAME>
    <SURNAME>Smith</SURNAME>
    <CONTACTS>
      "Smith" LTD,
      London, Mtg Str, 12,
      UK
    </CONTACTS>
  </RECORD>
</DATA>

And this script:

xmlstarlet sel -e utf-8 -t \
  -o "NAME, SURNAME, CONTACTS" -n \
  -m //DATA/RECORD \
  -o "\"" \
  -v $"str:replace(normalize-space(NAME), '\"', '\"\"')" -o "\",\"" \
  -v $"str:replace(normalize-space(SURNAME), '\"', '\"\"')" -o "\",\"" \
  -v $"str:replace(normalize-space(CONTACTS), '\"', '\"\"')" \
  -o "\"" \
  -n file.xml

You'll have the following output:

NAME, SURNAME, CONTACTS
"John","Smith","""Smith"" LTD, London, Mtg Str, 12, UK"
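
The quoting rule used here (wrap every field in double quotes and double any embedded double quote) can also be sketched in plain bash, which may help when the values come from somewhere other than xmlstarlet. This is an illustration only; csv_row is a hypothetical helper name, not part of the answer:

```shell
#!/bin/bash
# Hypothetical helper: join its arguments into one CSV row, quoting every
# field and doubling embedded double quotes.
csv_row() {
  local q='"' field out=''
  for field in "$@"; do
    # Add a comma before every field except the first.
    out+="${out:+,}${q}${field//$q/$q$q}${q}"
  done
  printf '%s\n' "$out"
}

csv_row 'John' 'Smith' '"Smith" LTD, London, Mtg Str, 12, UK'
```

The sketch does not collapse whitespace the way normalize-space does, so multi-line values would need to be flattened first.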