Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/51404723/

Date: 2020-09-18 16:58:58  Source: igfitidea

Extract XML to json via bash

Tags: json, xml, bash, sed

Asked by Shawn Dibble

I am trying to find some specific information in an XML tag and convert it to a JSON string. I have come up with a very convoluted solution, but it almost works. I just need to remove the whitespace and line breaks. I have tried that, but it makes even the text inside my values run together.

Sample data:

<config>
  <derivedFrom>
    <courseName>Family and Medical Leave</courseName>
    <courseCode>FML</courseCode>
    <courseAuthor>Company 1</courseAuthor>
    <courseVersion>2.0.0</courseVersion>
    <importLocale>en-US</importLocale>
  </derivedFrom>
</config>

This is the sed code I am using:

sed -n '
    /<derivedFrom>/ {
    :a;
    N;
    /<\/derivedFrom>/!ba;
    s/.*<derivedFrom>//;
    s/<\/derivedFrom>//;
    s/<\/[a-zA-Z]*>/",/g;
    s/</"/g;
    s/>/":"/g;
    s/[[:space:]]//g;
    s/,$//g;
    p
    }'

And finally, here is my current output:

"courseName":"FamilyandMedicalLeave","courseCode":"UBM2C","courseAuthor":"Alchemy","courseVersion":"2.0.021","importLocale":"en-US"

I know I need to replace [[:space:]] with something else, since I don't want the text inside my quotes to run together, but I am stuck. For example, Family and Medical Leave should keep its spaces. There is probably also an easier way to do this with some XML-to-JSON script, but I need to do this without installing anything else on the servers.

Answered by Sundeep

Note: I don't know all the details of XML and JSON. Since you say you cannot install a program, here are some steps using sed and paste that might help you. This is intended as a guide and may not be the full answer you are expecting; it assumes the data format shown in the sample.

Step 1: get the required lines (see How to select lines between two patterns? for details)

$ sed -n '/<derivedFrom>/, /<\/derivedFrom>/{//!p}' ip.txt
    <courseName>Family and Medical Leave</courseName>
    <courseCode>FML</courseCode>
    <courseAuthor>Company 1</courseAuthor>
    <courseVersion>2.0.0</courseVersion>
    <importLocale>en-US</importLocale>
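
The `//!p` part works because sed treats an empty pattern as "the last regular expression that was tested", so both boundary lines of the range are skipped. A minimal sketch of that behaviour, assuming GNU sed; the function name `inner_lines` is hypothetical and only used for illustration:

```shell
# Hypothetical helper: print only the lines strictly between the two tags.
inner_lines() {
    # The address range selects <derivedFrom> through </derivedFrom>.
    # Inside the range, // repeats whichever boundary regex was just tested,
    # so //!p prints every line except the opening and closing tag lines.
    sed -n '/<derivedFrom>/,/<\/derivedFrom>/{//!p}'
}

# Usage with a small inline sample instead of ip.txt:
printf '%s\n' '<config>' '  <derivedFrom>' '    <courseCode>FML</courseCode>' \
    '  </derivedFrom>' '</config>' | inner_lines
```

Only the `<courseCode>` line should be printed; the `<config>` and `<derivedFrom>` lines are dropped.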

Step 2: Re-format the filtered lines
This can also be combined with the previous step as //!s|.*<\([^>]*\)>\(.*\)</\1>.*|"\1":"\2"|p

sed 's|.*<\([^>]*\)>\(.*\)</\1>.*|"\1":"\2"|'
"courseName":"Family and Medical Leave"
"courseCode":"FML"
"courseAuthor":"Company 1"
"courseVersion":"2.0.0"
"importLocale":"en-US"

Step 3: join them using paste

paste -sd,
"courseName":"Family and Medical Leave","courseCode":"FML","courseAuthor":"Company 1","courseVersion":"2.0.0","importLocale":"en-US"

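Putting the three steps together might look like the sketch below, which also wraps the result in braces so that it forms a JSON object. The function name `xml_fields_to_json` is hypothetical, and the sketch assumes GNU sed, one element per line as in the sample, and values that need no JSON escaping (no double quotes or backslashes).

```shell
# Hypothetical helper: reads XML on stdin, writes a one-line JSON object.
# Assumes the sample layout: one <tag>value</tag> per line inside
# <derivedFrom>, and values without " or \ characters.
xml_fields_to_json() {
    # Steps 1 and 2 combined: select the inner lines and rewrite each one
    # as "tag":"value". Step 3: join the lines with commas.
    sed -n '/<derivedFrom>/,/<\/derivedFrom>/{//!s|.*<\([^>]*\)>\(.*\)</\1>.*|"\1":"\2"|p}' |
        paste -sd, - |
        sed 's/.*/{&}/'   # wrap the joined pairs in braces
}

# Usage with an inline sample instead of ip.txt:
xml_fields_to_json <<'EOF'
<config>
  <derivedFrom>
    <courseName>Family and Medical Leave</courseName>
    <courseCode>FML</courseCode>
  </derivedFrom>
</config>
EOF
# prints {"courseName":"Family and Medical Leave","courseCode":"FML"}
```

The explicit `-` after `paste -sd,` tells paste to read standard input, which some non-GNU versions require.
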
Answered by Matias Barrios

Why not simply this? Valid for Bash.

sed -n '
    /<derivedFrom>/ {
    :a;
    N;
    /<\/derivedFrom>/!ba;
    s/.*<derivedFrom>//;
    s/<\/derivedFrom>//;
    s/<\/[a-zA-Z]*>/",/g;
    s/</"/g;
    s/>/":"/g;
    s/,$//g;
    p
    }' input.txt | sed 's/^ *//g;s/ *$//g'

Regards!
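
As far as I can tell, the second sed in the command above trims each line separately, so the fields may still come out on separate lines. One possible variation, sketched here under the assumption of GNU sed (for `\n` in the pattern), removes whitespace only where it surrounds a line break, which leaves the spaces inside values such as Family and Medical Leave intact. The function name `derived_to_pairs` is hypothetical.

```shell
# Hypothetical helper: reads XML on stdin, prints the <derivedFrom> fields
# as comma-separated "tag":"value" pairs on a single line.
derived_to_pairs() {
    sed -n '
        /<derivedFrom>/ {
        :a
        N
        /<\/derivedFrom>/!ba
        # Drop everything up to and including the opening tag,
        # and everything from the closing tag onwards.
        s/.*<derivedFrom>//
        s/<\/derivedFrom>.*//
        # Remove line breaks together with the indentation around them,
        # but leave spaces inside the values alone.
        s/[[:space:]]*\n[[:space:]]*//g
        s/<\/[a-zA-Z]*>/",/g
        s/</"/g
        s/>/":"/g
        s/,$//
        p
        }'
}

# Usage with an inline sample:
derived_to_pairs <<'EOF'
<config>
  <derivedFrom>
    <courseName>Family and Medical Leave</courseName>
    <courseCode>FML</courseCode>
  </derivedFrom>
</config>
EOF
# prints "courseName":"Family and Medical Leave","courseCode":"FML"
```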