Original question: http://stackoverflow.com/questions/14054203/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Extract XML tag value using awk command
Asked by user1929905
I have an XML file like the one below:
<root>
  <FIToFICstmrDrctDbt>
    <GrpHdr>
      <MsgId>A</MsgId>
      <CreDtTm>2001-12-17T09:30:47</CreDtTm>
      <NbOfTxs>0</NbOfTxs>
      <TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
      <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
      <SttlmInf>
        <SttlmMtd>CLRG</SttlmMtd>
        <ClrSys>
          <Prtry>xx</Prtry>
        </ClrSys>
      </SttlmInf>
      <InstgAgt>
        <FinInstnId>
          <BIC>AAAAAAAAAAA</BIC>
        </FinInstnId>
      </InstgAgt>
    </GrpHdr>
  </FIToFICstmrDrctDbt>
</root>
I need to extract the value of each tag into a separate variable using awk. How can I do it?
Answered by dogbane
You can use awk as shown below. However, this is NOT a robust solution and will fail if the XML is not laid out as expected, e.g. if there are multiple elements on the same line.
$ dt=$(awk -F '[<>]' '/IntrBkSttlmDt/{print $3}' file)
$ echo $dt
1967-08-13
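The question asks for every tag value in its own variable, so the one-liner above can be extended along these lines. This is only a sketch under the same assumption as the answer (one `<name>value</name>` element per line); the file name `sample.xml` and the `pairs` variable are made up for illustration, and `eval` should only be used on input you trust. Tag names that are not valid shell identifiers (containing `-`, `.` or `:`) would need extra handling.

```shell
# Sample input: a trimmed copy of the XML from the question (hypothetical file name).
cat > sample.xml <<'EOF'
<root>
<GrpHdr>
<MsgId>A</MsgId>
<CreDtTm>2001-12-17T09:30:47</CreDtTm>
<TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
<IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
</GrpHdr>
</root>
EOF

# Print "name=value" for every line shaped like <name ...>value</name>.
# With -F '[<>]' such a line has exactly 5 fields and $4 is the closing tag.
pairs=$(awk -F '[<>]' 'NF == 5 && substr($4, 1, 1) == "/" {
    split($2, tag, " ")        # drop attributes such as Ccy="EUR"
    print tag[1] "=" $3
}' sample.xml)

# Load each pair into a shell variable named after the tag.
# A here-document (not a pipe) keeps the loop in the current shell.
while IFS='=' read -r name value; do
    eval "$name=\$value"
done <<EOF
$pairs
EOF

echo "$MsgId $IntrBkSttlmDt $TtlIntrBkSttlmAmt"
```

Container lines such as `<GrpHdr>` produce only three fields and are skipped, so only leaf elements become variables.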
I suggest you use a proper XML processing tool, like xmllint.
$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo $dt
1967-08-13
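As a possible alternative to the `--shell` form above, reasonably recent versions of xmllint also accept an `--xpath` option, which avoids filtering the shell prompt with grep. The sketch below assumes such a version is installed and uses a made-up file name, `sample.xml`; the input deliberately puts several elements on one line, the case where the awk approach breaks.

```shell
# Sample input with several elements on one line (hypothetical file name).
cat > sample.xml <<'EOF'
<root><GrpHdr><IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt><TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt></GrpHdr></root>
EOF

if command -v xmllint >/dev/null 2>&1; then
    # string(...) yields the text of the first node the expression matches.
    dt=$(xmllint --xpath 'string(//IntrBkSttlmDt)' sample.xml)
    # Attributes are addressed with @name.
    ccy=$(xmllint --xpath 'string(//TtlIntrBkSttlmAmt/@Ccy)' sample.xml)
    echo "$dt $ccy"
else
    echo "xmllint not found; skipping this sketch" >&2
fi
```

Because the query runs against the parsed document, line breaks and attributes in the input do not affect the result.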
Answered by Michael Hamilton
The following gawk command uses a record separator regex pattern to match the XML tags. Anything starting with a < followed by at least one non-> character and terminated by a > is considered to be a tag. Gawk assigns each RS match to the RT variable. Anything between the tags is parsed as the record text, which gawk assigns to $0.
gawk 'BEGIN { RS="<[^>]+>" } { print RT, $0 }' myfile
Answered by Vijay
The code below stores all the tag values in an array. Hope this helps, but I still believe this is not an optimal way to do it.
> perl -lne 'if(/>[^<]*</){$_=~m/>([^<]*)</;push(@a,$1)}if(eof){foreach(@a){print $_}}' temp
A
2001-12-17T09:30:47
0
0.0
1967-08-13
CLRG
xx
AAAAAAAAAAA
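Since the question asks for awk specifically, the same collect-into-an-array idea can be sketched in plain awk. This is an illustrative translation, not part of the original answer: the file name `sample.xml` and the `values` variable are assumptions, and like the Perl version it only picks up the first `>text<` run on each line.

```shell
# Sample input: a trimmed copy of the XML from the question (hypothetical file name).
cat > sample.xml <<'EOF'
<root>
<GrpHdr>
<MsgId>A</MsgId>
<NbOfTxs>0</NbOfTxs>
<SttlmInf>
<SttlmMtd>CLRG</SttlmMtd>
</SttlmInf>
</GrpHdr>
</root>
EOF

values=$(awk '
    # Find the first ">text<" run on the line; match() sets RSTART and RLENGTH.
    match($0, />[^<]*</) {
        a[++n] = substr($0, RSTART + 1, RLENGTH - 2)   # strip ">" and "<"
    }
    # Print the collected values in input order.
    END {
        for (i = 1; i <= n; i++) print a[i]
    }
' sample.xml)

echo "$values"
```

Lines that hold only an opening or closing tag contain no `>text<` run, so they add nothing to the array.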
