Original question: http://stackoverflow.com/questions/5376024/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How to remove XML tags from Unix command line?
Asked by Tarski
I am grepping an XML file, which gives me output like this:
<tag>data</tag>
<tag>more data</tag>
...
Note, this is a flat file, not an XML tree. I want to remove the XML tags and just display the data in between. I'm doing all this from the command line and was wondering if there is a better way than piping it into awk twice...
cat file.xml | awk -F'>' '{print $2}' | awk -F'<' '{print $1}'
Ideally, I would like to do this in one command.
Answered by Johnsyweb
If your file looks just like that, then sed can help you:
sed -e 's/<[^>]*>//g' file.xml
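As a quick check, the sed command can be run against the two sample lines from the question. This is a minimal sketch: the strip_tags function name and the inline sample input are illustrative and not part of the original answer.

```shell
# strip_tags: hypothetical helper wrapping the sed command from the answer.
# Reads stdin (or any files given as arguments) and deletes every <...> run.
# [^>]* cannot cross a '>', so each match stays inside a single tag.
strip_tags() {
  sed -e 's/<[^>]*>//g' "$@"
}

# Sample input in the flat format shown in the question.
printf '<tag>data</tag>\n<tag>more data</tag>\n' | strip_tags
# prints:
# data
# more data
```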
Of course, you should not use regular expressions for parsing XML, because it's hard.
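One concrete way the regex approach can go wrong, as a sketch with a made-up input line: a '>' character is legal inside an XML attribute value, and the pattern <[^>]*> stops at the first '>' it sees, so part of the attribute leaks into the output.

```shell
# Made-up input: the attribute value contains a literal '>'.
# The first match ends at that '>', so the rest of the attribute survives.
broken=$(printf '<tag note="a>b">data</tag>\n' | sed -e 's/<[^>]*>//g')
printf '%s\n' "$broken"
# prints: b">data
```

For flat, predictable input like the sample in the question this does not come up, which is why the one-liner is fine there.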
Answered by dogbane
Using awk:
awk '{gsub(/<[^>]*>/,"")};1' file.xml
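A short sketch of how this behaves on a line with several tags and an attribute. The strip_tags_awk name and the sample line are illustrative only. gsub with two arguments edits the whole record ($0) in place, and the trailing pattern 1 is always true, so awk prints every record after the substitution.

```shell
# strip_tags_awk: hypothetical wrapper around the awk command from the answer.
strip_tags_awk() {
  awk '{ gsub(/<[^>]*>/, "") } 1' "$@"
}

# Several tags on one line, one of them with an attribute.
printf '<a href="x">link</a> and <b>bold</b>\n' | strip_tags_awk
# prints: link and bold
```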
Answered by Paused until further notice.
Give this a try:
grep -Po '<.*?>\K.*?(?=<.*?>)' inputfile
Explanation:
Using Perl Compatible Regular Expressions (-P) and outputting only the matched text (-o):
<.*?> - Non-greedy match of any characters within angle brackets
\K - Don't include the preceding match in the output (resets the match start; similar to a positive look-behind, but it works with variable-length matches)
.*? - Non-greedy match stopping at the next match (this part will be output)
(?=<.*?>) - Non-greedy match of any characters within angle brackets, not included in the output (positive look-ahead; works with variable-length matches)
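The pieces explained above can be tried on the question's sample lines. This sketch assumes GNU grep built with PCRE support, since -P is not available in every grep; the between_tags name is illustrative.

```shell
# between_tags: hypothetical wrapper around the grep command from the answer.
# Assumes GNU grep with PCRE (-P); -o prints only the part after \K
# and before the look-ahead.
between_tags() {
  grep -Po '<.*?>\K.*?(?=<.*?>)' "$@"
}

printf '<tag>data</tag>\n<tag>more data</tag>\n' | between_tags
```

Unlike the sed and awk versions, this prints only text that sits between two tags, so lines without any tags produce no output at all.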
Answered by kenorb
Answered by SielaQ
I know this is not a "perlgolf contest", but I used to use this trick.
Set the record separator to < or >, then print only the odd-numbered records:
awk -vRS='<|>' NR%2 file.xml
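A sketch of why the odd-numbered records are the data: splitting <tag>data</tag> on < or > yields an empty record, then tag, then data, then /tag, then the newline between lines. The version below is a variation, not the original answer: the added "&& NF" filter skips the empty and newline-only records so that no blank lines are printed. It assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; POSIX leaves that behavior unspecified).

```shell
# Assumes gawk or mawk: RS='<|>' is used as a regular expression.
# NR % 2 keeps the odd-numbered records (the text between tags);
# NF (an addition for this sketch) drops records with no fields.
printf '<tag>data</tag>\n<tag>more data</tag>\n' |
  awk -v RS='<|>' 'NR % 2 && NF'
```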

