Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/4860228/
How to extract from a file text between tokens using bash scripts
Asked by Mr_LinDowsMac
I was reading this question: "Extract lines between 2 tokens in a text file using bash", because I have a very similar problem. I have to extract text from this XML file (and save it to $variable before printing):
<--more labels up this line>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<--more labels and text down this line-->
I only need the contents of value= (without the quotes or brackets, and without the 'value=' itself). I think the search first has to find "GUI/LastVMSelected" to reach this line, because other lines may have a similar value field, and the value of that label is the one I want.
Answered by Jan Hudec
If they are on the same line (as they seem to be in your example), it's even easier. Just:
sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p'
Explanation:
- -n: Suppress default print
- /name="GUI\/LastVMSelected"/: only lines matching this pattern
- s/.*value="\([^"]*\)".*/\1/p: substitute the whole line with the captured parenthesized part (the contents of value) and print the result
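Since the question asks for the result in a variable, the command above can be wrapped in a command substitution. This is only a minimal sketch: the temporary sample file, its extra "GUI/SomethingElse" line, and the variable name last_vm are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample file standing in for the real XML configuration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/SomethingElse" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

# Only the line whose name is GUI/LastVMSelected is rewritten and printed;
# $(...) captures that output in the (hypothetical) variable last_vm.
last_vm=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$sample")

echo "$last_vm"   # prints 14cd3204-4774-46b8-be89-cc834efcba89
rm -f "$sample"
```

If no line matches, sed prints nothing and last_vm ends up empty, so a check such as [ -n "$last_vm" ] before using it may be worthwhile.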
Answered by Brian Agnew
I'm assuming that you're extracting from an XML document. If that is the case, have a look at the XMLStarlet command-line tools for processing XML. The XMLStarlet documentation covers querying XML docs.
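As a rough illustration of that route, the query could look like the sketch below. It assumes xmlstarlet is installed and that the document has no default namespace; the sample file, its ExtraData wrapper element, and the variable name last_vm are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical, namespace-free sample document.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<ExtraData>
  <ExtraDataItem name="GUI/SomethingElse" value="not-this-one"/>
  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
</ExtraData>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # sel = select mode, -t starts a template, -v prints the value of an XPath expression.
    last_vm=$(xmlstarlet sel -t -v '//ExtraDataItem[@name="GUI/LastVMSelected"]/@value' "$sample")
    echo "$last_vm"
else
    echo "xmlstarlet is not installed" >&2
fi
rm -f "$sample"
```

Unlike the text-based approaches, this does not depend on attribute order or on the element sitting on a single line. If the real file declares a default namespace, the XPath would need a prefix bound with the -N option of xmlstarlet sel.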
Answered by mpenkov
Use this:
for f in `grep "GUI/LastVMSelected" filename.txt | cut -d " " -f3`; do echo ${f:7:36}; done
- grep gets you only the lines you need
- cut splits the lines using some separator, and returns the Nth result of the split
- -d " " sets the separator to space
- -f3 returns the third result (1-based indexing)
- ${f:7:36} extracts the substring starting at index 7 that is 36 characters long. This gets rid of the leading value=" and the trailing quote, slash, etc.
Obviously if the order of the fields changes, this will break, but if you're just after something quick and dirty that works, this should be it.
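To make the intermediate results visible, the same pipeline can be split into steps. This is a sketch under assumptions: the one-line sample file and the variable names line, field and last_vm are made up, and the sample line has no leading indentation.

```bash
#!/usr/bin/env bash
# Hypothetical one-line sample; note there is no leading indentation.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

line=$(grep "GUI/LastVMSelected" "$sample")       # the whole matching line
field=$(printf '%s\n' "$line" | cut -d " " -f3)   # third space-separated field
last_vm=${field:7:36}                             # skip the 7 characters of value=" and keep 36

echo "$field"     # prints value="14cd3204-4774-46b8-be89-cc834efcba89"/>
echo "$last_vm"   # prints 14cd3204-4774-46b8-be89-cc834efcba89
rm -f "$sample"
```

Because cut treats every single space as a separator, leading indentation in the real file would produce empty fields and shift the field number, which is one more way this quick approach can break.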
Answered by Paused until further notice.
Using my answer from the question you linked:
sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' inputfile
Explanation:
- -n: don't do an implicit print
- /<!--more labels up this line-->/{: if the starting marker is found, then
- :a: label "a"
- n: read the next line
- /<!--more labels and text down this line-->/b: if it's the ending marker, branch to the end of the script and stop extracting
- \|GUI/LastVMSelected|: if the line matches the string
- s/.*value="\([^"]*\)".*/\1/p: print the string after 'value="' and before the next quote
- ba: branch to label "a"
- }: end if
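The marker-delimited command can be tried on a small file as sketched below. It assumes GNU sed (the one-line label syntax may not work with other sed implementations); the sample file contents and the variable name last_vm are hypothetical, with extra matching lines placed outside the markers to show that they are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical sample: matching lines also appear outside the two markers.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="outside-before"/>
<!--more labels up this line-->
<ExtraDataItem name="GUI/SomethingElse" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="outside-after"/>
EOF

# Lines are only inspected between the start and end markers (GNU sed assumed).
last_vm=$(sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' "$sample")

echo "$last_vm"   # prints 14cd3204-4774-46b8-be89-cc834efcba89
rm -f "$sample"
```

Compared with the single-line sed answer, the extra loop only pays off when the same name can also occur outside the marked region.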

