Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4860228/

Date: 2020-09-17 23:22:18  Source: igfitidea

How to extract from a file text between tokens using bash scripts

bash unix scripting

Asked by Mr_LinDowsMac

I was reading this question: "Extract lines between 2 tokens in a text file using bash", because I have a very similar problem. I have to extract text from this XML file (and save it to a $variable before printing):

<--more labels up this line>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<--more labels and text down this line-->

I only need the contents of value= (obviously without the brackets and without 'value='). But first, I think the script has to search for "GUI/LastVMSelected" to find this line, because other lines could have a similar value field, and the value of this particular label is the one I want.

Answered by Jan Hudec

If they are on the same line (as they seem to be from your example), it's even easier. Just:

sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p'

Explanation:

  • -n: Suppress default print
  • /name="GUI\/LastVMSelected"/: only lines matching this pattern
  • s/.*value="\([^"]*\)".*/\1/p
    • substitute the whole line with the captured parenthesized part (the contents of the value attribute)
    • and print the result
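
Since the question asks for the value in a variable, the one-liner above can be wrapped in a command substitution. This is only a minimal sketch: the temporary sample file and the variable name last_vm are assumptions made for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; the real input would be the XML file from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

# Only lines containing name="GUI/LastVMSelected" are edited and printed;
# command substitution stores sed's output in the variable.
last_vm=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$sample")

echo "$last_vm"
rm -f "$sample"
```

The line with name="GUI/Other" is skipped because the address pattern restricts the substitution to the matching line.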

Answered by Brian Agnew

I'm assuming that you're extracting from an XML document. If that is the case, have a look at the XMLStarlet command-line tools for processing XML. Its documentation covers querying XML documents.
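
A rough sketch of what such a query might look like, under several assumptions: the binary is installed under the name xmlstarlet (some systems name it differently), the sample wrapper element ExtraData is hypothetical, and the file declares no default XML namespace (a real file that does would need extra namespace handling).

```shell
#!/usr/bin/env bash
# Hypothetical, well-formed sample document built around the line from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraData>
  <ExtraDataItem name="GUI/Other" value="not-this-one"/>
  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
</ExtraData>
EOF

last_vm=""
if command -v xmlstarlet >/dev/null 2>&1; then
  # Select the value attribute of the element whose name attribute matches.
  last_vm=$(xmlstarlet sel -t -v '//ExtraDataItem[@name="GUI/LastVMSelected"]/@value' "$sample")
  echo "$last_vm"
else
  echo "xmlstarlet is not installed; skipping the query" >&2
fi
rm -f "$sample"
```

Unlike the text-based approaches, an XPath query does not depend on attribute order or on the element fitting on one line.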

Answered by mpenkov

Use this:

for f in `grep "GUI/LastVMSelected" filename.txt | cut -d " " -f3`; do echo ${f:7:36}; done
  • grep gets you only the lines you need
  • cut splits the lines using some separator, and returns the Nth result of the split
  • -d " " sets the separator to space
  • -f3 returns the third result (1-based indexing)
  • ${f:7:36} extracts the substring starting at index 7 that is 36 characters long. This gets rid of the leading value=" and trailing slash, etc.

Obviously if the order of the fields changes, this will break, but if you're just after something quick and dirty that works, this should be it.
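
For illustration, here is the same grep and cut pipeline with the result saved in a variable, next to a variant that strips by pattern instead of by fixed offsets. The variant is not from the original answer, and the sample file and variable names are assumptions; both versions still assume the matching line has no leading spaces and that value= is the third space-separated field.

```shell
#!/usr/bin/env bash
# Hypothetical sample input with no leading indentation.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

# Third space-separated field of the matching line: value="...."/>
field=$(grep "GUI/LastVMSelected" "$sample" | cut -d " " -f3)

# Fixed offsets, as in the answer: skip the 7 characters of value=" and keep 36.
by_offset=${field:7:36}

# Pattern-based variant: drop the value=" prefix, then everything from the next quote on.
stripped=${field#value=\"}
by_pattern=${stripped%%\"*}

echo "$by_offset"
echo "$by_pattern"
rm -f "$sample"
```

The pattern-based form keeps working if the value is not exactly 36 characters long, but it breaks in the same way as the original if the field order changes.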

Answered by Paused until further notice.

Using my answer from the question you linked:

sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' inputfile

Explanation:

  • -n - don't do an implicit print
  • /<!-- this is token 1 -->/{ - if the starting marker is found, then
    • :a - label "a"
      • n - read the next line
      • /<!-- this is token 2 -->/q - if it's the ending marker, quit (the command above uses b here, which skips to the end of the script for that line without exiting)
      • \|GUI/LastVMSelected| - if the line matches the string
        • s/.*value="\([^"]*\)".*/\1/p - replace the line with the string after 'value="' and before the next quote, and print it
    • ba - branch to label "a"
  • } - end if
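
The range-based approach can be tried end to end with a sketch like the one below. It assumes GNU sed (other sed implementations may not accept labels followed by ; or }), and the sample file, its marker comments, and the variable name last_vm are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample: one matching line outside the markers, one inside.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="outside-the-markers"/>
<!--more labels up this line-->
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->
EOF

# Start at the opening marker, loop line by line until the closing marker,
# and print only the value of the line that mentions GUI/LastVMSelected.
last_vm=$(sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' "$sample")

echo "$last_vm"
rm -f "$sample"
```

The first line of the sample is ignored because it appears before the opening marker, which is the point of restricting the search to the region between the two tokens.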