How to extract text between tokens from a file using bash scripts

Question

Asked by Mr_LinDowsMac

I was reading this question: "Extract lines between 2 tokens in a text file using bash", because I have a very similar problem... I have to extract text from this XML file (and save it to $variable before printing):

<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->

I only need to get the contents of value= (obviously without brackets and without 'value='), but first, I think it has to search for "GUI/LastVMSelected" to get to this line, because other lines could have a similar value field, and the value of that label is the one I want.

Answer 1

Answered by Jan Hudec

If they are on the same line (as they seem to be from your example), it's even easier. Just:

sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p'

Explanation:

-n: Suppress default print
/name="GUI\/LastVMSelected"/: only lines matching this pattern
s/.*value="\([^"]*\)".*/\1/p
- substitute everything, capturing the parenthesized part (the value of value)
- and print the result
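
Since the question asks for the value to be saved in a variable before printing, the command can be wrapped in command substitution. This is only a minimal sketch: the function name last_vm_selected, the temporary sample file, and the extra "GUI/Other" line are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value attribute of the GUI/LastVMSelected line.
last_vm_selected() {
    sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$1"
}

# Throwaway sample file so the sketch runs on its own (contents are made up,
# apart from the line quoted in the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/Other" value="something-else"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

variable=$(last_vm_selected "$sample")   # save first ...
echo "$variable"                         # ... then print
rm -f "$sample"
```

Because the address /name="GUI\/LastVMSelected"/ filters the lines first, the other value= attribute in the sample is never touched by the substitution.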

Answer 2

Answered by Brian Agnew

I'm assuming that you're extracting from an XML document. If that is the case, have a look at the XMLStarlet command-line tools for processing XML. There's some documentation for querying XML docs here.
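
As a rough illustration of what such a query might look like, here is a sketch using xmlstarlet's sel subcommand with an XPath expression. It assumes the xmlstarlet binary is installed under that name; the function name, the sample document, and its <ExtraData> wrapper element are hypothetical and only there to make the example self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical helper; needs the xmlstarlet binary on PATH.
# local-name() is used so the query still matches if the real document
# declares a default XML namespace (an assumption, not stated in the question).
last_vm_selected_xml() {
    command -v xmlstarlet >/dev/null 2>&1 || { echo "xmlstarlet not found" >&2; return 1; }
    xmlstarlet sel -t -v \
        '//*[local-name()="ExtraDataItem"][@name="GUI/LastVMSelected"]/@value' "$1"
}

# Made-up, well-formed sample document for the sketch.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraData>
  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
</ExtraData>
EOF

variable=$(last_vm_selected_xml "$sample")
echo "$variable"
rm -f "$sample"
```

Unlike the line-based approaches, this does not depend on the attribute order or on the element sitting on a single line, but it does require the input to be well-formed XML.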

Answer 3

Answered by mpenkov

Use this:

for f in `grep "GUI/LastVMSelected" filename.txt | cut -d " " -f3`; do echo ${f:7:36}; done

grep gets you only the lines you need
cut splits the lines using some separator, and returns the Nth result of the split
-d " " sets the separator to space
-f3 returns the third result (1-based indexing)
${f:7:36} extracts the substring starting at index 7 that is 36 characters long. This gets rid of the leading value=" and the trailing slash, etc.

Obviously if the order of the fields changes, this will break, but if you're just after something quick and dirty that works, this should be it.
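
If the field order or the value length might vary, one possible variant drops the fixed offsets and uses bash parameter expansion instead. This is a different technique from the cut/substring one-liner above, shown only as a sketch; the function name extract_value is made up, and it still assumes the name and value attributes sit on the same line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull out value="..." with parameter expansion only.
extract_value() {
    local line rest value
    while IFS= read -r line || [[ -n $line ]]; do
        # Skip lines that do not carry the wanted name attribute.
        [[ $line == *'name="GUI/LastVMSelected"'* ]] || continue
        # Skip the line if it has no value attribute at all.
        [[ $line == *'value="'* ]] || continue
        rest=${line#*'value="'}   # drop everything up to and including value="
        value=${rest%%\"*}        # drop everything from the next quote onwards
        printf '%s\n' "$value"
    done < "$1"
}

# Made-up sample with the attributes in the opposite order.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem value="14cd3204-4774-46b8-be89-cc834efcba89" name="GUI/LastVMSelected"/>
EOF

variable=$(extract_value "$sample")
echo "$variable"
rm -f "$sample"
```

The trade-off is a few more lines in exchange for not depending on the value being exactly 36 characters long or being the third space-separated field.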

Answer 4

Answered by Paused until further notice.

Using my answer from the question you linked:

sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/q;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' inputfile

Explanation:

-n - don't do an implicit print
/<!--more labels up this line-->/{ - if the starting marker is found, then
- :a - label "a"
  - n - read the next line
  - /<!--more labels and text down this line-->/q - if it's the ending marker, quit
  - \|GUI/LastVMSelected| - if the line matches the string
    - s/.*value="\([^"]*\)".*/\1/p - replace the whole line with the text after 'value="' and before the next quote, and print it
- ba - branch to label "a"
} - end if
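
To see the marker logic in action, the steps in the explanation can be laid out one sed command per line and run against a small sample. This is a sketch only: the function name value_between_markers and the sample values are invented for illustration, and the multi-line layout is my own arrangement of the steps described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: only look at lines between the two marker comments.
value_between_markers() {
    sed -n '
/<!--more labels up this line-->/{
  :a
  n
  /<!--more labels and text down this line-->/q
  \|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p
  ba
}' "$1"
}

# Made-up sample: matching lines outside the markers should be ignored.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="before-the-start-marker"/>
<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="after-the-end-marker"/>
EOF

variable=$(value_between_markers "$sample")
echo "$variable"
rm -f "$sample"
```

Writing each command on its own line also sidesteps the question of whether a given sed accepts a semicolon directly after a label name.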
