bash awk: treat a double-quoted string as one token and ignore the spaces inside it

Question

Asked by Roy Chan

Data file - data.txt:

ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC

cat data.txt | awk '{print $2}'

prints only "I, the first word after the opening quote, instead of the whole quoted string.

How can I make awk ignore the spaces inside the quotes and treat the quoted string as a single token?

Answer 1

Accepted answer by DigitalRoss

Yes, this can be done nicely in awk. It's easy to get all the fields without any serious hacks.

(This example works in both The One True Awk and gawk.)

{
  split($0, a, "\"")   # a[2] is the text between the first pair of quotes
  $2 = a[2]
  $3 = $(NF - 1)
  $4 = $NF
  print "and the fields are ", $1, "+", $2, "+", $3, "+", $4
}
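To see why a[2] holds the quoted text, it can help to look at what split produces when the separator is the double-quote character. The following is only a minimal sketch, assuming a line with exactly one quoted field; the function name show_quote_split is made up for illustration and is not part of the answer.

```shell
# Hypothetical helper (not from the answer): print each piece of a line
# after splitting it on the " character.
show_quote_split() {
    printf '%s\n' "$1" | awk '{
        # a[1] = text before the quotes, a[2] = text inside, a[3] = text after
        n = split($0, a, "\"")
        for (i = 1; i <= n; i++)
            printf "a[%d] = [%s]\n", i, a[i]
    }'
}

show_quote_split 'ABC "I am ABC" 35 DESC'
```

For the sample line this prints three pieces, with a[2] being I am ABC. With more than one quoted field the pieces alternate between outside and inside the quotes, which is why the accepted answer only covers the single-quoted-field case.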

Answer 2

Answered by mabalenk

Another alternative is to use the FPAT variable, which defines a regular expression describing the contents of each field. (FPAT is a gawk extension, so the script must be run by gawk.)

Save this AWK script as parse.awk:

#!/bin/awk -f

BEGIN {
  FPAT = "([^ ]+)|(\"[^\"]+\")"
}
{
  print $2
}

Make it executable with chmod +x ./parse.awk and parse your data file as ./parse.awk data.txt:

"I am ABC"
"I am not ABC"
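With FPAT the quoted field keeps its surrounding quotes. If they are not wanted, they can be stripped before printing. This is a sketch only: it assumes GNU awk is installed under the name gawk, and the helper name nth_field is hypothetical.

```shell
# Hypothetical helper (not from the answer): print the Nth field of each
# input line, treating "..." as a single field and dropping its quotes.
# Assumption: GNU awk is available as `gawk`; other awks ignore FPAT.
nth_field() {
    gawk -v n="$1" '
        BEGIN { FPAT = "([^ ]+)|(\"[^\"]+\")" }
        {
            f = $n
            gsub(/^"|"$/, "", f)    # remove a leading and a trailing quote, if present
            print f
        }'
}

# The demo only runs when gawk is actually present.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' | nth_field 2
fi
```

Because FPAT describes what a field looks like rather than what separates fields, the same helper works for unquoted fields too, for example nth_field 3.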

Answer 3

Answered by Chris Gregg

Try this:

$ cat data.txt | awk -F\" '{print $2}'
I am ABC
I am not ABC

Answer 4

Answered by khh

The top answer to this question only works for lines with exactly one quoted field. When I found this question I needed something that would work for an arbitrary number of quoted fields.

Eventually I came upon an answer by Wintermute in another thread that provides a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the program below.

BEGIN { OFS = "" } {
    for (i = 1; i <= NF; i += 2) {
        gsub(/[ \t]+/, ",", $i)
    }
    print
}

This works because, when the line is split on the " character, every other field lies inside quotes. The script therefore replaces each run of whitespace in the fields outside the quotes with a comma, and the empty OFS drops the quote characters when the line is rebuilt.

You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).

Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".
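The chaining step can be sketched as follows. The wrapper name quotes_to_csv is hypothetical, and the sketch assumes that no quoted field itself contains a comma, since the second awk would then split it.

```shell
# Hypothetical wrapper (name chosen for illustration): split each line on ",
# turn whitespace outside the quotes into commas, and rebuild the line
# without the quote characters.
quotes_to_csv() {
    awk -F'"' 'BEGIN { OFS = "" } {
        for (i = 1; i <= NF; i += 2)     # odd fields lie outside the quotes
            gsub(/[ \t]+/, ",", $i)
        print
    }'
}

# Chain a second awk that splits on commas and picks the second field.
printf '%s\n' 'ABC "I am ABC" 35 DESC' | quotes_to_csv | awk -F, '{ print $2 }'

# A line whose first field is quoted: field 1 is then simply empty.
printf '%s\n' '"I am ABC" 35 DESC' | quotes_to_csv
```

The first command prints I am ABC. For the second, the empty first field means the loop can still start at 1, and the line comes out as I am ABC,35,DESC.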

Answer 5

Answered by arg0

I've put together a function that re-splits $0 into an array called B. Spaces between double quotes do not act as field separators. It works with any number of fields, in any mix of quoted and unquoted ones. Here goes:

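#!/usr/bin/gawk -f

# Resplit $0 into array B. Spaces between double quotes are not separators.
# Single quotes not handled. No escaping of double quotes.
function resplit(       a, l, i, j, b, k, BNF) # all are local variables
{
  l=split($0, a, "\"")
  BNF=0
  delete B
  for (i=1;i<=l;++i)
  {
    if (i % 2)
    {
      k=split(a[i], b)
      for (j=1;j<=k;++j)
        B[++BNF] = b[j]
    }
    else
    {
      B[++BNF] = "\""a[i]"\""
    }
  }
}

{
  resplit()

  for (i=1;i<=length(B);++i)
    print i ": " B[i]
}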
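As a usage illustration, here is a sketch of the same re-splitting idea wrapped in a shell function and run on a line with two quoted fields. It is a variation, not the author's exact script: the wrapper name print_tokens is hypothetical, the quotes are dropped from the stored tokens, and the token count is kept in a global BNF (with split("", B) to clear the array) so that the sketch does not rely on gawk-only features.

```shell
# Hypothetical wrapper (not from the answer): print every token of each
# input line on its own line, keeping quoted strings together.
print_tokens() {
    awk '
    function resplit(    a, l, i, j, b, k) {   # names after the gap are locals
        l = split($0, a, "\"")
        BNF = 0                       # global: number of tokens stored in B
        split("", B)                  # portable way to empty an array
        for (i = 1; i <= l; ++i) {
            if (i % 2) {              # odd pieces are outside the quotes
                k = split(a[i], b)
                for (j = 1; j <= k; ++j)
                    B[++BNF] = b[j]
            } else {                  # even pieces are the quoted text
                B[++BNF] = a[i]
            }
        }
    }
    {
        resplit()
        for (i = 1; i <= BNF; ++i)
            print i ": " B[i]
    }'
}

printf '%s\n' 'ABC "I am ABC" 35 "x y" DESC' | print_tokens
```

For this input the sketch prints five numbered tokens, with I am ABC as token 2 and x y as token 4.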

Hope it helps.

Answer 6

Answered by bourne2program

Here is something like what I finally got working, which is more generic for my project. Note that it doesn't use awk.

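someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"
# Join all arguments into one string, one argument per line.
putItemsInLines() {
    local items=""
    local firstItem="true"
    while test $# -gt 0; do
        if [ "$firstItem" == "true" ]; then
            items="$1"
            firstItem="false"
        else
            items="$items
$1"
        fi
        shift
    done
    echo "$items"
}

count=0
# eval makes the shell re-parse the quotes stored inside $someText.
while read -r valueLine; do
    echo "$count: $valueLine"
    count=$(( $count + 1 ))
done <<< "$(eval putItemsInLines $someText)"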

Which outputs:

0: ABC
1: I am ABC
2: 35
3: DESC
4: 1 23
5: testing
6: 456
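Since eval executes whatever the string contains, the approach above is only suitable for trusted input. One alternative, sketched here and not part of the original answer, is to let xargs do the quote parsing, because xargs itself understands both double and single quotes in its input. The function name split_items is hypothetical.

```shell
# Hypothetical helper (a swapped-in technique, not the author's method):
# split a string into items, one per line, honouring "..." and '...'.
split_items() {
    # xargs parses the quotes and passes each item to printf as a separate
    # argument; printf reuses its format once per argument.
    printf '%s\n' "$1" | xargs printf '%s\n'
}

split_items "ABC \"I am ABC\" 35 DESC '1 23' testing 456"
```

This prints the same seven items as the output above, without the numeric prefixes. It assumes that no item begins with a hyphen, which printf could mistake for an option.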

Answer 7

Answered by Chris Gregg

Okay, if you really want all three fields, you can get them, but it takes a lot of piping:

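$ cat data.txt | awk -F\" '{print $1 "," $2 "," $3}' | awk -F' ,' '{print $1 "," $2}' | awk -F', ' '{print $1 "," $2}' | awk -F, '{print $1 "," $2 "," $3}'
ABC,I am ABC,35
DEF,I am not ABC,42

(This output is for three-field lines such as ABC "I am ABC" 35; with the four-field data.txt shown in the question, the last field comes out as 35 DESC.)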

By the last pipe you've got all three fields to do whatever you'd like with.
