Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/3458699/

Time: 2020-09-17 22:28:33  Source: igfitidea

How to use awk to extract a quoted field?

Tags: linux, bash, scripting, awk

Asked by mmonem

I am using

awk '{ printf "%s", $3 }'

to extract a field from a space-delimited line. Of course, I get only partial results when the field is quoted and contains spaces. Can anybody suggest a solution, please?
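
To make the symptom concrete, a minimal reproduction might look like the following. The sample line and the function name `third_field_naive` are assumptions for illustration and do not come from the question itself.

```shell
# Hypothetical reproduction: default whitespace splitting cuts a quoted
# field at its first inner space.
third_field_naive() {
    printf '%s\n' "$1" | awk '{ printf "%s\n", $3 }'
}

third_field_naive 'field1 field2 "field 3" field4'
# prints: "field
```

awk splits on runs of whitespace by default and knows nothing about quoting, so `"field` and `3"` become fields 3 and 4.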

Accepted answer by schot

This is actually quite difficult. I came up with the following awk script that splits the line manually and stores all fields in an array.

{
    s = $0
    i = 0
    split("", a)
    while ((m = match(s, /"[^"]*"/)) > 0) {
        # Add all unquoted fields before this field
        n = split(substr(s, 1, m - 1), t)
        for (j = 1; j <= n; j++)
            a[++i] = t[j]
        # Add this quoted field
        a[++i] = substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
        if (i >= 3) # We can stop once we have field 3
            break
    }
    # Process the remaining unquoted fields after the last quoted field
    n = split(s, t)
    for (j = 1; j <= n; j++)
        a[++i] = t[j]
    print a[3]
}

Answer by ghostdog74

Show your input file and desired output next time. To get the quoted fields:

$ cat file
field1 field2 "field 3" field4 "field5"

$ awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' file
field 3
field5

Answer by benj

Here's a possible alternative solution to this problem. It works by finding the fields that begin or end with quotes, and then joining those together. At the end it updates the fields and NF, so if you put more patterns after the one that does the merging, you can process the (new) fields using all the normal awk features.

I think this uses only features of POSIX awk and doesn't rely on gawk extensions, but I'm not completely sure.

# This function joins the fields $start to $stop together with OFS, shifting
# subsequent fields down and updating NF.
#
function merge_fields(start, stop) {
    #printf "Merge fields $%d to $%d\n", start, stop;
    if (start >= stop)
        return;
    merged = "";
    for (i = start; i <= stop; i++) {
        if (merged)
            merged = merged OFS $i;
        else
            merged = $i;
    }
    $start = merged;

    offs = stop - start;
    for (i = start + 1; i <= NF; i++) {
        #printf "$%d = $%d\n", i, i+offs;
        $i = $(i + offs);
    }
    NF -= offs;
}

# Merge quoted fields together.
{
    start = stop = 0;
    for (i = 1; i <= NF; i++) {
        if (match($i, /^"/))
            start = i;
        if (match($i, /"$/))
            stop = i;
        if (start && stop && stop > start) {
            merge_fields(start, stop);
            # Start again from the beginning.
            i = 0;
            start = stop = 0;
        }
    }
}

# This rule executes after the one above. It sees the fields after merging.
{
    for (i = 1; i <= NF; i++) {
        printf "Field %d: >>>%s<<<\n", i, $i;
    }
}

On an input file like:

thing "more things" "thing" "more things and stuff"

it produces:

Field 1: >>>thing<<<
Field 2: >>>"more things"<<<
Field 3: >>>"thing"<<<
Field 4: >>>"more things and stuff"<<<
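
If only one logical field is needed and the surrounding quotes should be dropped, the same merge-on-quote-boundaries idea can be sketched more compactly. This is a simplified variant, not the answer's script: it collects the merged fields into an array instead of rewriting `$i` and NF. The function name `quoted_field` and the variable `want` are hypothetical, and the sketch assumes balanced double quotes with no escaped quotes inside them.

```shell
# Hypothetical helper (not from the answer): print logical field number $1
# of each line on stdin, treating a double-quoted run of words as one field.
quoted_field() {
    awk -v want="$1" '
    {
        split("", f)     # clear the array of logical fields for this line
        out = 0; inq = 0; buf = ""
        for (i = 1; i <= NF; i++) {
            if (inq) {
                # inside an open quote: keep appending words
                buf = buf " " $i
                if ($i ~ /"$/) { inq = 0; f[++out] = buf }
            } else if ($i ~ /^"/ && $i !~ /^".*"$/) {
                # opening quote without a closing one in the same word
                inq = 1; buf = $i
            } else {
                f[++out] = $i
            }
        }
        if (inq) f[++out] = buf    # unbalanced quote: keep what was collected
        val = f[want]
        gsub(/^"|"$/, "", val)     # drop the surrounding quotes
        print val
    }'
}

printf '%s\n' 'thing "more things" "thing" "more things and stuff"' | quoted_field 4
# prints: more things and stuff
```

As with the script above, words inside a quoted field are rejoined with a single space, so the original spacing inside the quotes is not preserved.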

Answer by Alan Swindells

If you are just looking for a specific field then

$ cat file
field1 field2 "field 3" field4 "field5"

awk -F"\"" '{print $2}' file

works. It splits the file by ", so the 2nd field in the example above is the one you want.
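
To see why the 2nd field is the one wanted, it can help to print every field that awk produces when the separator is a double quote. The snippet below is only an illustrative sketch, and the function name `show_quote_split` is an assumption.

```shell
# Hypothetical helper: show how a line is numbered when split on double quotes.
show_quote_split() {
    printf '%s\n' "$1" | awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

show_quote_split 'field1 field2 "field 3" field4 "field5"'
# prints:
# 1: [field1 field2 ]
# 2: [field 3]
# 3: [ field4 ]
# 4: [field5]
# 5: []
```

With this separator the even-numbered fields hold the quoted text and the odd-numbered ones hold everything between quotes, so `$2` is the first quoted field, not the second word on the line. This only holds when the quotes are balanced and never escaped.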
