Awk: treat a double-quoted string as one token and ignore the spaces inside it
Source: http://stackoverflow.com/questions/6619619/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Roy Chan
Data file - data.txt:
ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC
cat data.txt | awk '{print $2}'
prints "I (just the first word of the quoted string) instead of the whole quoted string.
How can I make awk ignore the spaces inside the quotes and treat the quoted string as a single token?
Accepted answer by DigitalRoss
Yes, this can be done nicely in awk. It's easy to get all the fields without any serious hacks.
(This example works in both The One True Awk and gawk.)
{
    split($0, a, "\"")
    $2 = a[2]
    $3 = $(NF - 1)
    $4 = $NF
    print "and the fields are ", $1, "+", $2, "+", $3, "+", $4
}
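To make the behaviour of the program above easier to see, here is a minimal sketch that feeds it the two sample lines on standard input. The wrapper name quoted_fields and the "|" output separator are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical wrapper: runs the accepted answer's logic on whatever arrives on stdin.
quoted_fields() {
    awk '{
        split($0, a, "\"")   # a[2] is the text between the first pair of double quotes
        $2 = a[2]            # overwrite $2 with the whole quoted string
        $3 = $(NF - 1)       # NF still counts the original space-separated words
        $4 = $NF
        print $1 "|" $2 "|" $3 "|" $4
    }'
}

printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' | quoted_fields
```

This should print ABC|I am ABC|35|DESC and DEF|I am not ABC|42|DESC. Because the last two columns are addressed from the end of the record (NF - 1 and NF), the number of words inside the quotes does not matter, but the approach assumes exactly one quoted field per line.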
Answer by mabalenk
Another alternative is to use the FPAT variable, which defines a regular expression describing the contents of each field.
Save this AWK script as parse.awk:
#!/bin/awk -f
BEGIN {
    FPAT = "([^ ]+)|(\"[^\"]+\")"
}
{
    print $2
}
Make it executable with chmod +x ./parse.awk and parse your data file as ./parse.awk data.txt:
"I am ABC"
"I am not ABC"
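FPAT is a gawk extension, so the parse.awk approach only works when the interpreter is gawk. The following sketch assumes gawk is installed and shows one possible way to also drop the surrounding quotes from the matched field. The function name strip_quoted_field is hypothetical.

```shell
# Assumption: gawk is on PATH. Other awk implementations treat FPAT as an ordinary variable.
strip_quoted_field() {
    gawk 'BEGIN { FPAT = "([^ ]+)|(\"[^\"]+\")" }
          {
              field = $2
              gsub(/^"|"$/, "", field)   # remove a leading and a trailing double quote
              print field
          }'
}

if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' 'ABC "I am ABC" 35 DESC' | strip_quoted_field
else
    echo "gawk not found; FPAT needs gawk" >&2
fi
```

With gawk available this should print I am ABC without the quotes.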
Answer by Chris Gregg
Try this:
$ cat data.txt | awk -F\" '{print $2}'
I am ABC
I am not ABC
Answer by khh
The top answer for this question only works for lines with exactly one quoted field. When I found this question, I needed something that works for an arbitrary number of quoted fields.
Eventually I came upon an answer by Wintermute in another thread, which provides a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the program below.
BEGIN { OFS = "" } {
    for (i = 1; i <= NF; i += 2) {
        gsub(/[ \t]+/, ",", $i)
    }
    print
}
This works because, when you split on the " character, every other element of the array lies inside the quotes. The program therefore replaces the whitespace in the elements that are not inside quotes with a comma.
You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).
Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".
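As a rough check of the two points above (chaining a second awk, and a line that starts with a quoted field), here is a small sketch. The wrapper name quotes_to_csv and the test lines are assumptions made for illustration.

```shell
# Hypothetical wrapper: -F'"' makes the double quote the field separator.
quotes_to_csv() {
    awk -F'"' 'BEGIN { OFS = "" } {
        for (i = 1; i <= NF; i += 2) {   # odd-numbered fields lie outside the quotes
            gsub(/[ \t]+/, ",", $i)
        }
        print
    }'
}

# Quoted field in the middle; a second awk then picks column 2 using -F,
printf '%s\n' 'ABC "I am ABC" 35 DESC' | quotes_to_csv | awk -F, '{ print $2 }'

# Quoted field at the very start of the line
printf '%s\n' '"I am ABC" 35 DESC' | quotes_to_csv
```

The first command should print I am ABC and the second I am ABC,35,DESC. In this sketch a leading quote simply produces an empty first field, so the loop can still start at 1. A comma inside the quoted text would confuse the second awk stage, so a different output delimiter may be needed for such data.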
Answer by arg0
I've put together a function that re-splits $0 into an array called B. Spaces between double quotes do not act as field separators. It works with any number of fields and with a mix of quoted and unquoted ones. Here goes:
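#!/usr/bin/gawk -f
# Resplit $0 into array B. Spaces between double quotes are not separators.
# Single quotes not handled. No escaping of double quotes.
function resplit(       a, l, i, j, b, k, BNF) # all are local variables
{
    l=split($0, a, "\"")
    BNF=0
    delete B
    for (i=1;i<=l;++i)
    {
        if (i % 2)
        {
            k=split(a[i], b)
            for (j=1;j<=k;++j)
                B[++BNF] = b[j]
        }
        else
        {
            B[++BNF] = "\""a[i]"\""
        }
    }
}
{
    resplit()
    for (i=1;i<=length(B);++i)
        print i ": " B[i]
}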
Hope it helps.
Answer by bourne2program
Here is something like what I finally got working; it is more generic for my project. Note that it doesn't use awk.
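someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"
putItemsInLines() {
    local items=""
    local firstItem="true"
    while test $# -gt 0; do
        if [ "$firstItem" == "true" ]; then
            items="$1"
            firstItem="false"
        else
            items="$items
$1"
        fi
        shift
    done
    echo "$items"
}
count=0
while read -r valueLine; do
    echo "$count: $valueLine"
    count=$(( $count + 1 ))
done <<< "$(eval putItemsInLines $someText)"
Which outputs:
0: ABC
1: I am ABC
2: 35
3: DESC
4: 1 23
5: testing
6: 456
Answer by Chris Gregg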
Okay, if you really want all three fields, you can get them, but it takes a lot of piping:
$ cat data.txt | awk -F\" '{print $1 "," $2 "," $3}' | awk -F' ,' '{print $1 "," $2}' | awk -F', ' '{print $1 "," $2}' | awk -F, '{print $1 "," $2 "," $3}'
ABC,I am ABC,35
DEF,I am not ABC,42
By the last pipe you've got all three fields to do whatever you'd like with.
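The same three comma-separated fields can probably be produced by a single awk process as well. This is a sketch of that idea and not the answer's own method. The function name three_fields is hypothetical, and it assumes one quoted field per line with at least one word on each side of it.

```shell
# Hypothetical helper; expects lines shaped like: ABC "I am ABC" 35 DESC
three_fields() {
    awk -F'"' '{
        split($1, left, " ")    # words before the opening quote
        split($3, right, " ")   # words after the closing quote
        print left[1] "," $2 "," right[1]
    }'
}

printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' | three_fields
```

This should print ABC,I am ABC,35 and DEF,I am not ABC,42. Taking only the first word after the closing quote keeps the third field at 35 even though data.txt has a trailing DESC column.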