bash: How do I get awk to NOT use space as a delimiter?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19938195/

Time: 2020-09-18 08:36:36  Source: igfitidea

How do I get awk to NOT use space as a delimiter?

Tags: bash, csv, awk, space

Asked by vmos

I've got a CSV that I'm trying to process, but some of my fields contain commas, line breaks, and spaces, and now that I think about it, there are probably some apostrophes in there too.

For the commas and line breaks, I've converted them to other strings at the output phase and convert them back at the end (yes, it's messy, but I only need to run this once). I realise that I may have to do this with the spaces too, but I've broken the problem down to its basic parts to see if I can work around it.

Here's an input.csv:

"john","beatles.com","arse","[email protected]","1","1","on holiday"
"paul","beatles.com","bung","","0","1","also on holiday"

(I've tried with and without quotes.)

Here's the script:

INPUT="input.csv"

for i in `cat ${INPUT}`

do
#USERNAME=`echo $i | awk -v  FS=',' '{print $1}'`
USERNAME=`echo $i | awk 'BEGIN{FS="[|,:]"} ; {print $1}'`
echo "username: $USERNAME"

done

So that should just output john and paul, but instead I get:

username: "john"
username: holiday"
username: "paul"
username: on
username: holiday"

because it sees the spaces and interprets them as new rows.

Can I get it to stop that?

Answered by devnull

It's not awk, but the shell (the default value of IFS) that's causing word splitting.

You could fix that by saying:

while read -r i; do
  USERNAME=$(echo "$i" | awk 'BEGIN{FS="[|,:]"} ; {print $1}')
  echo "username: $USERNAME"
done < "$INPUT"


In order to verify how the shell is reading the input, add

echo "This is a line: ${i}"

in the loop.
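
To make the difference concrete, here is a minimal sketch (not part of the original answer) that counts how many times each kind of loop runs over the same two lines. The temporary file, its shortened contents, and the names for_count and read_count are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo file; the rows are shortened versions of the question's input.csv.
tmp=$(mktemp)
printf '%s\n' '"john","beatles.com","on holiday"' \
              '"paul","beatles.com","also on holiday"' > "$tmp"

# Unquoted command substitution: the shell splits the text on IFS
# (space, tab, newline by default), so every space starts a new word.
for_count=0
for i in $(cat "$tmp"); do
  for_count=$((for_count + 1))
done

# read takes one line per iteration; IFS= keeps leading and trailing
# whitespace, and -r stops backslashes from being interpreted.
read_count=0
while IFS= read -r line; do
  read_count=$((read_count + 1))
done < "$tmp"

echo "for loop iterations:   $for_count"   # 5 (one per whitespace-separated word)
echo "while read iterations: $read_count"  # 2 (one per line)
rm -f "$tmp"
```

The for loop never hands awk a whole line, which is why changing FS has no effect on the result.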

Answered by anubhava

You can use any regex as the field separator in awk, e.g. an optional comma followed by a double quote:

awk -F ',?"' '{print $2, $4, $6, $8, $10, $12, "<" $14 ">"}' f1
john beatles.com arse [email protected] 1 1 <on holiday>
paul beatles.com bung  0 1 <also on holiday>

The last field, $14, is enclosed in < and > to show that it ends up in a single awk variable.
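
If it is not obvious why the useful values sit in the even-numbered fields, the following sketch (an illustration, not part of the original answer) dumps every field that this separator produces for one sample row. The variable names line, nf, and last are made up for the example.

```shell
#!/usr/bin/env bash
# One sample row in the same shape as the question's input.csv.
line='"paul","beatles.com","bung","","0","1","also on holiday"'

# Print each field with its index. Every double quote (with an optional
# comma in front of it) is a separator, so empty fields appear between
# a closing quote and the next opening one.
printf '%s\n' "$line" | awk -F ',?"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'

# The row contains 14 double quotes, so awk sees 14 separators and 15 fields.
nf=$(printf '%s\n' "$line" | awk -F ',?"' '{ print NF }')
last=$(printf '%s\n' "$line" | awk -F ',?"' '{ print $14 }')
echo "NF=$nf last=<$last>"
```

Note that this only works while every field is quoted and no field contains a double quote of its own.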

Answered by Timothy Brown

A few things to note: you don't need to use cat or a for loop. Unless I am missing the bigger picture...

What happens when you call awk on the file?

awk -F"," '{print $1}' input.csv

I get the following:

$ awk -F"," '{print $1}' input.csv
"john"
"paul"
$
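
Taking that one step further, awk can probably do the whole job, including the "username:" prefix and stripping the quotes. This is only a sketch under the assumption that the first field never contains a comma; the temporary file and the result variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for input.csv with shortened rows.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"john","beatles.com","arse","on holiday"
"paul","beatles.com","bung","also on holiday"
EOF

# Split on commas, remove the double quotes from the first field only,
# and print it with the same prefix the question's script used.
result=$(awk -F, '{ gsub(/"/, "", $1); print "username: " $1 }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

No shell loop is involved, so the shell's word splitting never gets a chance to break the lines apart.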

Answered by pobrelkey

An awk-free solution:

cut -d, -f1 input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done

The above assumes you want to keep the quotes. If not...

cut -d, -f1 input.csv | sed 's,^",,;s,"$,,' | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done

Both of the above also assume there are no commas in your field contents. If that's not true, use a "proper" CSV parser in your favorite scripting language. Example...

ruby -rcsv -ne 'puts CSV.parse_line($_)[0]' input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done
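
To see why the cut versions need that "no commas in field contents" assumption, here is a small sketch with a made-up row whose first field contains a comma. The sample data and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up row: the first quoted field contains a comma.
line='"lennon, john","beatles.com"'

# cut knows nothing about quoting, so it stops at the first comma it sees.
first=$(printf '%s\n' "$line" | cut -d, -f1)
echo "$first"   # prints: "lennon
```

The field is cut in half, which is the case a real CSV parser handles correctly.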