bash: How do I get awk to not use space as a delimiter?

Question

Asked by vmos

I've got a CSV that I'm trying to process, but some of my fields contain commas, line breaks and spaces and now that I think about it, there's probably some apostrophes in there too.

For the commas and line breaks, I've converted them to other strings at the output phase and convert them back at the end (yes, it's messy, but I only need to run this once). I realise that I may have to do this with the spaces too, but I've broken the problem down to its basic parts to see if I can work around it.

Here's an input.csv

"john","beatles.com","arse","[email protected]","1","1","on holiday"
"paul","beatles.com","bung","","0","1","also on holiday"

(I've tried with and without quotes)

here's the script

INPUT="input.csv"

for i in `cat ${INPUT}`

do
#USERNAME=`echo $i | awk -v  FS=',' '{print $1}'`
USERNAME=`echo $i | awk 'BEGIN{FS="[|,:]"} ; {print $1}'`
echo "username: $USERNAME"

done

So that should just output john and paul, but instead I get

username: "john"
username: holiday"
username: "paul"
username: on
username: holiday"

because it sees the spaces and interprets them as new rows.

Can I get it to stop that?

Answer 1

Answered by devnull

It's not awk, but the shell (the default value of IFS) that's causing word splitting.

You could fix that by saying:

while read -r i; do
  USERNAME=$(echo "$i" | awk 'BEGIN{FS="[|,:]"} ; {print $1}');
  echo "username: $USERNAME";
done < "$INPUT"

In order to verify how the shell is reading the input, add

echo "This is a line: ${i}"

in the loop.
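
To see the difference concretely, here is a minimal sketch comparing the two loop styles. It assumes a throwaway file created with mktemp that holds one sample row; the names tmp, for_count and read_count are illustrative and do not come from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical sample file: one CSV row whose last field contains a space.
tmp=$(mktemp)
printf '%s\n' '"john","beatles.com","on holiday"' > "$tmp"

# Unquoted command substitution: the shell splits the text on IFS
# (space, tab, newline), so the single row becomes two words.
for_count=0
for i in $(cat "$tmp"); do
  for_count=$((for_count + 1))
done

# while read -r: one iteration per line, embedded spaces preserved.
read_count=0
while IFS= read -r i; do
  read_count=$((read_count + 1))
done < "$tmp"

echo "for loop iterations:   $for_count"
echo "while read iterations: $read_count"
rm -f "$tmp"
```

With this one-row file the for loop runs twice and the while loop runs once, which is the same effect that produced the stray "holiday" lines in the question.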

Answer 2

Answered by anubhava

You can use any regex as the field separator in awk, e.g. an optional comma followed by a double quote:

awk -F ',?"' '{print $2, $4, $6, $8, $10, $12, "<" $14 ">"}' f1
john beatles.com arse [email protected] 1 1 <on holiday>
paul beatles.com bung  0 1 <also on holiday>

The last field, $14, is enclosed in < and > to show that it ends up in a single awk variable.
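
The even field numbers can look surprising at first. As a rough sketch, the loop below prints every field awk produces with that separator for a shortened, made-up three-column row; the variables line and fields are illustrative only.

```shell
#!/usr/bin/env bash
# Shortened sample row (hypothetical), same quoting style as input.csv.
line='"john","beatles.com","on holiday"'

# Print each field with its index to show where the empty fields come from.
fields=$(printf '%s\n' "$line" |
  awk -F ',?"' '{ for (n = 1; n <= NF; n++) printf "%d=[%s]\n", n, $n }')

echo "$fields"
```

Each closing quote and each following ," count as separate separators, so an empty field sits between every pair of values. The data therefore lands in fields 2, 4 and 6 here, and in fields 2 through 14 for the seven-column file.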

Answer 3

Answered by Timothy Brown

A few things to note: you don't need to use cat or a for loop, unless I am missing the bigger picture...

What happens when you call awk on the file?

awk -F"," '{print $1}' input.csv

I get the following:

$ awk -F"," '{print $1}' input.csv
"john"
"paul"
$
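
If the surrounding quotes are not wanted, one possible variation (not part of the original answer) is to strip them inside awk with gsub. The sketch below feeds two made-up rows through a pipe instead of reading input.csv; the variable names is illustrative.

```shell
#!/usr/bin/env bash
# Two hypothetical rows in the same style as input.csv.
names=$(printf '%s\n' '"john","beatles.com"' '"paul","beatles.com"' |
  awk -F, '{ gsub(/"/, "", $1); print $1 }')  # drop the quotes from field 1

echo "$names"
```

Like the command above, this only works while no field contains a comma.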

Answer 4

Answered by pobrelkey

An awk-free solution:

cut -d, -f1 input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done

The above assumes you want to keep the quotes. If not...

cut -d, -f1 input.csv | sed 's,^",,;s,"$,,' | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done

Both of the above also assume there are no commas in your field contents. If that's not true, use a "proper" CSV parser in your favorite scripting language. Example...

ruby -rcsv -ne 'puts CSV.parse_line($_)[0]' input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done
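
To illustrate the comma caveat, here is a small sketch of what cut does with a field that contains a comma. The row is invented for the example and the variable names line and first are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical row whose first field contains a comma inside the quotes.
line='"lennon, john","beatles.com"'

# cut knows nothing about quoting, so it splits at the first comma it finds.
first=$(printf '%s\n' "$line" | cut -d, -f1)

echo "cut sees: $first"
```

cut returns only "lennon (with the opening quote and nothing after the comma), which is why a real CSV parser is the safer choice for data like this.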
