Bash: Parse CSV with quotes, commas and newlines

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, give the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36287982/

Date: 2020-09-18 14:25:47  Source: igfitidea

bash, csv, awk, cut, gawk

Asked by Jacob Horbulyk

Say I have the following csv file:

 id,message,time
 123,"Sorry, This message
 has commas and newlines",2016-03-28T20:26:39
 456,"It makes the problem non-trivial",2016-03-28T20:26:41

I want to write a bash command that will return only the time column, i.e.:

time
2016-03-28T20:26:39
2016-03-28T20:26:41

What is the most straightforward way to do this? You can assume the availability of standard Unix utilities such as awk, gawk, cut, grep, etc.

Note the presence of double quotes ("), which escape the commas, and of newline characters; together they make trivial attempts such as

cut -d , -f 3 file.csv

futile.
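To make the failure concrete, here is a minimal Python sketch (Python because one of the answers below already relies on its csv module). It runs the question's sample, inlined as a string, through a naive per-line comma split that roughly mimics cut -d , -f 3, and through the standard csv module. The names SAMPLE, naive_third_field and csv_third_field are illustrative only.

```python
import csv
import io

# The question's sample file, inlined so this sketch runs on its own.
SAMPLE = '''id,message,time
123,"Sorry, This message
has commas and newlines",2016-03-28T20:26:39
456,"It makes the problem non-trivial",2016-03-28T20:26:41
'''


def naive_third_field(text):
    """Roughly mimic `cut -d , -f 3`: split every physical line on every comma."""
    result = []
    for line in text.splitlines():
        if "," not in line:
            # Without -s, cut passes lines that lack the delimiter through unchanged.
            result.append(line)
        else:
            fields = line.split(",")
            # cut prints an empty field when the line has fewer than 3 fields.
            result.append(fields[2] if len(fields) > 2 else "")
    return result


def csv_third_field(text):
    """Use a real CSV parser, which understands quotes and embedded newlines."""
    reader = csv.reader(io.StringIO(text))
    return [row[2] for row in reader]


if __name__ == "__main__":
    print(naive_third_field(SAMPLE))
    print(csv_third_field(SAMPLE))
```

On this sample the naive version yields the header, a fragment of the message (" This message"), an empty field for the continuation line and only the second timestamp, while csv.reader returns the header and both timestamps.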

Answer by hek2mgl

As chepner said, you are better off using a programming language that is able to parse CSV.

Here is an example in Python:

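import csv

# Python 3: open in text mode with newline='' so the csv module can handle
# newlines embedded in quoted fields ('rb' would yield bytes and fail).
with open('a.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, quotechar='"')
    for row in reader:
        print(row[-1])  # row[-1] gives the last column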
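If the column should be picked by its header rather than by position, csv.DictReader is one option. The sketch below is an illustrative variant, not part of the original answer; extract_column is a hypothetical helper name, and it assumes the first row holds the column names.

```python
import csv


def extract_column(lines, name):
    """Return the header `name` followed by every value of that column.

    `lines` may be any iterable of strings, e.g. a file opened with
    newline=''. The csv module copes with quoted fields that contain
    commas or newlines, so one logical row can span several physical lines.
    """
    reader = csv.DictReader(lines)
    # fieldnames reads the header row; it is None for empty input.
    if reader.fieldnames is None or name not in reader.fieldnames:
        raise KeyError(f"no column named {name!r}")
    return [name] + [row[name] for row in reader]
```

For instance, with open('file.csv', newline='') as f: print('\n'.join(extract_column(f, 'time'))) should print the header followed by the two timestamps from the question. Selecting by name keeps working if the time column is later moved.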

Answer by SriniV

As said here:

To handle specifically those newlines that are in doubly-quoted strings and leave those alone that are outside them, using GNU awk (for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

This works by splitting the file along " characters and removing newlines in every other block.

Then use awk to split the columns and display the last column:

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file.csv | awk -F, '{print $NF}'

Output

time
2016-03-28T20:26:39
2016-03-28T20:26:41

$ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder} NR>1{sub(/,/,"",$NF); print $NF}' file
2016-03-28T20:26:39
2016-03-28T20:26:41

sed -e 's/,/\n/g' file.csv | grep -E '^201[0-9]-'

Answer by Aaron Digulla

CSV is a format that needs a proper parser (i.e. it can't be parsed with regular expressions alone). If you have Python installed, use the csv module instead of plain Bash.

If not, consider csvkit, which has a lot of powerful tools to process CSV files from the command line.

See also:

Answer by karakfa

Another awk alternative, using FS:

awk -F, '!/This/{print $NF}' file

time
2016-03-28T20:26:39
2016-03-28T20:26:41

Answer by Brian Chrisman

I ran into something similar when attempting to deal with lspci -m output, but the embedded newlines would need to be escaped first (though IFS=, should work here, since it abuses bash's quote evaluation). Here's an example:

f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"

And the only reasonable way I can find to bring that into bash is along the lines of:

# echo 'f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"' | { eval array=($(cat)); declare -p array; }
declare -a array='([0]="f:13.3" [1]="System peripheral" [2]="Intel Corporation" [3]="Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" [4]="-r01" [5]="Super Micro Computer Inc" [6]="Device 0838")'

Not a full answer, but might help!
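For comparison, outside Bash the same quote-aware splitting of lspci -m lines can be done without eval. This is a minimal Python sketch using the standard shlex module, assuming one record per line with no embedded newlines; split_lspci_line is a hypothetical name.

```python
import shlex


def split_lspci_line(line):
    """Split one `lspci -m` style line into fields using shell-like quoting.

    shlex.split honours double quotes the way a POSIX shell does, but it only
    tokenises the text: nothing is expanded or executed (unlike bash's eval).
    """
    return shlex.split(line)
```

Because shlex only tokenises, a field such as "$(echo hi)" stays literal text instead of being run as a command, which is the main risk of feeding untrusted input to eval.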

Answer by Eduardo

##代码##

Answer by Claes Wikner

##代码##