Bash: Parse CSV with quotes, commas and newlines

Question

Asked by Jacob Horbulyk

Say I have the following csv file:


 id,message,time
 123,"Sorry, This message
 has commas and newlines",2016-03-28T20:26:39
 456,"It makes the problem non-trivial",2016-03-28T20:26:41

I want to write a bash command that returns only the time column, i.e.:


time
2016-03-28T20:26:39
2016-03-28T20:26:41

What is the most straightforward way to do this? You can assume that standard Unix utilities such as awk, gawk, cut and grep are available.


Note the double quotes ("), which escape commas and newline characters inside a field. They make trivial attempts such as


cut -d , -f 3 file.csv

futile.
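To make the failure concrete, here is a small sketch in Python (chosen only so it can be run and checked anywhere) that compares a naive comma split, roughly what cut -d , -f 3 does to each physical line, with the standard csv module on the sample file from the question. The helper names are illustrative, not from any answer.

```python
import csv
import io

# The sample file from the question, held in a string for the demo.
SAMPLE = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"It makes the problem non-trivial",2016-03-28T20:26:41\n'
)


def naive_third_column(text):
    """Roughly mimic `cut -d , -f 3`: split every physical line on every comma."""
    result = []
    for line in text.splitlines():
        fields = line.split(',')
        result.append(fields[2] if len(fields) > 2 else '')
    return result


def parsed_third_column(text):
    """Use a real CSV parser, which understands quotes and embedded newlines."""
    return [row[2] for row in csv.reader(io.StringIO(text))]


print(naive_third_column(SAMPLE))
# ['time', ' This message', '', '2016-03-28T20:26:41']
print(parsed_third_column(SAMPLE))
# ['time', '2016-03-28T20:26:39', '2016-03-28T20:26:41']
```

The naive split picks up part of the message, misses the first timestamp entirely and only gets the last row right by luck.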


Answer 1

Answer by hek2mgl

As chepner said, you should use a programming language that can parse CSV properly.


Here is an example in Python:


import csv

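# Python 3: open in text mode with newline='' so newlines inside quoted fields reach the parser intact
with open('a.csv', newline='') as csvfile: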
    reader = csv.reader(csvfile, quotechar='"')
    for row in reader:
        print(row[-1]) # row[-1] gives the last column
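A slightly more self-contained variant of the same idea, assuming Python 3: select the column by its header name with csv.DictReader instead of by position, and open the file with newline='' as the csv module documentation recommends. The function name read_column and the command-line usage shown in the comment are hypothetical.

```python
import csv
import sys


def read_column(path, name):
    """Return every value of the column whose header is `name`."""
    # newline='' lets the csv module see newlines inside quoted fields as-is.
    with open(path, newline='') as f:
        return [row[name] for row in csv.DictReader(f)]


if __name__ == '__main__' and len(sys.argv) == 3:
    # Hypothetical usage: python read_column.py file.csv time
    path, name = sys.argv[1], sys.argv[2]
    print(name)  # echo the header, matching the expected output in the question
    for value in read_column(path, name):
        print(value)
```

Selecting by name keeps working if columns are reordered, at the cost of requiring a header row.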

Answer 2

Answer by SriniV

As said here


gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file.csv \
 | awk -F, '{print $NF}'

To handle specifically those newlines that are inside double-quoted strings, and leave alone those outside them, using GNU awk (for RT):


gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

This works by splitting the file on " characters and removing the newlines in every other block.


Output


time
2016-03-28T20:26:39
2016-03-28T20:26:41

Then use awk to split the columns and display the last column
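The quote-splitting trick can be sketched in Python to show why it works: after splitting on ", the chunks at odd 0-based indices are exactly the text between quotes, which corresponds to NR % 2 == 0 in awk's 1-based record numbering. Like the gawk version, this sketch deletes the newline outright (so "message" and "has" run together), and the final comma split assumes the last field contains no commas. The helper names are illustrative.

```python
def strip_quoted_newlines(text):
    """Delete newlines that fall between double quotes.

    After text.split('"'), chunks at odd 0-based indices lie inside quotes;
    this matches NR % 2 == 0 in awk, whose record numbers start at 1.
    """
    chunks = text.split('"')
    for i in range(1, len(chunks), 2):
        chunks[i] = chunks[i].replace('\n', '')
    return '"'.join(chunks)


def last_fields(text):
    """Rough equivalent of awk -F, '{print $NF}' applied line by line."""
    return [line.split(',')[-1] for line in text.splitlines()]


sample = ('id,message,time\n'
          '123,"Sorry, This message\n'
          'has commas and newlines",2016-03-28T20:26:39\n'
          '456,"It makes the problem non-trivial",2016-03-28T20:26:41\n')

for value in last_fields(strip_quoted_newlines(sample)):
    print(value)
```

An escaped quote written as "" toggles the parity twice, so it does not throw the odd/even bookkeeping off.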


Answer 3

Answer by Aaron Digulla

CSV is a format that needs a proper parser (i.e. it can't be parsed with regular expressions alone). If you have Python installed, use the csv module instead of plain Bash.


If not, consider csvkit, which has many powerful tools for processing CSV files from the command line.
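As a small illustration of what a proper parser handles for you, the sketch below feeds Python's csv module a made-up row (not from the question) whose quoted field contains a comma, an embedded newline and an escaped quote written as "", the usual CSV convention. The default csv.reader dialect turns "" inside a quoted field into a single literal quote.

```python
import csv
import io

# Hypothetical input: one quoted field with a comma, a newline and "" escapes.
text = 'id,message,time\n7,"He said ""hi, there""\nand left",2016-03-28T20:26:40\n'

rows = list(csv.reader(io.StringIO(text)))
print(rows[1])
# ['7', 'He said "hi, there"\nand left', '2016-03-28T20:26:40']
```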


Answer 4

Answer by karakfa

Another awk alternative, using FS:


$ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder}
                NR>1{sub(/,/,"",$NF); print $NF}' file

2016-03-28T20:26:39
2016-03-28T20:26:41
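A general way to reassemble records that a quoted field splits across physical lines is to keep appending lines until the double quotes balance. The Python sketch below, with an illustrative helper name, treats a record as complete once it contains an even number of " characters; joining the pieces with a space is an arbitrary choice, and the final rsplit assumes the wanted field is last and comma-free.

```python
def logical_records(lines):
    """Yield logical CSV records, joining physical lines that sit inside quotes.

    A record is complete once it holds an even number of '"' characters.
    An escaped quote ("") adds two, so it does not change the parity.
    """
    pending = None
    for line in lines:
        # Join continuation lines with a single space (arbitrary separator).
        pending = line if pending is None else pending + ' ' + line
        if pending.count('"') % 2 == 0:
            yield pending
            pending = None
    if pending is not None:
        # Unbalanced quote at end of input: pass the fragment through.
        yield pending


sample = [
    'id,message,time',
    '123,"Sorry, This message',
    'has commas and newlines",2016-03-28T20:26:39',
    '456,"It makes the problem non-trivial",2016-03-28T20:26:41',
]

for record in logical_records(sample):
    # The time field is last and has no comma, so one rsplit is enough here.
    print(record.rsplit(',', 1)[-1])
```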

Answer 5

回答by Brian Chrisman

I ran into something similar when trying to process lspci -m output, but the embedded newlines would need to be escaped first (though IFS=, should work here, since it abuses bash's quote evaluation). Here's an example:


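f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"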

And the only reasonable way I can find to bring that into bash is along the lines of:


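# echo 'f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"' | { eval array=($(cat)); declare -p array; }
declare -a array='([0]="f:13.3" [1]="System peripheral" [2]="Intel Corporation" [3]="Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" [4]="-r01" [5]="Super Micro Computer Inc" [6]="Device 0838")'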

Not a full answer, but might help!
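If the goal is only to split a line of lspci -m output into its quoted fields, Python's shlex.split is a swapped-in alternative (not part of this answer) that applies shell-like quoting rules without executing anything, whereas eval on untrusted input would run any $(...) embedded in it. The sample line is a copy of a typical lspci -m line so that the sketch is self-contained.

```python
import shlex

# A sample `lspci -m` line (fields are space-separated, most of them quoted).
line = ('f:13.3 "System peripheral" "Intel Corporation" '
        '"Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - '
        'Channel Target Address Decoder" -r01 "Super Micro Computer Inc" '
        '"Device 0838"')

# shlex.split honours the quotes but never runs commands, unlike eval.
fields = shlex.split(line)
for i, field in enumerate(fields):
    print(i, field)
```

This yields the same seven fields as the declare -p output shown in the answer, with index 4 holding -r01.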


Answer 6

Answer by Eduardo

sed -e 's/,/\n/g' file.csv | egrep '^201[0-9]-'

Answer 7

Answer by Claes Wikner

awk -F, '!/This/{print $NF}' file

time
2016-03-28T20:26:39
2016-03-28T20:26:41
