How can I parse CSV files on the Linux command line?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1063125/

Date: 2020-08-03 17:28:40  Source: igfitidea

Tags: linux, unix, csv

Asked by Joel

How can I parse CSV files on the Linux command line?

To do things like:

csvparse -c 2,5,6 filename

to extract fields from columns 2, 5 and 6 from all rows.

It should be able to handle the CSV file format described in RFC 4180 (http://tools.ietf.org/html/rfc4180), which means quoting fields and escaping inner quotes as appropriate. For example, a row with 3 fields:

field1,"field, number ""2"", has inner quotes and a comma",field3

so that if I request field 2 for the row above I get:

field, number "2", has inner quotes and a comma
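To make the expected behaviour concrete, here is a minimal sketch using Python's standard csv module. It is not the command-line tool being asked for, and Python's default dialect is only assumed to be close enough to RFC 4180 for this row. The variable names are just for illustration.

```python
import csv
import io

# The example row from the question, exactly as it would appear in a file.
raw = 'field1,"field, number ""2"", has inner quotes and a comma",field3\n'

# The default dialect treats "" inside a quoted field as one literal quote.
row = next(csv.reader(io.StringIO(raw)))

print(len(row))  # number of fields found in the row
print(row[1])    # the second field, with quoting and escaping removed
```

Any tool that answers the question should give the same second field for this row.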

I appreciate that there are numerous solutions to this problem (Perl, Awk, etc.), but I would like a native bash command line tool that does not require me to invoke some other scripting environment or write any additional code(!).

Accepted answer by RobS

My FOSS CSV stream editor CSVfix does exactly what you want. There is a binary installer for Windows, and a compilable version (via a makefile) for UNIX/Linux.

Answer by samoz

This sounds like a job for awk.

You will most likely need to write your own script for your specific needs, but this site has some discussion of how to go about doing this.

You could also use the cut utility to strip the fields out.

Something like:

cut -f 2,5,6 -d , filename

where the -f argument lists the fields you want and -d is the delimiter. You could then sort these results, find the unique ones, or use any other bash utility. There is a cool video here about working with CSV files from the command line. It's only about a minute long, so I'd take a look.
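One caveat worth illustrating: cut splits on every delimiter and knows nothing about quoting, so it miscounts fields on rows like the one in the question. The sketch below imitates that with a plain string split in Python and compares it with a quote-aware parse. The split is only a stand-in for what cut does, not cut itself.

```python
import csv

line = 'field1,"field, number ""2"", has inner quotes and a comma",field3'

# Roughly what `cut -d , -f 2` sees: a split on every comma, quotes ignored.
naive_fields = line.split(",")

# A quote-aware parse of the same line.
parsed_fields = next(csv.reader([line]))

print(len(naive_fields), repr(naive_fields[1]))
print(len(parsed_fields), repr(parsed_fields[1]))
```

So cut is fine for simple files with no quoted commas, but it does not meet the RFC 4180 requirement in the question.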

However, I guess you might lump the cut utility in with awk and not want to use it. I'm not sure exactly what you mean by a native bash command, though, so I'll still suggest it.

Answer by RobS

A quick Google search turns up an awk script that seems to handle CSV files.

Answer by Jeremy Cantrell

My gut reaction would be to write a script wrapper around Python's csv module (if there isn't already such a thing).
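As a rough idea of what such a wrapper could look like, here is a minimal sketch. The names cut_columns and main, and the script name csvparse.py, are hypothetical. Column numbers are assumed to be 1-based and at least 1, and rows that are too short are padded with empty fields, which is a design choice rather than anything required by the question.

```python
import csv
import io
import sys


def cut_columns(infile, outfile, columns):
    """Copy the given 1-based columns of each CSV row from infile to outfile."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        # Rows shorter than a requested column get an empty field, not an error.
        writer.writerow([row[c - 1] if c <= len(row) else "" for c in columns])


def main(argv):
    """Hypothetical entry point, e.g.: csvparse.py 2,5,6 < filename"""
    columns = [int(c) for c in argv[1].split(",")]
    cut_columns(sys.stdin, sys.stdout, columns)


# Small in-memory demonstration instead of reading stdin.
sample = io.StringIO('a,"b, with comma",c,d,e,f\n1,2,3,4,5,6\n')
result = io.StringIO()
cut_columns(sample, result, [2, 5, 6])
print(result.getvalue(), end="")
```

To use it as a script you would call main(sys.argv) at the bottom instead of running the in-memory demonstration.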

Answer by jmanning2k

Try crush-tools; they are great at manipulating delimited data. It sounds like exactly what you're looking for.

Answer by Archie

I wrote one of these tools too (UNIX only), called csvprintf. It can also convert to XML in an online fashion.

Answer by dfrankow

Perl script (requires Text::CSV_XS):

#!/usr/bin/perl
# csvcut.pl: print selected columns of a CSV stream (stdin to stdout).

use strict;
use warnings;

use Getopt::Long;
my @opt_columns;
# --column may be repeated; each value is a 1-based column number.
GetOptions("column=i@" => \@opt_columns)
  or die "Failed parsing options\n";
die "Must give at least one --column\n" if @opt_columns == 0;
@opt_columns = map { $_-1 } @opt_columns; # convert 1-based to 0-based

use Text::CSV_XS;
# binary => 1 allows embedded newlines and non-ASCII bytes in fields.
my $csv = Text::CSV_XS->new ( { binary => 1 } );

open(my $stdin, "<-") or die "Couldn't open stdin\n";
open(my $stdout, ">-") or die "Couldn't open stdout\n";
while (my $row = $csv->getline($stdin)) {
    my @nrow = @{$row}[@opt_columns];   # array slice of the requested columns
    $csv->print($stdout, \@nrow);
    print {$stdout} "\n";               # end the record on the same handle
}

Put it in a file named csvcut.pl and make it executable.

Example of taking only columns 3 and 4:

cat foo.csv | ./csvcut.pl --c 3 --c 4

This will only quote columns that need quoting, so if an input column has "Bar" (with quotes) it will come out Bar (without quotes).
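The same "quote only when needed" behaviour can be seen with Python's csv module, shown here purely as an illustration and not as a description of what Text::CSV_XS does internally. The sample row is made up for the example.

```python
import csv
import io

# Input where the first field is quoted although it does not need to be.
source = io.StringIO('"Bar","needs, quoting",plain\n')

out = io.StringIO()
# QUOTE_MINIMAL quotes a field only if it contains a delimiter, quote or newline.
writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
for row in csv.reader(source):
    writer.writerow(row)

print(out.getvalue(), end="")
```

The unnecessary quotes around Bar are dropped on output, while the field containing a comma keeps its quotes.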

Answer by Lari Hotari

csvtool is really good. It is available in Debian / Ubuntu (apt-get install csvtool). Example:

csvtool namedcol Account,Cost input.csv > output.csv

See the CSVTool manual page for usage tips.
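For readers without csvtool installed, selecting columns by header name can be sketched with Python's csv.DictReader. This is only an approximation of the namedcol example above: the function name named_columns and the sample data are made up, and whether csvtool itself writes the header row is not checked here.

```python
import csv
import io


def named_columns(infile, outfile, names):
    """Write only the columns whose header is listed in names, in that order."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=names,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in reader:
        # extrasaction="ignore" drops the columns that were not requested.
        writer.writerow(record)


sample = io.StringIO("Account,Date,Cost\nacme,2020-01-01,10.50\nglobex,2020-01-02,7.25\n")
result = io.StringIO()
named_columns(sample, result, ["Account", "Cost"])
print(result.getvalue(), end="")
```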

Answer by cjg

I found csvkit to be useful; it is based on Python's csv module and has quite a lot of options for parsing complex CSV files.

It does seem to be a bit slow, though: I am getting about 4 MB/s (with 100% CPU) when extracting one field from a 7 GB CSV file with 5 columns.

To extract the 4th column from file.csv:

csvcut -c 4 file.csv

Answer by philadams

For a super lightweight wrapper around Python's csv module, you could look at pluckr.