How can I parse CSV files on the Linux command line?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/1063125/
Asked by Joel
How can I parse CSV files on the Linux command line?
To do things like:
csvparse -c 2,5,6 filename
to extract fields from columns 2, 5 and 6 from all rows.
It should be able to handle the CSV file format defined in RFC 4180 (http://tools.ietf.org/html/rfc4180), which means quoting fields and escaping inner quotes as appropriate. For example, given a row with 3 fields:
field1,"field, number ""2"", has inner quotes and a comma",field3
so that if I request field 2 for the row above I get:
field, number "2", has inner quotes and a comma
I appreciate that there are numerous solutions to this problem (Perl, Awk, etc.), but I would like a native bash command-line tool that does not require me to invoke some other scripting environment or write any additional code(!).
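To make the expected behavior concrete, here is a minimal sketch that checks the example row with Python's standard csv module. It is only an illustration of the RFC 4180 quoting rules described above, not the tool being asked for; the variable names are made up for the example.

```python
import csv
import io

# The example row from the question, exactly as it would appear in a file.
row_text = 'field1,"field, number ""2"", has inner quotes and a comma",field3\n'

# csv.reader defaults to the "excel" dialect: comma-delimited, double-quote
# quoting, and "" as an escaped quote inside a quoted field.
fields = next(csv.reader(io.StringIO(row_text)))

print(len(fields))   # 3
print(fields[1])     # field, number "2", has inner quotes and a comma
```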
Accepted answer by RobS
Answer by samoz
This sounds like a job for awk.
You will most likely need to write your own script for your specific needs, but this site has some discussion of how to go about doing this.
You could also use the cut utility to strip the fields out.
Something like:
cut -f 2,5,6 -d , filename
where the -f argument lists the fields you want and -d is the delimiter. You could then sort these results, find the unique ones, or use any other command-line utility. There is a cool video here about working with CSV files from the command line. It is only about a minute long, so I'd take a look.
However, I guess you might lump the cut utility in with awk as something you would rather not use. I don't really know exactly what you mean by a native bash command, though, so I'll still suggest it.
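One caveat worth illustrating: cut splits on every delimiter character and knows nothing about quoting, so it only gives correct results when no field contains a comma or a quote. The sketch below mimics that with a plain string split (an approximation of what cut -d , does, not cut itself) and compares it with a quote-aware parse of the row from the question.

```python
import csv
import io

line = 'field1,"field, number ""2"", has inner quotes and a comma",field3'

# A plain split on "," treats the commas inside the quoted field as separators.
naive_fields = line.split(",")

# A quote-aware parser keeps the quoted field together and unescapes "".
quote_aware_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields[1])              # 5 "field
print(len(quote_aware_fields), quote_aware_fields[1])  # 3 field, number "2", has inner quotes and a comma
```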
Answer by RobS
A quick Google search turns up an awk script that seems to handle CSV files.
Answer by Jeremy Cantrell
My gut reaction would be to write a script wrapper around Python's csv module (if there isn't already such a thing).
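A minimal sketch of what such a wrapper might look like follows. The script name csvparse.py, the -c option, and the helper select_columns are all hypothetical, chosen to mirror the command line in the question; treat it as a starting point rather than a finished tool.

```python
#!/usr/bin/env python3
"""csvparse.py: hypothetical column extractor built on Python's csv module."""
import argparse
import csv
import sys


def select_columns(rows, columns):
    """Yield only the requested 1-based columns from each parsed row."""
    for row in rows:
        # Columns that a short row does not have are emitted as empty strings.
        yield [row[c - 1] if 1 <= c <= len(row) else "" for c in columns]


def main(argv):
    parser = argparse.ArgumentParser(description="Extract columns from a CSV file")
    parser.add_argument("-c", "--columns", required=True,
                        help="comma-separated 1-based column numbers, e.g. 2,5,6")
    parser.add_argument("filename", nargs="?",
                        help="input file (default: standard input)")
    args = parser.parse_args(argv)
    columns = [int(c) for c in args.columns.split(",")]

    # newline="" lets the csv module handle line breaks inside quoted fields.
    source = open(args.filename, newline="") if args.filename else sys.stdin
    with source:
        writer = csv.writer(sys.stdout, lineterminator="\n")
        writer.writerows(select_columns(csv.reader(source), columns))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        print("usage: csvparse.py -c 2,5,6 [filename]", file=sys.stderr)
```

Saved under the assumed name csvparse.py, it could be run as python3 csvparse.py -c 2,5,6 filename. Note that the output is written back as CSV, so a field containing commas or quotes is quoted again rather than printed raw.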
Answer by jmanning2k
Try crush-tools; they are great at manipulating delimited data. It sounds like exactly what you're looking for.
Answer by Archie
Answer by dfrankow
Perl script (requires Text::CSV_XS):
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Text::CSV_XS;

# Collect one or more 1-based column numbers from repeated --column options.
my @opt_columns;
GetOptions("column=i@" => \@opt_columns)
    or die "Failed parsing options\n";
die "Must give at least one --column\n" if @opt_columns == 0;
@opt_columns = map { $_ - 1 } @opt_columns;  # convert 1-based to 0-based

# binary => 1 allows embedded newlines and non-ASCII bytes inside fields.
my $csv = Text::CSV_XS->new({ binary => 1 });
open(my $stdin,  "<-") or die "Couldn't open stdin\n";
open(my $stdout, ">-") or die "Couldn't open stdout\n";

# Read each row, keep only the requested columns, and write it back out as CSV.
while (my $row = $csv->getline($stdin)) {
    my @nrow = @{$row}[@opt_columns];
    $csv->print($stdout, \@nrow);
    print {$stdout} "\n";
}
Put it in a file named csvcut.pl.
Example of taking only columns 3 and 4:
cat foo.csv | ./csvcut.pl --c 3 --c 4
This will only quote columns that need quoting, so if an input column has "Bar" (with quotes) it will come out Bar (without quotes).
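The same "quote only when needed" behavior can be illustrated with Python's csv module, whose QUOTE_MINIMAL mode works along similar lines. This is an analogy rather than a description of Text::CSV_XS internals, and the helper name requote is made up for the example.

```python
import csv
import io


def requote(line):
    """Parse one CSV line and write it back with minimal quoting."""
    fields = next(csv.reader(io.StringIO(line)))
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


# "Bar" loses its unnecessary quotes; "a,b" keeps them because of the comma.
print(requote('"Bar","a,b"'), end="")   # Bar,"a,b"
```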
Answer by Lari Hotari
csvtool is really good. It is available in Debian / Ubuntu (apt-get install csvtool). Example:
csvtool namedcol Account,Cost input.csv > output.csv
See the CSVTool manual page for usage tips.
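For readers without csvtool, selecting columns by header name can be sketched with Python's csv.DictReader. This is not a reimplementation of csvtool: the function name named_columns and the sample data (including the Date column and its values) are invented for illustration, and the sketch emits the selected header names as its first row.

```python
import csv
import io


def named_columns(source, names):
    """Yield the selected header names, then those columns from each record."""
    reader = csv.DictReader(source)
    yield list(names)
    for record in reader:
        yield [record[name] for name in names]


# Made-up sample input; only the Account and Cost headers come from the example.
sample = io.StringIO('Account,Date,Cost\n"Acme, Inc.",2020-01-01,10\n')
rows = list(named_columns(sample, ["Account", "Cost"]))
print(rows)   # [['Account', 'Cost'], ['Acme, Inc.', '10']]
```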
Answer by cjg
I found csvkit to be useful. It is based on Python's csv module and has quite a lot of options for parsing complex CSV files.
It seems to be a bit slow, though: I get about 4 MB/s (at 100% CPU) when extracting one field from a 7 GB CSV file with 5 columns.
To extract the 4th column from file.csv:
csvcut -c 4 file.csv