Bash: how to extract one column of a CSV file

Question

Asked by user788171

If I have a csv file, is there a quick bash way to print out the contents of only any single column? It is safe to assume that each row has the same number of columns, but each column's content would have different length.

Answer 1

Answered by synthesizerpatel

You could use awk for this. Change '$2' to the nth column you want.

awk -F "\"*,\"*" '{print $2}' textfile.csv
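
As a rough sketch of how this separator behaves, the snippet below applies it to a small made-up sample (the sample data and the get_col helper name are assumptions for illustration, not part of the answer). The regex only consumes quotes that sit directly next to a comma, so the first and last fields keep their outer quote, and a comma inside a quoted field would still split the field.

```shell
#!/bin/bash
# Hypothetical helper: print column $1 of CSV text read from stdin, using the
# separator from the answer (optional quotes, a comma, optional quotes).
get_col() {
    awk -F '"*,"*' -v col="$1" '{print $col}'
}

# Made-up sample data.
sample='"id","name","city"
"1","Ann","Oslo"'

printf '%s\n' "$sample" | get_col 2   # middle column: quotes are consumed by the separator
printf '%s\n' "$sample" | get_col 1   # first column keeps its leading quote
```

For input where every field is quoted, the middle columns come out clean, while the first and last columns would need an extra step to strip the remaining quote.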

Answer 2

Answered by madrag

Yes. cat mycsv.csv | cut -d ',' -f3 will print the 3rd column.
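
A small sketch of where plain cut is enough and where it is not; the sample rows are made up for illustration. cut splits on every comma, so a comma inside a quoted field shifts the columns.

```shell
#!/bin/bash
# Made-up rows: the second one has a comma inside a quoted field.
plain='1,John,NY'
quoted='2,"Smith, Jane",LA'

# cut can also read the file directly (cut -d ',' -f3 mycsv.csv); here it reads stdin.
printf '%s\n' "$plain"  | cut -d ',' -f3   # third column, as expected
printf '%s\n' "$quoted" | cut -d ',' -f3   # part of the quoted name, not the city
```

If the data may contain quoted commas, one of the CSV-aware approaches in the other answers is the safer choice.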

Answer 3

Answered by Samar

The simplest way I was able to get this done was to just use csvtool. I had other use cases for csvtool as well, and it handles quotes or delimiters appropriately if they appear within the column data itself.

csvtool format '%(2)\n' input.csv

Replacing 2 with the column number will effectively extract the column data you are looking for.

Answer 4

Answered by cevaris

Landed here looking to extract a column from a tab-separated file, so I thought I would add this.

cat textfile.tsv | cut -f2 -s

Here -f2 extracts the second column (fields are numbered from 1, not 0), and -s skips lines that contain no delimiter.
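
To make the effect of -s concrete, here is a minimal sketch on made-up tab-separated input in which one line has no tab at all.

```shell
#!/bin/bash
# Made-up tab-separated input; the middle line contains no tab.
tsv=$(printf 'a\tb\nno-delimiter-here\nc\td')

printf '%s\n' "$tsv" | cut -f2      # the line without a tab is printed whole
printf '%s\n' "$tsv" | cut -f2 -s   # -s drops lines that contain no delimiter
```

Without -s, cut passes delimiter-free lines through unchanged, which can silently mix stray text into the extracted column.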

Answer 5

Answered by router

Many answers to this question are great, and some have even looked into the corner cases. I would like to add a simple answer for everyday use, for when you mostly do not run into those corner cases (like escaped commas or commas inside quotes).

FS (Field Separator) is the variable whose value defaults to a space. So by default awk splits every line at whitespace.

So using BEGIN (executed before any input is read) we can set this variable to anything we want...

awk 'BEGIN {FS = ","}; {print $3}'

The above code will print the 3rd column in a csv file.
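
As a sketch of the same idea with the column number passed in from the shell (the print_col name and the sample input are assumptions for illustration): -F ',' is shorthand for setting FS before input is read, and -v hands a shell value to awk as a variable.

```shell
#!/bin/bash
# Hypothetical wrapper: print column $1 of comma-separated text on stdin.
# -F ',' sets FS; -v passes the column number in as the awk variable col,
# and $col selects that field.
print_col() {
    awk -F ',' -v col="$1" '{print $col}'
}

printf 'a,b,c\n1,2,3\n' | print_col 3
```

Like the one-liner above, this only suits simple CSV without quoted or escaped commas.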

Answer 6

Answered by jarno

You could use GNU Awk; see this article of the user guide. As an improvement to the solution presented in the article (in June 2015), the following gawk command allows double quotes inside double-quoted fields; a double quote is marked by two consecutive double quotes ("") there. Furthermore, this allows empty fields, but even this cannot handle multiline fields. The following example prints the 3rd column (via c=3) of textfile.csv:

#!/bin/bash
gawk -- '
BEGIN{
    FPAT="([^,\"]*)|(\"((\"\")*[^\"]*)*\")"
}
{
    if (substr($c, 1, 1) == "\"") {
        $c = substr($c, 2, length($c) - 2) # Get the text within the two quotes
        gsub("\"\"", "\"", $c)  # Normalize double quotes
    }
    print $c
}
' c=3 < <(dos2unix <textfile.csv)

Note the use of dos2unix to convert possible DOS-style line breaks (CRLF, i.e. "\r\n") and UTF-16 encoding (with byte order mark) to "\n" and UTF-8 (without byte order mark), respectively. Standard CSV files use CRLF as the line break; see Wikipedia.

If the input may contain multiline fields, you can use the following script. Note the use of a special string for separating records in the output (since the default separator, newline, could occur within a record). Again, the following example prints the 3rd column (via c=3) of textfile.csv:

#!/bin/bash
gawk -- '
BEGIN{
    RS="\0" # Read the whole input file as one record;
    # assume there is no null character in input.
    FS="" # Suppose this setting eases internal splitting work.
    ORS="\n####\n" # Use a special output separator to show borders of a record.
}
{
    nof=patsplit($0, a, /([^,"\n]*)|("(("")*[^"]*)*")/, seps)
    field=0;
    for (i=1; i<=nof; i++){
        field++
        if (field==c) {
            if (substr(a[i], 1, 1) == "\"") {
                a[i] = substr(a[i], 2, length(a[i]) - 2) # Get the text within
                # the two quotes.
                gsub(/""/, "\"", a[i])  # Normalize double quotes.
            }
            print a[i]
        }
        if (seps[i]!=",") field=0
    }
}
' c=3 < <(dos2unix <textfile.csv)

There is another approach to the problem. csvquote can output the contents of a CSV file modified so that special characters within fields are transformed, so that the usual Unix text processing tools can be used to select a certain column. For example, the following code outputs the third column:

csvquote textfile.csv | cut -d ',' -f 3 | csvquote -u

csvquote can be used to process arbitrarily large files.

Answer 7

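Answered by drldcsta

The other answers work well, but since you asked for a solution using just the bash shell, you can do this:

AirBoxOmega:~ d$ cat > file #First we'll create a basic CSV
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10

And then you can pull out columns (the first in this example) like so:

AirBoxOmega:~ d$ while IFS=, read -a csv_line;do echo "${csv_line[0]}";done < file
a
1
a
1
a
1
a
1
a
1
a
1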

So there's a couple of things going on here:

while IFS=, - this says to use a comma as the IFS (Internal Field Separator), which is what the shell uses to know what separates fields (blocks of text). So with IFS=, the string "a,b" is split the same way "a b" would be with the default IFS (whitespace).
read -a csv_line; - this reads in each line, one at a time, and splits it into an array named csv_line, one element per field, which is then available in the "do" section of our while loop.
do echo "${csv_line[0]}";done < file - now we're in the "do" phase, and we echo the 0th element of the array csv_line. This action is repeated for every line of the file. The < file part just tells the while loop where to read from. NOTE: remember, in bash, arrays are 0-indexed, so the first column is the 0th element.

So there you have it, pulling out a column from a CSV in the shell. The other solutions are probably more practical, but this one is pure bash.
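
The loop above can be wrapped into a small reusable function. This is only a sketch under stated assumptions: the bash_col name, the 1-based column argument, and the sample input are all illustrative, and it still assumes simple CSV without quoted fields or embedded commas.

```shell
#!/bin/bash
# Hypothetical helper: print the 1-based column $1 of simple CSV on stdin
# (no quoted fields or embedded commas), using only bash builtins.
bash_col() {
    local col=$1 fields
    # -r keeps backslashes literal; the || test also handles a last line
    # that has no trailing newline.
    while IFS=, read -r -a fields || [ "${#fields[@]}" -gt 0 ]; do
        printf '%s\n' "${fields[col-1]}"   # bash arrays are 0-indexed
    done
}

printf 'a,b,c\n1,2,3\n' | bash_col 2
```

Setting IFS on the same line as read limits the change to that one command, so the rest of the script keeps the default field splitting.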

Answer 8

Answered by Raj Velayudhan

[dumb@one pts]$ cat > file #First we'll create a basic CSV
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10

[dumb@one pts]$ awk -F , '{print $1}' file
a
1
a
1

Answer 9

Answered by Darth Egregious

I needed proper CSV parsing, not cut/awk and prayer. I'm trying this on a mac without csvtool, but macs do come with ruby, so you can do:

echo "require 'csv'; CSV.read('new.csv').each {|data| puts data[34]}" | ruby

Answer 10

Answered by exussum

csvtool col 2 file.csv

where 2 is the column you are interested in.

You can also do

csvtool col 1,2 file.csv

to select multiple columns.
