How do I convert a tab-separated values (TSV) file to a comma-separated values (CSV) file in BASH?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/22419979/

Date: 2020-09-18 09:55:52  Source: igfitidea

How do I convert a tab-separated values (TSV) file to a comma-separated values (CSV) file in BASH?

bash, csv, awk, tsv

Asked by Village

I have some TSV files that I need to convert to CSV files. Is there any solution in BASH, e.g. using awk, to convert these? I could use sed, like this, but am worried it will make some mistakes:

sed 's/\t/,/g' file.tsv > file.csv
  • Quotes needn't be added.

How can I convert a TSV to a CSV?

Answered by mklement0

Update: The following solutions are not generally robust, although they do work in the OP's specific use case; see the bottom section for a robust, awk-based solution.


To summarize the options (interestingly, they all perform about the same):

tr:

devnull's solution (provided in a comment on the question) is the simplest:

tr '\t' ',' < file.tsv > file.csv
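
As a quick sanity check, the sketch below runs the same tr translation on a small made-up sample (the two-row input and the variable names are assumptions for illustration only) and shows that an empty field survives the conversion:

```shell
# Hypothetical sample: two rows, the second with an empty middle field.
tsv_sample=$(printf 'a\tb\tc\n1\t\t3\n')

# Same translation as above, fed from a variable instead of file.tsv.
csv_result=$(printf '%s\n' "$tsv_sample" | tr '\t' ',')

printf '%s\n' "$csv_result"
```

Because tr replaces characters one for one, two adjacent tabs become two adjacent commas.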

sed:

The OP's own sed solution is perfectly fine, given that the input contains no quoted strings (with potentially embedded \t chars.):

sed 's/\t/,/g' file.tsv > file.csv

The only caveat is that on some platforms (e.g., macOS) the escape sequence \t is not supported, so a literal tab char. must be spliced into the command string using ANSI quoting ($'\t'):

sed 's/'$'\t''/,/g' file.tsv > file.csv
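
Where ANSI quoting is not available either (for example in a plain POSIX sh), one possible alternative, not part of the original answer, is to capture a literal tab with printf and splice it in through a variable. The variable names and the one-line sample are assumptions:

```shell
# Capture a literal tab character; printf interprets \t portably.
tab=$(printf '\t')

# Hypothetical one-line sample standing in for file.tsv.
sed_result=$(printf 'x\ty\tz\n' | sed "s/${tab}/,/g")

printf '%s\n' "$sed_result"
```

The double quotes let the shell expand ${tab} before sed sees the expression, so sed receives a real tab rather than an escape sequence.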

awk:

The caveat with awk is that FS - the input field separator - must be set to \t explicitly - the default behavior would otherwise strip leading and trailing tabs and replace interior spans of multiple tabs with only a single , :

awk 'BEGIN { FS="\t"; OFS="," } { $1=$1; print }' file.tsv > file.csv

Note that simply assigning $1 to itself causes awk to rebuild the input line using OFS - the output field separator; this effectively replaces all \t chars. with , chars. print then simply prints the rebuilt line.
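
To illustrate why FS must be set explicitly, the following sketch compares the default field splitting with FS="\t" on a made-up row that has an empty second field (the sample row and variable names are assumptions):

```shell
# Hypothetical row with an empty second field (two adjacent tabs).
row=$(printf 'a\t\tc')

# Default FS: runs of whitespace count as one separator, so the empty field is lost.
default_fs=$(printf '%s\n' "$row" | awk 'BEGIN { OFS="," } { $1=$1; print }')

# Explicit FS="\t": every tab is a separator, so the empty field is preserved.
tab_fs=$(printf '%s\n' "$row" | awk 'BEGIN { FS="\t"; OFS="," } { $1=$1; print }')

printf '%s\n%s\n' "$default_fs" "$tab_fs"
```

The first result has two fields and the second has three, which is the difference described above.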


Robust awk solution:

As A. Rabus points out, the above solutions do not correctly handle unquoted input fields that themselves contain , characters - you'll end up with extra CSV fields.

The following awk solution fixes this by enclosing such fields in "..." on demand (see the non-robust awk solution above for a partial explanation of the approach).

If such fields also have embedded " chars., these are escaped as "", in line with RFC 4180. Thanks, Wyatt Israel.

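awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for (i=1; i<=NF; ++i) {
    # Quote fields that contain , or " and are not already quoted.
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)
      $i = "\"" $i "\""
      rebuilt=1
    }
  }
  # Force a rebuild with OFS if no field was modified.
  if (!rebuilt) { $1=$1 }
  print
}' file.tsv > file.csv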
  • $i ~ /[,"]/ && $i !~ /^".*"$/ detects any field that contains , and/or " and isn't already enclosed in double quotes

  • gsub("\"", "\"\"", $i) escapes embedded " chars. by doubling them

  • $i = "\"" $i "\"" updates the result by enclosing it in double quotes

  • As stated before, updating any field causes awk to rebuild the line from the fields with the OFS value, i.e., , in this case, which amounts to the effective TSV -> CSV conversion; flag rebuilt is used to ensure that each input record is rebuilt at least once.
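
The behavior described in the list above can be checked with a small sketch that feeds the same awk program four made-up rows: a plain row, a field with a comma, a field with a double quote, and an already-quoted field. The sample rows and the robust_result variable are assumptions for illustration:

```shell
# Hypothetical sample rows piped in place of file.tsv.
robust_result=$(printf 'plain\tvalue\nSmith, John\t42\nsay "hi"\tx\n"a,b"\ty\n' |
awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for (i=1; i<=NF; ++i) {
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)
      $i = "\"" $i "\""
      rebuilt=1
    }
  }
  if (!rebuilt) { $1=$1 }
  print
}')

printf '%s\n' "$robust_result"
```

Fields that need quoting are wrapped and their embedded quotes doubled, while the already-quoted field passes through unchanged.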

Answered by Toby

This can also be achieved with Perl:

To redirect the results to a new output file, you can use the following:
perl -wnlp -e 's/\t/,/g;' input_file.tsv > output_file.csv

If you'd like to edit the file in place, you can invoke the -i option:
perl -wnlpi -e 's/\t/,/g;' input_file.txt

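If by some chance you find that what you are dealing with is not actually tabs but runs of spaces, you can use the following to replace each run of whitespace with a comma. Note that \s+ matches one or more whitespace characters, so a single space is replaced as well; use \s{2,} instead to match only runs of two or more: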
perl -wnlpi -e 's/\s+/,/g;' input_file

Keep in mind that \s represents any whitespace character, including spaces, tabs, or newlines, and cannot be used in the replacement string.

Answered by wpmoradi

Using awk works for me

converting tsv to csv

awk 'BEGIN { FS="\t"; OFS="," } { $1=$1; print }' file.tsv > file.csv

or converting csv to tsv

awk 'BEGIN { FS=","; OFS="\t" } { $1=$1; print }' file.csv > file.tsv
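
One caveat with the CSV-to-TSV direction is that splitting on every comma ignores quoting, so a quoted field containing a comma is cut in two. The sketch below demonstrates this on a made-up record (the sample data and variable names are assumptions):

```shell
# Hypothetical CSV record with two logical fields, the first one quoted.
quoted_csv='"Smith, John",42'

# Naive conversion: every comma becomes a tab, including the one inside the quotes.
naive_tsv=$(printf '%s\n' "$quoted_csv" | awk 'BEGIN { FS=","; OFS="\t" } { $1=$1; print }')

# Count the tab-separated fields in the result.
field_count=$(printf '%s\n' "$naive_tsv" | awk 'BEGIN { FS="\t" } { print NF }')

printf '%s\n' "$field_count"
```

The count comes out as three rather than two, so this one-liner is only safe for CSV input without quoted commas.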

Answered by Pranav

The tr command:

tr '\t' ',' < file.tsv > file.csv

is simple and gave absolutely correct and very quick results for me even on a really large file (approx 10 GB).
