How do I convert a tab-separated values (TSV) file to a comma-separated values (CSV) file in BASH?
Original question: http://stackoverflow.com/questions/22419979/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Village
I have some TSV files that I need to convert to CSV files. Is there any solution in BASH, e.g. using awk, to convert these? I could use sed, like this, but am worried it will make some mistakes:
sed 's/\t/,/g' file.tsv > file.csv
- Quotes needn't be added.
How can I convert a TSV to a CSV?
Answered by mklement0
Update: The following solutions are not generally robust, although they do work in the OP's specific use case; see the bottom section for a robust, awk-based solution.
To summarize the options (interestingly, they all perform about the same):
tr:
devnull's solution (provided in a comment on the question) is the simplest:
tr '\t' ',' < file.tsv > file.csv
sed:
The OP's own sed solution is perfectly fine, given that the input contains no quoted strings (with potentially embedded \t chars.):
sed 's/\t/,/g' file.tsv > file.csv
The only caveat is that on some platforms (e.g., macOS) the escape sequence \t is not supported, so a literal tab char. must be spliced into the command string using ANSI C quoting ($'\t'):
sed 's/'$'\t''/,/g' file.tsv > file.csv
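As a quick sanity check of the tr and sed variants, here is a minimal sketch that runs both on a small sample. The sample data and the variable names are made up for illustration. Instead of $'\t', this sketch stores a literal tab in a variable via printf, which is an alternative that should also work in shells that lack ANSI C quoting.

```shell
# Hypothetical sample: two records with three tab-separated fields each.
sample=$(printf 'a\tb\tc\n1\t2\t3')

# A literal tab character, produced portably by printf.
tab=$(printf '\t')

# tr translates every tab character into a comma.
tr_out=$(printf '%s\n' "$sample" | tr '\t' ',')

# sed receives the literal tab, so it does not need to understand \t itself.
sed_out=$(printf '%s\n' "$sample" | sed "s/$tab/,/g")

printf '%s\n' "$tr_out"
```

Both variables should end up holding the same comma-separated text for this sample.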
awk:
The caveat with awk is that FS, the input field separator, must be set to \t explicitly; the default behavior would otherwise strip leading and trailing tabs and replace interior spans of multiple tabs with only a single comma:
awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}' file.tsv > file.csv
Note that simply assigning $1 to itself causes awk to rebuild the input line using OFS, the output field separator; this effectively replaces all \t chars. with , chars. print then simply prints the rebuilt line.
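The two points above (the explicit FS and the self-assignment of $1) can be illustrated with a minimal sketch. The sample record and the variable names are made up; the second field is deliberately empty so that two tabs are adjacent.

```shell
# Hypothetical record: the second field is empty, so there are two adjacent tabs.
line=$(printf 'a\t\tc')

# Without modifying a field, awk prints the record unchanged (the tabs stay).
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN { FS="\t"; OFS="," } { print }')

# Assigning $1 to itself forces awk to rebuild the record using OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN { FS="\t"; OFS="," } { $1=$1; print }')

# With the default FS, the run of tabs counts as one separator and the empty field is lost.
collapsed=$(printf '%s\n' "$line" | awk 'BEGIN { OFS="," } { $1=$1; print }')

printf '%s\n' "$rebuilt" "$collapsed"
```

Here rebuilt keeps the empty middle field (a,,c), while collapsed drops it (a,c), which is why FS needs to be set explicitly.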
Robust awk solution:
As A. Rabus points out, the above solutions do not correctly handle unquoted input fields that themselves contain , characters - you'll end up with extra CSV fields.
The following awk solution fixes this by enclosing such fields in "..." on demand (see the non-robust awk solution above for a partial explanation of the approach).
If such fields also have embedded " chars., these are escaped as "", in line with RFC 4180. Thanks, Wyatt Israel.
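awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for(i=1; i<=NF; ++i) {
    # Quote fields that contain , or " and are not already quoted.
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)
      $i = "\"" $i "\""
      rebuilt=1
    }
  }
  # Force a rebuild with OFS if no field was modified.
  if (!rebuilt) { $1=$1 }
  print
}' file.tsv > file.csv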
- $i ~ /[,"]/ && $i !~ /^".*"$/ detects any field that contains , and/or " and isn't already enclosed in double quotes.
- gsub("\"", "\"\"", $i) escapes embedded " chars. by doubling them.
- $i = "\"" $i "\"" updates the result by enclosing it in double quotes.
- As stated before, updating any field causes awk to rebuild the line from the fields with the OFS value, i.e., , in this case, which amounts to the effective TSV -> CSV conversion; flag rebuilt is used to ensure that each input record is rebuilt at least once.
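To see the quoting rules in action, here is a small sketch that wraps the robust awk approach in a shell function and feeds it three made-up records. The function name tsv2csv and the sample data are assumptions for illustration; the field test is written as /[,"]/, as in the explanation above.

```shell
# Hypothetical helper: reads TSV on stdin and writes CSV to stdout.
tsv2csv() {
  awk 'BEGIN { FS="\t"; OFS="," } {
    rebuilt=0
    for (i=1; i<=NF; ++i) {
      if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
        gsub("\"", "\"\"", $i)   # double any embedded quotes
        $i = "\"" $i "\""        # wrap the field in quotes
        rebuilt=1
      }
    }
    if (!rebuilt) { $1=$1 }      # rebuild plain records with OFS
    print
  }'
}

# Made-up records: a field with a comma, a field with embedded quotes, a plain record.
out=$(printf 'Smith, John\t42\nsay "hi"\tx\nplain\ty\n' | tsv2csv)
printf '%s\n' "$out"
```

For this sample, the first field of the first record is wrapped in quotes, the embedded quotes in the second record are doubled, and the plain record is only re-joined with commas.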
Answered by Toby
This can also be achieved with Perl:
In order to redirect the results to a new output file you can use the following:
perl -wnlp -e 's/\t/,/g;' input_file.tsv > output_file.csv
If you'd like to edit the file in place, you can invoke the -i option:
perl -wnlpi -e 's/\t/,/g;' input_file.txt
If by some chance you find that what you are dealing with is not actually tabs, but instead multiple spaces, you can use the following to replace each run of whitespace with a comma (note that \s+ also matches a single space):
perl -wnlpi -e 's/\s+/,/g;' input_file
Keep in mind that \s represents any whitespace character, including spaces, tabs or newlines, and cannot be used in the replacement string.
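Because \s+ matches any run of whitespace, including a single space inside a field, a narrower pattern may be preferable when the columns really are separated by two or more spaces. The following is a sketch using sed -E rather than perl (the same pattern, s/ {2,}/,/g, should also work in perl); the sample record and variable names are made up.

```shell
# Hypothetical record: columns are separated by runs of spaces,
# and the first column itself contains a single space.
line='New York   100  USA'

# Any whitespace run becomes a comma, which also splits "New York".
any_ws=$(printf '%s\n' "$line" | sed -E 's/[[:space:]]+/,/g')

# Only runs of two or more spaces become a comma.
two_plus=$(printf '%s\n' "$line" | sed -E 's/ {2,}/,/g')

printf '%s\n' "$any_ws" "$two_plus"
```

With this sample, the first variant yields four fields and the second yields three, keeping "New York" intact.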
Answered by wpmoradi
Using awk works for me
converting tsv to csv
awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}' file.tsv > file.csv
or converting csv to tsv
awk 'BEGIN { FS=","; OFS="\t" } {$1=$1; print}' file.csv > file.tsv
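One caveat for the CSV to TSV direction: splitting on every comma does not account for quoted fields. A minimal sketch with a made-up record shows how awk counts the fields when FS is a plain comma.

```shell
# Made-up CSV record: the first field is quoted because it contains a comma.
csv='"Smith, John",42'

# With FS set to a comma, awk splits inside the quoted field as well.
fields=$(printf '%s\n' "$csv" | awk 'BEGIN { FS="," } { print NF }')

printf '%s\n' "$fields"
```

The record has two logical fields, but awk reports three, so this simple approach is only safe for CSV input without quoted commas.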
Answered by Pranav
The tr command:
tr '\t' ',' < file.tsv > file.csv
is simple and gave absolutely correct and very quick results for me even on a really large file (approx 10 GB).