bash method to remove last 4 columns from csv file
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/14418511/
Asked by user788171
Is there a way to use bash to remove the last four columns for some input CSV file? The last four columns can have fields that vary in length from line to line so it is not sufficient to just delete a certain number of characters from the end of each row.
Answered by peteches
cut can do this if all lines have the same number of fields; use awk if they don't.
cut -d, -f1-6 # assuming 10 fields
This prints the first 6 fields. To control the output separator, use --output-delimiter=string
awk -F, -v OFS=, '{ for (i=1; i<=NF-4; i++) printf "%s%s", $i, (i<NF-4 ? OFS : ""); printf "\n" }'
This loops over the fields up to the number of fields minus 4 and prints them, separated by OFS.
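To see how the loop behaves when rows have different numbers of fields, here is a minimal sketch. The function name drop_last4 and the sample rows are made up for illustration; rows with four or fewer fields come out as empty lines.

```shell
# Hypothetical helper: print every field except the last four of each row.
drop_last4() {
  awk -F, -v OFS=, '{
    line = ""
    for (i = 1; i <= NF - 4; i++)        # stop four fields before the end
      line = line (i > 1 ? OFS : "") $i  # join the kept fields with OFS
    print line
  }' "$@"
}

# Made-up sample rows with 6 and 7 fields.
printf '%s\n' 'a,b,c,d,e,f' '1,2,3,4,5,6,7' | drop_last4
```

Building the line first and printing it once avoids using field data as a printf format string.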
Answered by Perleone
cat data.csv | rev | cut -d, -f5- | rev
rev reverses each line, so it doesn't matter whether all the rows have the same number of columns; the command always removes the last 4. This only works if the last 4 columns don't contain any commas themselves.
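As a rough illustration of the reversal trick on rows of different widths, the sketch below wraps the pipeline in a function. The name drop_last4_rev and the sample rows are assumptions made for this example only.

```shell
# Hypothetical helper: reverse each line, drop the first four fields
# (originally the last four), then reverse back.
drop_last4_rev() {
  rev | cut -d, -f5- | rev
}

# Made-up rows with 6 and 8 fields.
printf '%s\n' 'a,b,c,d,e,f' '1,2,3,4,5,6,7,8' | drop_last4_rev
```

After the first rev, the last four columns become fields 1 to 4, so -f5- keeps everything else.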
Answered by JaredC
You can use cut for this if you know the number of columns. For example, if your file has 9 columns, and comma is your delimiter:
cut -d',' -f -5
However, this assumes the data in your csv file does not contain any commas. cut will also interpret commas inside quotes as delimiters.
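The quoting problem is easy to reproduce. In this small sketch (the sample row is invented), a row with three logical columns is split into four pieces because cut knows nothing about quotes:

```shell
# Made-up row: 3 logical columns, but the quoted one contains a comma.
row='1,"Smith, John",42'

# cut sees four comma-separated pieces, so asking for the first two
# fields cuts the quoted field in half.
printf '%s\n' "$row" | cut -d, -f1-2
```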
Answered by YH Wu
awk -F, '{OFS=","; NF-=4; print}' file.csv
or alternatively
awk -F, -vOFS=, '{NF-=4;print}' file.csv
Either command will drop the last 4 columns from each line.
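One caveat worth sketching: a row with fewer than four fields would make NF negative, which GNU awk, for one, treats as a fatal error. The variant below guards against that. It assumes an awk that rebuilds the record when NF is assigned (GNU awk does), and the function name drop_last4_nf is made up for this example.

```shell
# Hypothetical helper: drop the last four fields, tolerating short rows.
drop_last4_nf() {
  awk -F, -v OFS=, '
    NF > 4 { NF -= 4; print; next }  # shrink the record, then print it
           { print "" }              # four or fewer fields: nothing is left
  ' "$@"
}

# Made-up rows with 6 and 2 fields.
printf '%s\n' 'a,b,c,d,e,f' 'x,y' | drop_last4_nf
```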
Answered by kvantour
None of the methods mentioned above work properly on CSV files whose quoted fields contain a <comma> character, so simply using the <comma> character as the field separator is not enough.
The following two posts are very handy here:
- What's the most robust way to efficiently parse CSV using awk?
- [U&L] How to delete the last column of a file in Linux (note: this is only for GNU awk)
With GNU awk, you can do the following:
$ awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF-=4}1'
Or with any awk, you could do:
$ awk 'BEGIN{ere="([^,]*|\042[^\042]+\042)"
ere=","ere","ere","ere","ere"$"
}
{sub(ere,"")}1'
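To check the portable variant against quoted fields, here is a small sketch. The function name drop_last4_quoted, the variable names field and tail, and the sample row are assumptions for illustration. Like the command above, it does not handle escaped quotes inside a field or empty quoted fields.

```shell
# Hypothetical helper: strip the last four CSV fields, where a field is either
# unquoted text without commas or a double-quoted string (\042 is the octal
# escape for the double-quote character).
drop_last4_quoted() {
  awk 'BEGIN {
         field = "([^,]*|\042[^\042]+\042)"
         tail  = "," field "," field "," field "," field "$"
       }
       { sub(tail, "") } 1' "$@"
}

# Made-up row: the second and fourth columns contain commas inside quotes.
printf '%s\n' '1,"Smith, John",a,"x, y",c,d' | drop_last4_quoted
```

Anchoring the pattern at the end of the line is what forces the match to cover exactly the last four fields.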
Answered by Mirage
This awk solution works in a hacky way: it blanks the last four fields, then strips the trailing commas left behind.
awk -F, 'OFS=","{for(i=NF; i>NF-4; --i) {$i=""}}{sub(",,,,$","",$0);print $0}' temp.txt
Answered by Kent
awk one-liner:
awk -F, '{for(i=0;++i<=NF-5;)printf $i", ";print $(NF-4)}' file.csv
The advantage of using awk over cut is that you don't have to count how many columns you have or how many you want to keep, since all you want is to remove the last 4 columns.
See the test:
kent$ seq 40|xargs -n10|sed 's/ /, /g'
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31, 32, 33, 34, 35, 36, 37, 38, 39, 40
kent$ seq 40|xargs -n10|sed 's/ /, /g' |awk -F, '{for(i=0;++i<=NF-5;)printf $i", ";print $(NF-4)}'
1,  2,  3,  4,  5,  6
11,  12,  13,  14,  15,  16
21,  22,  23,  24,  25,  26
31,  32,  33,  34,  35,  36
Answered by potong
This might work for you (GNU sed):
sed -r 's/(,[^,]*){4}$//' file
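A quick way to try the sed approach is sketched below. It uses -E, which both GNU and BSD sed accept for extended regular expressions; the function name drop_last4_sed and the sample rows are made up for this example.

```shell
# Hypothetical helper: remove four trailing ",field" groups from each line.
# -E turns on extended regular expressions (GNU sed also accepts -r).
drop_last4_sed() {
  sed -E 's/(,[^,]*){4}$//' "$@"
}

# Made-up rows: the second has only four fields, so nothing matches.
printf '%s\n' 'a,b,c,d,e,f' '1,2,3,4' | drop_last4_sed
```

Rows with fewer than five fields pass through unchanged, because the pattern needs four commas to match.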