Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/14418511/
Time: 2020-09-09 23:14:33  Source: igfitidea

bash method to remove last 4 columns from csv file

Tags: bash, csv, sed, awk, cut

Asked by user788171

Is there a way to use bash to remove the last four columns for some input CSV file? The last four columns can have fields that vary in length from line to line so it is not sufficient to just delete a certain number of characters from the end of each row.

Answered by peteches

cut can do this if all lines have the same number of fields; use awk if they don't.

cut -d, -f1-6 # assuming 10 fields

This prints the first 6 fields. To control the output separator, use --output-delimiter=string (a GNU cut option).

awk -F, -v OFS=, '{ for (i=1; i<=NF-4; i++) printf "%s%s", $i, (i<NF-4 ? OFS : ""); printf "\n" }'

This loops over the fields up to the number of fields minus 4 and prints them, separated by OFS.
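
When the column count is not known in advance, the cut approach can be driven by a field count taken from the file itself. The following is a minimal sketch, assuming the file is rectangular (every row has as many comma-separated fields as the first row) and contains no quoted commas; the function name drop_last_cols_cut is hypothetical.

```shell
# Hypothetical helper: drop the last N columns with cut.
# Assumption: every row has the same number of fields as the first row.
drop_last_cols_cut() {
  n=$1
  file=$2
  # Count comma-separated fields on the first line only.
  total=$(head -n 1 "$file" | awk -F, '{ print NF }')
  keep=$((total - n))
  if [ "$keep" -lt 1 ]; then
    echo "nothing left to keep" >&2
    return 1
  fi
  cut -d, -f "1-$keep" "$file"
}
```

It would be called as drop_last_cols_cut 4 data.csv.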

Answered by Perleone

cat data.csv | rev | cut -d, -f5- | rev

rev reverses each line, so it doesn't matter whether all the rows have the same number of columns; cutting from the fifth reversed field onward always removes the last 4. This only works if the last 4 columns don't contain any commas themselves.
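
To illustrate the point about rows of different widths, here is a small sketch that wraps the pipeline in a hypothetical function, drop_last4_rev, reading from standard input. It selects fields 5 onward of the reversed line (-f5-), and assumes each row has at least 5 fields and no commas inside the last 4.

```shell
# Hypothetical wrapper around the rev | cut | rev pipeline.
# -f5- keeps reversed fields 5 onward, i.e. everything except the last 4 columns.
drop_last4_rev() {
  rev | cut -d, -f5- | rev
}

# Rows of different widths: 6 fields and 7 fields.
printf '%s\n' 'a,b,c,d,e,f' '1,2,3,4,5,6,7' | drop_last4_rev
# prints:
# a,b
# 1,2,3
```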

Answered by JaredC

You can use cut for this if you know the number of columns. For example, if your file has 9 columns, and comma is your delimiter:

cut -d',' -f -5

However, this assumes the data in your csv file does not contain any commas. cut will also interpret commas inside quotes as delimiters.
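
The quoted-comma problem is easy to reproduce. The row below is made-up sample data with 6 logical columns, so keeping the first 2 should drop the last 4; because the first field contains a comma, cut miscounts.

```shell
# Hypothetical sample row: 6 logical columns, but the quoted first field
# contains a comma, so cut sees 7 comma-separated pieces.
row='"Smith, John",42,a,b,c,d'

# Attempt to keep the first 2 of 6 columns (that is, drop the last 4).
printf '%s\n' "$row" | cut -d, -f-2
# prints: "Smith, John"
# The 42 column is lost as well, because cut split the quoted field in two.
```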

Answered by YH Wu

awk -F, '{NF-=4; OFS=","; print}' file.csv

or alternatively

awk -F, -vOFS=, '{NF-=4;print}' file.csv

will drop the last 4 columns from each line.
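
One thing the NF-=4 form does not address is rows with 4 or fewer fields, where NF would become zero or negative. The following sketch adds a guard; the function name drop_last4_awk is hypothetical, and passing short rows through unchanged is a design assumption rather than something the answer specifies. It also relies on the awk in use truncating the record when NF is decreased, which GNU awk and mawk do.

```shell
# Hypothetical wrapper: drop the last 4 fields only when a row has more than 4.
# Rows with 4 or fewer fields are printed unchanged (an assumption, not a rule).
drop_last4_awk() {
  awk -F, -v OFS=, 'NF > 4 { NF -= 4 } 1' "$@"
}

printf '%s\n' 'a,b,c,d,e,f' 'x,y' | drop_last4_awk
# prints:
# a,b
# x,y
```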

Answered by kvantour

None of the methods mentioned above work properly on CSV files that have quoted fields containing a <comma> character, so it is a bit hard to just use the <comma> character as a field separator.

The following two posts are now very handy:

Since you work with GNU awk, you can do either of the following two:

$ awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF-=4}1'

Or with any awk, you could do:

$ awk 'BEGIN{ere="([^,]*|\042[^\042]+\042)"
             ere=","ere","ere","ere","ere"$"
       }
       {sub(ere,"")}1'
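
To see the any-awk variant handle quoted commas, here is a sketch under a few assumptions: the double quote is written as the octal escape \042 inside the awk string, quoted fields contain no embedded double quotes, and the function name drop_last4_quoted and the sample row are made up for illustration.

```shell
# Hypothetical wrapper: remove the last 4 fields, where a field is either
# a run of non-comma characters or a double-quoted string (\042 is ").
drop_last4_quoted() {
  awk 'BEGIN {
         field = "([^,]*|\042[^\042]+\042)"
         tail  = "," field "," field "," field "," field "$"
       }
       { sub(tail, "") } 1' "$@"
}

# 6 logical columns; two of them contain commas inside quotes.
printf '%s\n' 'id,"a, b",x,"c,d",y,z' | drop_last4_quoted
# prints: id,"a, b"
```

The pattern is anchored at the end of the line, so only a run of exactly four trailing fields can match.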

Answered by Mirage

This awk solution works in a hacky way: it blanks the last four fields, then strips the four trailing commas that are left behind.

awk -F, 'OFS=","{for(i=NF; i>NF-4; --i) {$i=""}}{gsub(/,,,,$/,"",$0);print $0}' temp.txt

Answered by Kent

awk one-liner:

awk -F, '{for(i=0;++i<=NF-5;)printf $i", ";print $(NF-4)}' file.csv

The advantage of using awk over cut is that you don't have to count how many columns you have or how many columns you want to keep, since all you want is to remove the last 4 columns.

See the test:

kent$  seq 40|xargs -n10|sed 's/ /, /g'
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31, 32, 33, 34, 35, 36, 37, 38, 39, 40

kent$  seq 40|xargs -n10|sed 's/ /, /g' |awk -F, '{for(i=0;++i<=NF-5;)printf $i", ";print $(NF-4)}'
1,  2,  3,  4,  5,  6
11,  12,  13,  14,  15,  16
21,  22,  23,  24,  25,  26
31,  32,  33,  34,  35,  36

Answered by potong

This might work for you (GNU sed):

sed -r 's/(,[^,]*){4}$//' file
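
As a quick check of the sed approach, the sketch below wraps the substitution s/(,[^,]*){4}$// in a hypothetical function, drop_last4_sed. It uses -E, which both GNU and BSD sed accept for extended regular expressions, in place of the GNU-only -r. Like the other comma-splitting methods, it assumes the last 4 fields contain no quoted commas.

```shell
# Hypothetical wrapper: strip four trailing ",field" groups from each line.
# Lines with fewer than 5 fields do not match and pass through unchanged.
drop_last4_sed() {
  sed -E 's/(,[^,]*){4}$//' "$@"
}

printf '%s\n' 'a,b,c,d,e,f' '1,2,3,4,5,6,7' 'x,y' | drop_last4_sed
# prints:
# a,b
# 1,2,3
# x,y
```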