unix - cut command (adding own delimiter)

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/8630053/

Time: 2020-08-06 03:49:09  Source: igfitidea

Tags: linux, bash, shell, unix, scripting

Asked by toop

Given a file with data like this (i.e. a stores.dat file):

id               storeNo     type
2ttfgdhdfgh      1gfdkl-28   kgdl
9dhfdhfdfh       2t-33gdm    dgjkfndkgf

Desired output:

id               |storeNo     |type
2ttfgdhdfgh      |1gfdkl-28   |kgdl
9dhfdhfdfh       |2t-33gdm    |dgjkfndkgf

I would like to add a "|" delimiter between each of these 3 cut ranges:

cut -c1-18,19-30,31-40 stores.dat

What is the syntax to insert a delimiter between each cut?

Bonus points if you can also provide an option to trim the values, like so:

id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf

UPDATE (thanks to Mat's answer): I ended up succeeding with the solution below. It is a bit messy, but SunOS with my bash version doesn't seem to support more elegant arithmetic.

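#!/bin/bash
# Build a perl unpack template (e.g. A17A12A10) from x-y column ranges.
unpack=""
filename="$1"    # first argument is the input file
while [ $# -gt 0 ] ; do
    arg="$1"
    if [ "$arg" != "$filename" ]
    then
        # split "x-y" into its start and end positions
        firstcharpos=`echo $arg | awk -F"-" '{print $1}'`
        secondcharpos=`echo $arg | awk -F"-" '{print $2}'`
        # width = end - start + 1, computed as (start - end) * -1 + 1
        compute=`(expr $firstcharpos - $secondcharpos)`
        compute=`(expr $compute \* -1 + 1)`
        unpack=$unpack"A"$compute
    fi
    shift
done
perl -ne 'print join("|",unpack("'$unpack'", $_)), "\n";' $filename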

Usage: sh test.sh input_file 1-17 18-29 30-39

Accepted answer by Mat

If you're not afraid of using perl, here's a one-liner:

$ perl -ne 'print join("|",unpack("A17A12A10", $_)), "\n";' input 

The unpack call extracts one 17-character string, then a 12-character one, then a 10-character one from the input line, and returns them as a list (stripping trailing spaces). join adds the | separators.
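
If perl is not available, the same fixed-width split can be sketched in plain bash with substring expansion. This is only a minimal sketch: the function name split_fixed is hypothetical, and it assumes a bash that supports ${var:offset:length}, which may not hold for very old shells.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): split each line of
# stdin into fixed-width columns and join the trimmed values with "|".
# The column widths are passed as arguments, e.g. 17 12 10.
split_fixed() {
    local line field out sep pos width
    while IFS= read -r line || [ -n "$line" ]; do
        out=""
        sep=""
        pos=0
        for width in "$@"; do
            field=${line:pos:width}              # take the next column
            field=${field%"${field##*[! ]}"}     # strip trailing spaces
            out="$out$sep$field"
            sep="|"
            pos=$((pos + width))
        done
        printf '%s\n' "$out"
    done
}

# Demo on one padded header row
printf '%-17s%-12s%-10s\n' id storeNo type | split_fixed 17 12 10
```

With the question's file this would be called as split_fixed 17 12 10 < stores.dat. Like the A format in unpack, it only removes trailing padding, so spaces inside a value are kept.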

If you want the input columns to be in x-y format, without writing a "real" script, you could hack it like this (but it's ugly):

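#!/bin/bash
unpack=""

# All arguments except the last are x-y ranges; the last is the input file.
while [ $# -gt 1 ] ; do
    arg=$(($1))    # "1-17" is evaluated as a subtraction, giving -16
    shift
    unpack=$unpack"A"$((-1*$arg+1))    # -1 * -16 + 1 = 17, the column width
done

perl -ne 'print join("|",unpack("'$unpack'", $_)), "\n";' $1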

Usage: t.sh 1-17 18-29 30-39 input_file.
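
The arithmetic trick above treats each x-y range as a subtraction. A slightly more explicit sketch splits the range with parameter expansion instead; the function name ranges_to_template is hypothetical, and the sketch assumes a shell with POSIX parameter expansion plus the local keyword.

```shell
#!/bin/bash
# Hypothetical helper: turn x-y column ranges into a perl unpack template.
ranges_to_template() {
    local range first last template=""
    for range in "$@"; do
        first=${range%-*}    # text before the dash
        last=${range#*-}     # text after the dash
        template="${template}A$((last - first + 1))"
    done
    printf '%s\n' "$template"
}

ranges_to_template 1-17 18-29 30-39    # prints A17A12A10
```

The resulting string can be passed to the perl one-liner in place of the hard-coded "A17A12A10".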

Answer by Shraddha

Use sed to search and replace parts of a file based on regular expressions.

Replace whitespace with '|' in infile1:

sed -e 's/[ \t\r]/|/g' infile1 > outfile3

Answer by zwol

You can't do that with cut as far as I am aware, but you can do it easily with sed as long as the values in each column never have internal spaces:

sed -e 's/  */|/g'

EDIT: If the file format is a true fixed-column format, and you don't want to use perl as shown by Mat, this can be done with sed but it's not pretty, because sed doesn't support numeric repetition quantifiers (.{17}), so you have to type out the right number of dots:

sed -e 's/^\(.................\)\(............\)\(..........\)$/\1|\2|\3/; s/  *|/|/g'
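
Where the local sed accepts POSIX basic-regex intervals, written \{n\}, the dots do not have to be typed out. The following is a sketch under that assumption (older sed implementations may not support it); the function name sed_fixed is hypothetical. It leaves the last column unanchored, so lines shorter than the full 39 characters are still split as long as they reach column 29.

```shell
#!/bin/sh
# Hypothetical helper: insert "|" after column 17 and after column 29,
# then remove the padding. \{17\} is a basic-regex interval meaning
# "exactly 17 repetitions" (assumed to be supported by the local sed).
sed_fixed() {
    sed -e 's/^\(.\{17\}\)\(.\{12\}\)/\1|\2|/' \
        -e 's/  *|/|/g' \
        -e 's/ *$//'
}

# Demo on one padded header row
printf '%-17s%-12s%-10s\n' id storeNo type | sed_fixed
```

The second expression removes the padding in front of each delimiter, and the third removes trailing padding after the last column.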

Answer by ugoren

I'd use awk:

awk '{print $1 "|" $2 "|" $3}'

Like some of the other suggestions, it assumes columns are whitespace separated, and doesn't care about the column numbers. If you have spaces in one of the fields, it won't work.

Answer by jaypal singh

How about using just the tr command?

tr -s " " "|" < stores.dat

From the man page:

-s      Squeeze multiple occurrences of the characters listed in the last
        operand (either string1 or string2) in the input into a single
        instance of the character.  This occurs after all deletion and
        translation is completed.

Test:

[jaypal:~/Temp] cat stores.dat 
id               storeNo     type
2ttfgdhdfgh      1gfdkl-28   kgdl
9dhfdhfdfh       2t-33gdm    dgjkfndkgf

[jaypal:~/Temp] tr -s " " "|" < stores.dat 
id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf

You can easily redirect this to a new file like this:

[jaypal:~/Temp] tr -s " " "|" < stores.dat > new.stores.dat

Note: As Mat pointed out in the comments, this solution assumes the columns are separated by one or more spaces, rather than laid out at fixed widths.
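
To make that limitation concrete, here is a small sketch with a made-up row whose first fixed-width column contains a space. The row contents and the function name squeeze_to_pipes are hypothetical, not taken from stores.dat.

```shell
#!/bin/sh
# Hypothetical helper wrapping the tr command from this answer.
squeeze_to_pipes() {
    tr -s ' ' '|'
}

# Made-up row: the first 17-character column holds "New York".
# The value is split in two, and the trailing padding becomes a final "|".
printf '%-17s%-12s%-10s\n' 'New York' 10 open | squeeze_to_pipes
# prints New|York|10|open|
```

For data like this, a position-based approach (perl unpack, or character offsets) is the safer choice.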

Answer by Fredrik Pihl

Since you used cut in your example, here is a cut-based approach, assuming each field is separated by a tab:

$ cut  --output-delimiter='|' -f1-3 input
id|store|No
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf

If that is not the case, add the input-separator switch -d.

Answer by roblogic

A better awk solution, based on character position rather than whitespace:

$ awk -v FIELDWIDTHS='17 12 10' -v OFS='|' '{ $1 = $1 ""; print }' stores.dat | tr -d ' '

id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf
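
FIELDWIDTHS is a gawk extension, so the command above may not work with other awk implementations. A sketch of the same idea using only substr(), assuming an awk that supports user-defined functions (the names awk_fixed and rtrim are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: cut three fixed-width columns with substr() and
# trim only the trailing padding of each one.
awk_fixed() {
    awk '
    function rtrim(s) { sub(/ +$/, "", s); return s }
    {
        print rtrim(substr($0, 1, 17)) "|" \
              rtrim(substr($0, 18, 12)) "|" \
              rtrim(substr($0, 30, 10))
    }'
}

# Demo on one padded header row
printf '%-17s%-12s%-10s\n' id storeNo type | awk_fixed
```

Unlike piping through tr -d ' ', this keeps spaces that occur inside a value and removes only the padding at the end of each column.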