Original question: http://stackoverflow.com/questions/8630053/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
unix - cut command (adding own delimiter)
Asked by toop
Given a file with data like this (i.e. the stores.dat file):
id               storeNo     type
2ttfgdhdfgh      1gfdkl-28   kgdl
9dhfdhfdfh       2t-33gdm    dgjkfndkgf
Desired output:
id               |storeNo     |type
2ttfgdhdfgh      |1gfdkl-28   |kgdl
9dhfdhfdfh       |2t-33gdm    |dgjkfndkgf
Would like to add a "|" delimiter between each of these 3 cut ranges:
cut -c1-18,19-30,31-40 stores.dat
What is the syntax to insert a delimiter between each cut?
BONUS pts (if you can provide the option to trim the values like so):
id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf
UPDATE (thanks to Mat's answer): I ended up succeeding with this solution. It is a bit messy, but the bash version on my SunOS machine doesn't seem to support more elegant arithmetic.
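#!/bin/bash
# Usage: test.sh input_file 1-17 18-29 30-39
unpack=""
filename="$1"
while [ $# -gt 0 ] ; do
    arg="$1"
    # Skip the file name; every other argument is a first-last column range.
    if [ "$arg" != "$filename" ]
    then
        firstcharpos=`echo $arg | awk -F"-" '{print $1}'`
        secondcharpos=`echo $arg | awk -F"-" '{print $2}'`
        # Width = last - first + 1, computed as -(first - last) + 1 with expr.
        compute=`(expr $firstcharpos - $secondcharpos)`
        compute=`(expr $compute \* -1 + 1)`
        unpack=$unpack"A"$compute
    fi
    shift
done
perl -ne 'print join("|",unpack("'$unpack'", $_)), "\n";' "$filename"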
Usage: sh test.sh input_file 1-17 18-29 30-39
Accepted answer by Mat
If you're not afraid of using perl, here's a one-liner:
$ perl -ne 'print join("|",unpack("A17A12A10", $_)), "\n";' input
The unpack call extracts one 17-character string, then a 12-character one, then a 10-character one from the input line, and returns them as a list (the A format strips trailing spaces). join adds the | characters.
If you want the input columns to be in x-y format, without writing a "real" script, you could hack it like this (but it's ugly):
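#!/bin/bash
unpack=""
# All arguments except the last are x-y ranges; the last one is the input file.
while [ $# -gt 1 ] ; do
    # $((1-17)) evaluates the range as a subtraction, giving -16.
    arg=$(($1))
    shift
    # Negate and add 1 to get the column width: -1*-16+1 = 17.
    unpack=$unpack"A"$((-1*$arg+1))
done
perl -ne 'print join("|",unpack("'$unpack'", $_)), "\n";' $1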
Usage: t.sh 1-17 18-29 30-39 input_file
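The range-to-width arithmetic used by both scripts can also be written with plain POSIX parameter expansion. This is only a sketch, not part of either script above: ranges_to_template is a hypothetical helper name, and it assumes every argument has the form first-last.

```shell
#!/bin/sh
# Sketch: turn ranges such as 1-17 18-29 30-39 into a Perl unpack template.
# ranges_to_template is a hypothetical helper, not part of the original answer.
ranges_to_template() {
    template=""
    for range in "$@"; do
        first=${range%-*}   # text before the dash
        last=${range#*-}    # text after the dash
        # Width of an inclusive range is last - first + 1.
        template="${template}A$(( $last - $first + 1 ))"
    done
    printf '%s\n' "$template"
}

ranges_to_template 1-17 18-29 30-39
```

The resulting string can be passed to unpack in the perl one-liner in place of the hard-coded A17A12A10.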
Answer by Shraddha
Use sed to search and replace parts of a file based on regular expressions.
Replace whitespace with '|' in infile1:
sed -e 's/[ \t\r]/|/g' infile1 > outfile3
Answer by zwol
You can't do that with cut as far as I am aware, but you can do it easily with sed as long as the values in each column never have internal spaces:
sed -e 's/  */|/g'
EDIT: If the file format is a true fixed-column format, and you don't want to use perl as shown by Mat, this can be done with sed, but it's not pretty, because sed doesn't support numeric repetition quantifiers (.{17}), so you have to type out the right number of dots:
sed -e 's/^\(.................\)\(............\)\(..........\)$/\1|\2|\3/; s/  *|/|/g'
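POSIX basic regular expressions do define interval expressions, written \{n\}, so on a sed that implements them the runs of dots can be replaced by counts. A minimal sketch under that assumption; fixed_to_pipe is a hypothetical helper name, and the last column is allowed to be shorter than 10 characters in case the file has no trailing padding:

```shell
#!/bin/sh
# Sketch: split fixed-width records (17, 12 and up to 10 characters) with sed,
# using POSIX interval expressions instead of typing out each dot.
# fixed_to_pipe is a hypothetical helper name, not from the answer above.
fixed_to_pipe() {
    # 1st expression: capture the three columns and join them with "|".
    # 2nd expression: drop the padding spaces in front of each "|".
    # 3rd expression: drop padding at the end of the line.
    sed -e 's/^\(.\{17\}\)\(.\{12\}\)\(.\{1,10\}\).*$/\1|\2|\3/' \
        -e 's/  *|/|/g' \
        -e 's/  *$//'
}

# Build one padded sample record and convert it.
printf '%-17s%-12s%-10s\n' id storeNo type | fixed_to_pipe
```

If a particular sed rejects the \{n\} syntax, the typed-out dots above remain the fallback.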
Answer by ugoren
I'd use awk:
awk '{print $1 "|" $2 "|" $3}'
Like some of the other suggestions, it assumes columns are whitespace separated, and doesn't care about the column numbers. If you have spaces in one of the fields, it won't work.
Answer by jaypal singh
How about using just the tr command?
tr -s " " "|" < stores.dat
From the man page:
-s Squeeze multiple occurrences of the characters listed in the last
operand (either string1 or string2) in the input into a single
instance of the character. This occurs after all deletion and
translation is completed.
Test:
[jaypal:~/Temp] cat stores.dat
id storeNo type
2ttfgdhdfgh 1gfdkl-28 kgdl
9dhfdhfdfh 2t-33gdm dgjkfndkgf
[jaypal:~/Temp] tr -s " " "|" < stores.dat
id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf
You can easily redirect this to a new file like this:
[jaypal:~/Temp] tr -s " " "|" < stores.dat > new.stores.dat
Note: As Mat pointed out in the comments, this solution assumes that the columns are separated by one or more spaces, not laid out at fixed widths.
Answer by Fredrik Pihl
Since you used cut in your example, and assuming each field is separated by a tab:
$ cut --output-delimiter='|' -f1-3 input
id|store|No
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf
If that is not the case, add the input delimiter switch -d.
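For the character ranges in the question, GNU cut may be able to do this directly: its documentation says the output delimiter is also written between non-overlapping -c ranges. A minimal sketch, assuming GNU coreutils (--output-delimiter is a GNU extension, so the cut on other systems may not accept it); cut_fixed is a hypothetical helper name, and the trailing sed only trims the column padding:

```shell
#!/bin/sh
# Sketch, assuming GNU cut: put "|" between three character ranges.
# cut_fixed is a hypothetical helper name, not from the answer above.
cut_fixed() {
    cut -c1-17,18-29,30-39 --output-delimiter='|' |
        sed -e 's/  *|/|/g' -e 's/  *$//'   # trim padding before "|" and at end of line
}

# Build one padded sample record and convert it.
printf '%-17s%-12s%-10s\n' id storeNo type | cut_fixed
```

Dropping the sed stage keeps the padded layout shown in the question's first desired output.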
Answer by roblogic
Better awk solution based on character position, not whitespace
$ awk -v FIELDWIDTHS='17 12 10' -v OFS='|' '{ $1=$1 ""; print }' stores.dat | tr -d ' '
id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf
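FIELDWIDTHS is a gawk feature, so other awk implementations may ignore it. As a hedged alternative, the same position-based split can be sketched with substr(), which any POSIX awk provides; fixed_to_pipe_awk and the widths variable are hypothetical names. Unlike the tr -d ' ' step, this trims only the trailing padding of each column and leaves spaces inside a value alone:

```shell
#!/bin/sh
# Sketch: fixed-width columns to "|"-separated values with plain awk substr(),
# for awk implementations without FIELDWIDTHS.
# fixed_to_pipe_awk and "widths" are hypothetical names.
fixed_to_pipe_awk() {
    awk -v widths="${1:-17 12 10}" '
    BEGIN { n = split(widths, w, " ") }
    {
        pos = 1; line = ""
        for (i = 1; i <= n; i++) {
            field = substr($0, pos, w[i])   # take w[i] characters starting at pos
            sub(/ +$/, "", field)           # trim trailing padding only
            line = line (i > 1 ? "|" : "") field
            pos += w[i]
        }
        print line
    }'
}

# Build one padded sample record and convert it.
printf '%-17s%-12s%-10s\n' id storeNo type | fixed_to_pipe_awk
```

Other layouts can be passed as the first argument, for example fixed_to_pipe_awk '5 8 3' < some_file.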