bash: Removing Parts of String With Sed
Original question: http://stackoverflow.com/questions/3106809/
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Asked by neversaint
I have lines of data that look like this:
sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_9_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_8_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta
How can I use sed to delete the part of each line after the 4th column (_ separated), finally yielding:
sp_A0A342_ATPB_COFAR
sp_A0A342_ATPB_COFAR
sp_A0A373_RK16_COFAR
sp_A0A373_RK16_COFAR
sp_A0A4W3_SPEA_GEOSL
Answered by Matthew Flaschen
cut is a better fit.
cut -d_ -f 1-4 old_file
This simply means: use _ as the delimiter, and keep fields 1-4.
If you insist on sed:
sed 's/\(_[^_]*\)\{4\}$//'
The left-hand side matches exactly four repetitions of a group consisting of an underscore followed by zero or more non-underscores. After that, we must be at the end of the line. The whole match is replaced with nothing.
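Because the pattern is anchored at the end of the line, it removes the last four fields rather than keeping the first four. That gives the requested result here only because every line in the question has exactly eight fields. A minimal sketch of the difference, using one line from the question and one hypothetical line with an extra trailing field (the `_x` suffix is made up for illustration):

```shell
# First line is from the question; the second has one extra trailing
# field (hypothetical, for illustration only).
sample='sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta_x'

# sed: removes the LAST four underscore-led groups of each line.
drop_last_four=$(printf '%s\n' "$sample" | sed 's/\(_[^_]*\)\{4\}$//')

# cut: keeps the FIRST four fields of each line.
keep_first_four=$(printf '%s\n' "$sample" | cut -d_ -f 1-4)

printf '%s\n' "$drop_last_four"
printf '%s\n' "$keep_first_four"
```

On the second line, sed leaves sp_A0A342_ATPB_COFAR_6 while cut still returns sp_A0A342_ATPB_COFAR.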
Answered by Scott Thomson
sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/' infile > outfile
Match "any number of not '_'", saving what was matched between \( and \), followed by '_'. Do this 4 times, then match anything for the rest of the line (to be discarded). Replace the whole match with the four saved matches, separated by '_'.
Answered by Owen S.
Here's another possibility:
sed -E -e 's|^([^_]+(_[^_]+){3}).*$|\1|'
where -E, like -r in GNU sed, turns on extended regular expressions for readability.
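To illustrate the readability point, here is a sketch of the same substitution written both with -E and in basic regular expression syntax. The basic form is a hand translation (an assumption, not part of the original answer), with the groups and intervals backslash-escaped and + spelled as \{1,\}:

```shell
# Sample line taken from the question.
line='sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta'

# Extended syntax (-E): groups and intervals need no backslashes.
ere=$(printf '%s\n' "$line" | sed -E 's|^([^_]+(_[^_]+){3}).*$|\1|')

# Basic syntax: the same pattern with \( \) and \{ \} escaped.
bre=$(printf '%s\n' "$line" | sed 's|^\([^_]\{1,\}\(_[^_]\{1,\}\)\{3\}\).*$|\1|')

printf '%s\n%s\n' "$ere" "$bre"
```

Both commands print sp_A0A373_RK16_COFAR.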
Just because you can do it in sed, though, doesn't mean you should. I like cut much, much better for this.
Answered by Paused until further notice.
AWK likes to play in the fields:
awk 'BEGIN{FS=OFS="_"}{print $1,$2,$3,$4}' inputfile
or, more generally:
awk -v count=4 'BEGIN{FS="_"}{sep="";for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS};printf "\n"}'
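A usage sketch for the general form: since count is passed with -v, the same program can keep any number of leading fields. The wrapper name first_fields is hypothetical. This version resets sep at the start of every record; otherwise the separator left over from the previous line would be printed before the first field of the next one.

```shell
first_fields() {
  # Shell argument $1: how many leading fields to keep (hypothetical helper).
  awk -v count="$1" 'BEGIN{FS="_"}{sep="";for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS};printf "\n"}'
}

# Sample lines taken from the question; keep only the first two fields.
printf '%s\n' \
  'sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta' \
  'sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta' | first_fields 2
```

With a count of 2 this prints sp_A0A342 and sp_A0A4W3, one per line.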
Answered by Slartibartfast
sed -e 's/_[0-9][0-9]*_[+-]_contigs_full\.fasta$//'
Still, the cut answer is probably faster and just generally better.
Answered by Peter Ajtai
Yes, cut is way better, and yes, matching the back of each line is easier.
I finally got a match using the beginning of each line:
sed -r 's/(([^_]*_){3}([^_]*)).*/\1/' oldFile > newFile