bash: Using sed to delete part of a string

Question

Asked by neversaint

I have lines of data that look like this:

sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_9_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_8_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta

How can I use sed to delete the part of each line after the 4th column (_ separated)? The result should be:

sp_A0A342_ATPB_COFAR
sp_A0A342_ATPB_COFAR
sp_A0A373_RK16_COFAR
sp_A0A373_RK16_COFAR
sp_A0A4W3_SPEA_GEOSL

Answer 1

Answered by Matthew Flaschen

cut is a better fit.

cut -d_ -f 1-4 old_file

This simply means: use _ as the delimiter, and keep fields 1-4.
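
As a quick check of the cut approach, here is a minimal sketch that feeds two of the sample lines from the question through the same command. The function name keep_first_four and the variable names are made up for illustration; they are not part of the original answer.

```shell
#!/usr/bin/env bash
# keep_first_four is a hypothetical helper name, not from the answer.
# It reads lines on stdin and keeps only the first four _-separated fields.
keep_first_four() {
  cut -d_ -f1-4
}

# Two of the sample lines from the question.
sample='sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta'

result=$(printf '%s\n' "$sample" | keep_first_four)
printf '%s\n' "$result"
```

Because cut counts fields from the left, the result does not depend on how many fields follow the fourth one.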

If you insist on sed:

sed 's/\(_[^_]*\)\{4\}$//'

The left-hand side matches exactly four repetitions of a group consisting of an underscore followed by 0 or more non-underscores. After that, we must be at the end of the line. The whole match is replaced by nothing.
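
One thing worth noticing is that this sed expression strips the last four fields rather than keeping the first four. The two are the same for the question's data, where every line has eight fields, but they differ otherwise. The sketch below illustrates that, assuming a made-up five-field line a_b_c_d_e; the helper name strip_last_four is also hypothetical.

```shell
#!/usr/bin/env bash
# strip_last_four is a hypothetical helper name wrapping the sed command above.
strip_last_four() {
  sed 's/\(_[^_]*\)\{4\}$//'
}

# Eight fields, as in the question: removing the last four leaves the first four.
eight=$(printf '%s\n' 'sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta' | strip_last_four)

# A made-up line with only five fields: removing the last four leaves one field,
# while cut still keeps the first four.
five_sed=$(printf '%s\n' 'a_b_c_d_e' | strip_last_four)
five_cut=$(printf '%s\n' 'a_b_c_d_e' | cut -d_ -f1-4)

printf '%s\n' "$eight" "$five_sed" "$five_cut"
```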

Answer 2

Answered by Scott Thomson

sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/' infile > outfile

Match any number of characters that are not '_', saving what was matched between \( and \), followed by '_'. Do this 4 times, then match anything for the rest of the line (to be discarded). Substitute with the four saved matches, joined by '_'.
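
To see the saved groups in action, here is a small sketch that runs the substitution on one sample line and on a line that already has only four fields. The helper name keep_four_backrefs is hypothetical. Since the pattern requires an underscore after the fourth group, a four-field line does not match and passes through unchanged.

```shell
#!/usr/bin/env bash
# keep_four_backrefs is a hypothetical helper name. \1 through \4 refer to the
# four \( ... \) groups, in order.
keep_four_backrefs() {
  sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/'
}

# A sample line from the question: everything after the fourth field is dropped.
long=$(printf '%s\n' 'sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta' | keep_four_backrefs)

# A line with exactly four fields: no match, so sed prints it as-is.
short=$(printf '%s\n' 'sp_A0A373_RK16_COFAR' | keep_four_backrefs)

printf '%s\n' "$long" "$short"
```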

Answer 3

Answered by Owen S.

Here's another possibility:

sed -E -e 's|^([^_]+(_[^_]+){3}).*$|\1|'

where -E, like -r in GNU sed, turns on extended regular expressions for readability.

Just because you can do it in sed, though, doesn't mean you should. I like cut much, much better for this.

Answer 4

Answered by Paused until further notice.

AWK likes to play in the fields:

awk 'BEGIN{FS=OFS="_"}{print $1,$2,$3,$4}' inputfile

or, more generally:

awk -v count=4 'BEGIN{FS="_"}{sep="";for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS};printf "\n"}'
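
The general form can be wrapped so the field count is a parameter. This is a minimal sketch, with first_fields as a made-up function name. It sets sep to the empty string at the start of every record, because otherwise the separator left over from the previous line would be printed in front of the first field of the next one.

```shell
#!/usr/bin/env bash
# first_fields is a hypothetical helper: first_fields N keeps the first N
# _-separated fields of each line read from stdin.
first_fields() {
  awk -v count="$1" 'BEGIN{FS="_"}
  {
    sep = ""                      # reset for each record
    for (i = 1; i <= count; i++) {
      printf "%s%s", sep, $i      # no separator before the first field
      sep = FS
    }
    printf "\n"
  }'
}

# Two sample lines from the question, keeping only two fields each.
two=$(printf '%s\n' \
  'sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta' \
  'sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta' | first_fields 2)

printf '%s\n' "$two"
```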

Answer 5

Answered by Slartibartfast

sed -e 's/_[0-9][0-9]*_[+-]_contigs_full\.fasta$//'

Still the cut answer is probably faster and just generally better.

Answer 6

Answered by Peter Ajtai

Yes, cut is way better, and yes, matching the back of each line is easier.

I finally got a match using the beginning of each line:

sed -r 's/(([^_]*_){3}([^_]*)).*/\1/' oldFile > newFile
