bash Sed in-place edit
Original URL: http://stackoverflow.com/questions/6667835/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
Sed in place edit
Question by Tathagata
for term in `cat stopwords`; do sed -i 's/\<$term\>//g' spam.txt ;done
Given that stopwords contains one word per line and spam.txt is a plain text file, I just need to replace exact matches of the stopwords. It does not behave as I expect...
Note that there are words like doesn't and couldn't in both files.
Accepted answer by Sami Kerola
Are you sure you want to run sed in a for loop? I would use a sed script file.
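# Build a sed script with one s/// command per stopword, then run sed once.
TMPFILE=$(mktemp)   # $(...) runs mktemp; TMPFILE=mktemp would just store the word "mktemp"
# \< and \> (GNU sed word anchors) make sure only whole words are removed.
for WORD in $(cat stopwords); do printf 's/\\<%s\\>//g\n' "$WORD" >> "$TMPFILE"; done
sed -f "$TMPFILE" spam.txt   # prints the result; add -i to edit spam.txt in place
rm -f "$TMPFILE"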
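A quick way to see why the word anchors matter for the "exact matches" requirement: the sketch below (the file contents are made-up sample data) builds two sed scripts from the same stopword list, one with plain s/WORD//g lines and one using GNU sed's \< and \> word anchors, and applies both to the same line. It assumes GNU sed and stopwords that contain no regex metacharacters and no /.

```bash
set -eu
workdir=$(mktemp -d)              # scratch directory for the sample files
trap 'rm -rf "$workdir"' EXIT     # clean up when the script exits
cd "$workdir"

printf '%s\n' the "doesn't" > stopwords
printf '%s\n' "the cat doesn't go there" > spam.txt

# Write one s/// command per stopword into each script file.
: > plain.sed
: > anchored.sed
while IFS= read -r word; do
  printf 's/%s//g\n' "$word" >> plain.sed           # e.g. s/the//g
  printf 's/\\<%s\\>//g\n' "$word" >> anchored.sed  # e.g. s/\<the\>//g
done < stopwords

plain=$(sed -f plain.sed spam.txt)        # also strips "the" out of "there"
anchored=$(sed -f anchored.sed spam.txt)  # removes whole words only
printf '%s\n' "$plain" "$anchored"
```

The plain script turns "there" into "re"; the anchored one leaves it alone. In both cases the spaces around a removed word stay behind, which is why the output contains double spaces.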
Answer by neuro
Well, you should use double quotes (") instead of single quotes (') in your sed command. Single quotes tell the shell not to substitute $term.
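To make the quoting difference concrete, here is a minimal sketch (the variable names single and double are only for illustration) that builds the same sed expression with both kinds of quotes and prints what sed would actually receive:

```bash
term="couldn't"            # sample stopword containing an apostrophe

single='s/\<$term\>//g'    # single quotes: $term stays literal text
double="s/\<$term\>//g"    # double quotes: the shell expands $term

echo "$single"   # s/\<$term\>//g
echo "$double"   # s/\<couldn't\>//g
```

Inside double quotes a backslash before < or > is not special to the shell, so \< and \> reach sed unchanged, and the apostrophe in couldn't needs no extra escaping.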
This:
for term in `cat stopwords`; do sed -i "s/\<$term\>//g" spam.txt ;done
Works for:
# stopwords
couldn't
and:
# spam.txt
foo <couldn't> bar
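The example above can be reproduced in a scratch directory with the sketch below. It assumes GNU sed, since -i without a backup suffix and the \< \> word anchors are GNU extensions; the file contents are the two samples from this answer.

```bash
set -eu
workdir=$(mktemp -d)              # scratch directory for the two sample files
trap 'rm -rf "$workdir"' EXIT     # clean up when the script exits
cd "$workdir"

printf '%s\n' "couldn't" > stopwords
printf '%s\n' "foo <couldn't> bar" > spam.txt

# Double quotes: the shell substitutes $term before sed parses the expression.
for term in $(cat stopwords); do sed -i "s/\<$term\>//g" spam.txt; done

cat spam.txt    # foo <> bar
```

The angle brackets survive because they are not word characters: \< and \> match at the boundaries just inside them, so only couldn't itself is removed.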
my 2 cents
Answer by shellter
@kerolasa is onto something there.
The most important point is that your $term is NOT being expanded as a variable. You can rewrite your code as:
for term in `cat stopwords`; do sed -i "s/\<${term}\>//g" spam.txt ;done
But that is a very expensive operation: you are running sed once for every word in stopwords. Making a sed script as @kerolasa suggests is more efficient, but it depends; if this is a one-off project, then your solution will work.
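If you want a single sed run without a separate script file, one alternative (not taken from the answers here) is to join all stopwords into one alternation such as \<(couldn't|doesn't|the)\> and let sed make one pass over the file. A sketch, assuming GNU sed (-E, -i, \< \>) and stopwords that contain no regex metacharacters such as . * [ ( | and no / delimiter; the sample data is made up:

```bash
set -eu
workdir=$(mktemp -d)              # scratch directory for the sample files
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '%s\n' "couldn't" "doesn't" the > stopwords
printf '%s\n' "the dog couldn't and doesn't bark there" > spam.txt

pattern=$(paste -s -d '|' stopwords)      # couldn't|doesn't|the
# One sed process, one pass over spam.txt, whole words only.
sed -i -E "s/\<($pattern)\>//g" spam.txt

cat spam.txt
```

This trades the per-word sed processes for one longer regex; it stops being safe as soon as a stopword contains characters that are special in an extended regex.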
Except ... "words like doesn't, couldn't in both files": yes, and? I'm not sure what you are saying there. What do you expect or want to happen, and why do you think it won't happen? Changing your quoting will help.
Finally, note that this solution may break if your stopword list contains entries with spaces, e.g. 'spanner in the works', because the for loop splits the list on whitespace ;-).
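One way around the whitespace problem, sketched under the same assumptions (GNU sed, entries free of regex metacharacters and /, made-up sample data), is to read the list line by line with while IFS= read -r instead of word-splitting the output of cat:

```bash
set -eu
workdir=$(mktemp -d)              # scratch directory for the sample files
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '%s\n' "spanner in the works" > stopwords          # a multi-word entry
printf '%s\n' "stuck in the mud, a spanner in the works here" > spam.txt

# IFS= and -r keep each line exactly as written, spaces included.
while IFS= read -r term; do
  sed -i "s/\<$term\>//g" spam.txt    # double quotes: $term is expanded
done < stopwords

cat spam.txt    # stuck in the mud, a  here
```

Because the entry is kept whole, the "in the" inside "stuck in the mud" is left untouched; a for loop over the output of cat would have treated spanner, in, the and works as four separate stopwords.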
I hope this helps.
Answer by mschilli
Instead of using a tempfile for the script as suggested by Sami Kerola, you could also pipe the script to sed, creating it from stopwords using a second instance of sed:
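# \\< and \\> in the replacement yield a literal \< and \> (GNU word anchors) in each generated line
sed 's,.*,s/\\<&\\>//g,' stopwords | sed -i -f- spam.txt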
Note that I used , instead of / as the separator for the first instance of sed so that I don't have to escape every / I use as a separator in the generated expression. But this is, imho, just a matter of taste, and of course you could also use 's/.*/s\/\\<&\\>\/\/g/' if you like it more.
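To check what the first sed actually generates before it is piped into the second one, you can run it on its own. A minimal sketch with made-up stopwords, trying both separator styles. POSIX leaves a backslash before an ordinary character in a sed replacement unspecified, so this sketch writes \\< and \\> in both forms to get a literal backslash into the generated script.

```bash
set -eu
# Hypothetical sample stopwords, one per line.
words=$(printf '%s\n' "couldn't" the)

# Comma as the outer separator: the / of the generated s/// needs no escaping.
comma=$(printf '%s\n' "$words" | sed 's,.*,s/\\<&\\>//g,')
# Slash as the outer separator: every / in the generated command is escaped.
slash=$(printf '%s\n' "$words" | sed 's/.*/s\/\\<&\\>\/\/g/')

printf '%s\n' "$comma"
# s/\<couldn't\>//g
# s/\<the\>//g
```

Both variables hold the same two-line script, which is what the second sed reads via -f.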