bash: Sed in-place edit

Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6667835/

Date: 2020-09-09 20:45:18  Source: igfitidea

Tags: bash, sed

Question by Tathagata

for term in `cat stopwords`; do sed -i 's/\<$term\>//g' spam.txt ;done

Given that stopwords contains one word per line and spam.txt is a plain text file, I just need to replace exact matches of the stop words. It does not behave as I expect... Note that there are words like doesn't and couldn't in both files.

Accepted answer by Sami Kerola

Are you sure you want to run sed in a for loop? I would use a sed script file:

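TMPFILE=$(mktemp)   # create a temporary file to hold the generated sed script
# One substitution per stop word; \< and \> (GNU sed) restrict matches to whole words
while IFS= read -r WORD; do
  printf 's/\\<%s\\>//g\n' "$WORD" >> "$TMPFILE"
done < stopwords
sed -f "$TMPFILE" spam.txt   # apply all substitutions in one pass (add -i to edit in place)
rm -f "$TMPFILE"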
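
To see the script-file approach end to end, here is a self-contained sketch. It assumes GNU sed (for \< and \> and for -i without a backup suffix) and creates made-up stopwords and spam.txt files in a throwaway directory. The sample words are illustrative only.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"   # throwaway directory for the hypothetical sample files
printf '%s\n' the "couldn't" > stopwords
printf '%s\n' "the cat couldn't theorize" > spam.txt

script=$(mktemp)
# Write one whole-word substitution per stop word into the script file.
while IFS= read -r word; do
  printf 's/\\<%s\\>//g\n' "$word" >> "$script"
done < stopwords

cat "$script"                  # s/\<the\>//g and s/\<couldn't\>//g
sed -i -f "$script" spam.txt   # one sed run applies every substitution
rm -f "$script"
cat spam.txt                   # " cat  theorize"
```

"theorize" survives because \> requires a word end right after "the". Only the whole words are removed, which leaves the surrounding spaces behind.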

Answer by neuro

Well, you should use double quotes (") instead of single quotes (') in your sed command. Single quotes tell the shell not to expand $term, so sed receives the literal text $term.

This:

for term in `cat stopwords`; do sed -i "s/\<$term\>//g" spam.txt ;done

Works for:

# stopwords
couldn't

and:

# spam.txt
foo <couldn't> bar
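
Here is a self-contained run of this example. It assumes GNU sed and uses a throwaway directory. The two printf lines show what sed actually receives under each kind of quoting.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"   # throwaway directory for the sample files
printf '%s\n' "couldn't" > stopwords
printf '%s\n' "foo <couldn't> bar" > spam.txt

term="couldn't"
printf '%s\n' 's/\<$term\>//g'   # single quotes: sed would see the literal text $term
printf '%s\n' "s/\<$term\>//g"   # double quotes: prints s/\<couldn't\>//g

# The double-quoted loop: one in-place sed run per stop word.
for term in $(cat stopwords); do sed -i "s/\<$term\>//g" spam.txt; done
cat spam.txt                     # "foo <> bar"
```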

My 2 cents.

Answer by shellter

@kerolasa is onto something there.

The most important point is that your $term is NOT being expanded as a variable, because it sits inside single quotes. You can rewrite your code as:

for term in `cat stopwords`; do sed -i "s/\<${term}\>//g" spam.txt ;done

But that is a very expensive operation: you start a separate sed process, which rewrites spam.txt, for each word in stopwords. Building a single sed script, per @kerolasa's idea, is more efficient. It depends, though: if this is a one-off project, your solution will work.
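
Another way to keep a simple loop but still start sed only once is to pass every substitution as its own -e option. This is a sketch, not taken from the answers here. It assumes GNU sed, uses the POSIX positional parameters as the argument list, and works on hypothetical sample data.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"   # throwaway directory for the hypothetical sample files
printf '%s\n' foo bar > stopwords
printf '%s\n' 'foo food bar barn' > spam.txt

# Build the argument list -e 's/\<foo\>//g' -e 's/\<bar\>//g' in "$@".
set --
while IFS= read -r term; do
  set -- "$@" -e "s/\<${term}\>//g"
done < stopwords

sed -i "$@" spam.txt   # a single sed process; assumes stopwords is non-empty
cat spam.txt           # " food  barn"
```

Only the whole words foo and bar are removed. If stopwords were empty, sed would misread spam.txt as its script, so a real script should check for that case first.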

Except... "words like doesn't, couldn't in both files": yes, and? I'm not sure what you are saying there. What do you expect or want to happen, and why do you think it won't happen? Changing your quoting will help.
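
Apostrophes cause no trouble once $term is expanded, but they do count as word boundaries, which matters for partial stop words. Here is a quick check, assuming GNU sed, where only letters, digits and the underscore are word characters:

```shell
#!/usr/bin/env bash
set -eu
# \< and \> see the apostrophe as a non-word character, i.e. a word boundary.
printf '%s\n' "foo couldn't bar" | sed "s/\<couldn't\>//g"   # "foo  bar"
printf '%s\n' "couldn't couldn" | sed "s/\<couldn\>//g"      # "'t "
```

So a stop word such as couldn would also strip the front of couldn't and leave 't behind.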

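Finally, note that this solution may break if your stopword list contains spaces, e.g. 'spanner in the works' ;-). The for loop splits the output of cat stopwords on whitespace, so each word of a multi-word entry becomes a separate term.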
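
Here is a sketch of a loop that tolerates multi-word entries. It assumes GNU sed and hypothetical sample files. Reading stopwords line by line with while IFS= read -r keeps each phrase together.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"   # throwaway directory for the hypothetical sample files
printf '%s\n' 'spanner in the works' > stopwords
printf '%s\n' 'a spanner in the works, in the end' > spam.txt

# A for loop over $(cat stopwords) would split this entry into four terms and
# also delete the unrelated "in" and "the" later in the line. Reading whole
# lines keeps each entry intact as one phrase.
while IFS= read -r term; do
  sed -i "s/\<${term}\>//g" spam.txt
done < stopwords
cat spam.txt        # "a , in the end"
```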

I hope this helps.

Answer by mschilli

Instead of using a temp file for the script as suggested by Sami Kerola, you could also pipe the script to sed, generating it from stopwords with a second instance of sed:

sed 's,.*,s/\\<&\\>//g,' stopwords | sed -i -f- spam.txt

Note that I used , instead of / as the separator for the first instance of sed, so that I don't have to escape every / that serves as a separator in the generated expression. But IMHO this is just a matter of taste, and of course you could also use 's/.*/s\/\\<&\\>\/\/g/' if you prefer.
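
Here is a self-contained run of this pipeline. It assumes GNU sed, which accepts - as the -f argument and then reads the script from standard input, and it uses made-up sample files. The replacement is written with \\< and \\> so that each generated command contains a literal \< and \>.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"   # throwaway directory for the hypothetical sample files
printf '%s\n' a an > stopwords
printf '%s\n' 'a cat and an ant' > spam.txt

# First sed: wrap each line in a substitution command. & is the whole line,
# and \\ in the replacement produces one literal backslash.
sed 's,.*,s/\\<&\\>//g,' stopwords    # s/\<a\>//g and s/\<an\>//g
# Second sed: -f- reads the generated script from standard input.
sed 's,.*,s/\\<&\\>//g,' stopwords | sed -i -f- spam.txt
cat spam.txt                           # " cat and  ant"
```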
