bash: sed in-place editing

Question

Asked by Tathagata

for term in `cat stopwords`; do sed -i 's/\<$term\>//g' spam.txt ;done

Given that stopwords contains one word per line and spam.txt is a plain text file, I just need to remove exact matches of the stopwords. The command above does not behave as I expect. Note that there are words like doesn't and couldn't in both files.

Answer 1

Accepted answer by Sami Kerola

Are you sure you want to run sed in a for loop? I would use a sed script file.

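# Command substitution is needed here; a bare TMPFILE=mktemp only assigns the string "mktemp".
TMPFILE=$(mktemp)
for WORD in $(cat stopwords); do echo "s/$WORD//g" >> "$TMPFILE"; done
sed -f "$TMPFILE" spam.txt
rm -f "$TMPFILE"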
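
The script-file approach above prints the result to standard output and matches the stopwords anywhere, including inside longer words. A minimal sketch that keeps the `\<` and `\>` word boundaries and the `-i` flag from the question might look like the following. It assumes GNU sed, and `remove_stopwords` is a hypothetical helper name, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed (for -i and the \< \> word-boundary escapes).
# remove_stopwords is a hypothetical helper name used for illustration.
remove_stopwords() {
    local stopwords_file=$1 target_file=$2
    local script word
    script=$(mktemp) || return 1
    # Emit one substitution command per whitespace-separated stopword.
    # Caveat: a word containing "/" or regex metacharacters would need escaping.
    for word in $(cat "$stopwords_file"); do
        printf 's/\\<%s\\>//g\n' "$word"
    done > "$script"
    # A single sed process applies every substitution and edits the file in place.
    sed -i -f "$script" "$target_file"
    rm -f "$script"
}
```

Calling `remove_stopwords stopwords spam.txt` would then edit spam.txt in place with one sed invocation instead of one per word.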

Answer 2

Answered by neuro

Well, you should use " instead of ' in your sed command. Single quotes tell the shell not to expand $term.

This:

for term in `cat stopwords`; do sed -i "s/\<$term\>//g" spam.txt ;done

works for:

# stopwords
couldn't

and:

# spam.txt
foo <couldn't> bar
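
The quoting difference can be checked without touching any file. The short sketch below only builds the two sed expressions and prints them. The variable names `single` and `double` are made up for this illustration.

```shell
term="couldn't"

# Single quotes: the shell leaves $term untouched, so sed would search for the literal text "$term".
single='s/\<$term\>//g'

# Double quotes: the shell expands $term first; \< and \> pass through unchanged.
double="s/\<$term\>//g"

printf 'single quotes: %s\n' "$single"
printf 'double quotes: %s\n' "$double"
# single quotes: s/\<$term\>//g
# double quotes: s/\<couldn't\>//g
```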

My 2 cents.

Answer 3

Answered by shellter

@kerolasa is onto something there.

The most important being that your $term is NOT being expanded as a variable. You can rewrite your code as

for term in `cat stopwords`; do sed -i "s/\<${term}\>//g" spam.txt ;done

But that is a very expensive operation: you are running sed once for each word in stopwords. Making a sed script per @kerolasa's idea is more efficient, but it depends. If this is a one-off project, then your solution will work.

Except ... "words like doesn't, couldn't in both files". Yes, and? I'm not sure what you are saying there. What do you expect or want to happen, and why do you think it won't happen? Changing your quoting will help.

Finally, note that this solution may break if your stopword list contains spaces, i.e. 'spanner in the works' ;-).
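
One way to cope with stopwords that contain spaces is to read the list line by line instead of word-splitting the output of `cat`. The following is a sketch under that assumption, not the original poster's code. `build_script` is a hypothetical helper name, and entries containing `/` or regex metacharacters would still need escaping.

```shell
#!/usr/bin/env bash
# Sketch only: build_script is a hypothetical helper name used for illustration.
build_script() {
    local stopwords_file=$1 line
    # IFS= and -r keep each line intact, including inner spaces and backslashes.
    while IFS= read -r line; do
        [ -n "$line" ] || continue          # skip empty lines
        printf 's/\\<%s\\>//g\n' "$line"    # one sed command per stopword line
    done < "$stopwords_file"
}
```

The output can be saved to a file and passed to `sed -i -f`, so a phrase like 'spanner in the works' becomes a single substitution rather than four separate ones.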

I hope this helps.

Answer 4

Answered by mschilli

Instead of using a tempfile for the script as suggested by Sami Kerola, you could also pipe the script to sed, creating it from stopwords using a second instance of sed:

sed 's,.*,s/\\<&\\>//g,' stopwords | sed -i -f- spam.txt

Note that I used , instead of / as the separator for the first instance of sed so that I don't have to escape every / in the generated expression. But this is IMHO just a matter of taste, and of course you could also use 's/.*/s\/\\<&\\>\/\/g/' if you like it more.
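
To see what the first sed produces, the generated script can be printed instead of piped into the second sed. The sketch below feeds a made-up two-word list on stdin. It doubles the backslashes in the replacement so that the generated lines keep `\<` and `\>` literally, since the effect of a single backslash before `<` in a sed replacement is not guaranteed. The variable name `generated` is an assumption for this illustration.

```shell
# Hypothetical two-word stopword list, fed on stdin instead of read from a file.
generated=$(printf '%s\n' "couldn't" "the" | sed 's,.*,s/\\<&\\>//g,')

# Each input line becomes one substitution command for the second sed.
printf '%s\n' "$generated"
# s/\<couldn't\>//g
# s/\<the\>//g
```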
