Original URL: http://stackoverflow.com/questions/40588240/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
sed to remove all characters except letters and '
Question by Jakob
I am using this sed command to strip documents of all their (for me) unnecessary characters.
sed 's/[^a-zA-Z]/ /g'
However, after mining my data a bit, I realized a pretty basic mistake: not including ' in the bracket expression turns every don't into don t, which sucks.
So I want to include ' in my regex. I'm still new to this kind of "coding", if I may call it that, so excuse my newbie mistake, or even better, explain it to me!
sed 's/[^a-zA-Z']/ /g'
This obviously doesn't work.
sed 's/[^a-zA-Z\']/ /g'
However, this doesn't work either. I thought \ would escape the '?
Answer by Jean-François Fabre
Good old double quotes protect the single quote without any need for escaping:
sed "s/[^a-zA-Z']/ /g" <<< "don't ... do this"
gives:
don't     do this
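The `<<<` here-string used above is a bash feature (also found in ksh and zsh), not POSIX sh. A minimal sketch of the same double-quoted command for a plain POSIX shell, piping the input in instead; the function name `keep_letters_and_apostrophes` is only an illustrative choice:

```shell
#!/bin/sh
# Replace every character that is not a letter or ' with a space.
# Inside double quotes, ' is an ordinary character, so no escaping is needed.
keep_letters_and_apostrophes() {
  sed "s/[^a-zA-Z']/ /g"
}

# Pipe the sample text in instead of using bash's <<< here-string.
# Prints "don't     do this": the two original spaces stay, each dot becomes a space.
printf '%s\n' "don't ... do this" | keep_letters_and_apostrophes
```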
EDIT: your code replaces non-letters with spaces, but your question asks to remove them, so here is the other version: it deletes every character that is not a letter, a space, or ', and the 2nd expression squeezes multiple spaces into one.
sed -e "s/[^ a-zA-Z']//g" -e 's/ \+/ /g' <<< "don't ... do this"
result:
don't do this
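`\+` in a basic regular expression is a GNU sed extension. A sketch of the same two-step cleanup that should also work with POSIX/BSD sed, writing "one or more spaces" as `  *` (a space followed by space-star) and using the `g` flag so that every run of spaces is squeezed, not only the first; the function name `clean_text` is hypothetical:

```shell
#!/bin/sh
# 1st expression: delete everything except spaces, letters and '.
# 2nd expression: squeeze every run of spaces into a single space.
clean_text() {
  sed -e "s/[^ a-zA-Z']//g" -e 's/  */ /g'
}

printf '%s\n' "don't ... do this" | clean_text   # don't do this
printf '%s\n' "a ... b ,,, c" | clean_text       # a b c
```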
EDIT2: an alternative that lets you keep the sed script in single quotes (courtesy of Sundeep):
`sed 's/[^ a-zA-Z\x27]//g'` (\x27 is the hexadecimal code of ', an escape supported by GNU sed)
Note: I first tried to escape the single quotes following the solutions tested here, and none of those using single quotes worked for me (the shell always prompted for a line continuation), so I came up with these alternatives.
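The continuation prompt has a simple cause: inside a single-quoted string the shell treats every character literally, backslash included, so `\'` cannot escape a quote. In `sed 's/[^a-zA-Z\']/ /g'` the quote after `\` already closes the string, and the final `'` opens a new string that is never closed, so the shell keeps waiting for input. The usual POSIX workaround is to close the quote, add a backslash-escaped quote outside it, and reopen the quote: `'\''`. A minimal sketch (the function name `delete_other_chars` is illustrative):

```shell
#!/bin/sh
# The shell joins three pieces into one argument:
#   's/[^ a-zA-Z'   (quoted)   \'   (escaped quote)   ']//g'   (quoted)
# so sed receives the script  s/[^ a-zA-Z']//g
delete_other_chars() {
  sed 's/[^ a-zA-Z'\'']//g'
}

printf '%s\n' "don't ... do this" | delete_other_chars   # don't  do this
```

Unlike the `\x27` escape above, which relies on GNU sed, this only depends on shell quoting, so it should work with any sed.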
Answer by Joel Griffiths
You can also use tr -cd "'[:alnum:] "
$ echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:]"
somestring''''withoutspecialcharsexcept'
If you want to keep the spaces:
echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:] "
some string '''' without special chars except '
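Two details of the tr version are worth knowing: `[:alnum:]` also keeps digits, whereas the sed commands above keep letters only (`[:alpha:]` matches them more closely), and `-d` also deletes the newline unless it is in the set. A sketch adapting tr to the question's original intent (replace everything except letters, ' and newlines with a space, then squeeze runs of spaces with `-s`); this adaptation and the function name `words_only` are assumptions, not part of the answer:

```shell
#!/bin/sh
# -c: operate on the complement of the set (everything except letters, ' and newline)
# -s: squeeze repeated spaces in the output into a single space
words_only() {
  tr -cs "[:alpha:]'\n" ' '
}

# Digits are not in [:alpha:], so "42" is replaced too.
printf '%s\n' "don't ... do this 42 times" | words_only   # don't do this times
```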