
Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/40588240/

Time: 2020-09-18 15:23:36  Source: igfitidea

sed to remove all characters except letters and '

Tags: bash, sed

Asked by Jakob

I am using this sed command to strip documents of all their (for me) unnecessary characters.


sed 's/[^a-zA-Z]/ /g'

However, after mining my data a bit I realized a pretty basic mistake: not including ' cuts all my don'ts into don ts, which sucks.


So I want to include ' in my regex. I'm still new to this kind of "coding", if I may call it that, so excuse my newbie mistake, or even better, explain it to me!


This obviously doesn't work:

sed 's/[^a-zA-Z']/ /g'


However, this doesn't work either. I thought \ escapes the '?

sed 's/[^a-zA-Z\']/ /g'


Answered by Jean-François Fabre

Good old double quotes at work: they protect the single quote without any need for escaping:


sed "s/[^a-zA-Z']/ /g" <<< "don't ... do this"

gives:


don't     do this
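One caveat with the double-quote approach, shown as a hedged sketch: inside "..." the shell still interprets $, ` and \, so sed syntax that uses those characters, such as the $ end-of-line anchor, is safer written with a backslash for the shell. The trailing-space trimming expression and the sample input below are assumptions added for illustration, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical input with trailing spaces, to show the \$ anchor at work.
text="don't ... do this   "

# Expression 1: every character that is not a letter or ' becomes a space.
# Expression 2: trailing spaces are removed. Inside double quotes the shell
# turns \$ into a literal $, so sed receives the end-of-line anchor.
cleaned=$(printf '%s\n' "$text" | sed "s/[^a-zA-Z']/ /g; s/ *\$//")

printf '[%s]\n' "$cleaned"
```

This prints the cleaned string in brackets so that the (now absent) trailing spaces are visible.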

EDIT: your code replaces non-letters with spaces, but your question talks about removing them, so here is the other version: it deletes every character that is not a letter, a space or ', and the second expression collapses repeated spaces into one.


sed -e "s/[^ a-zA-Z']//g" -e 's/ \+/ /g' <<< "don't ... do this"

result:


don't do this
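A portable sketch of the same two-step cleanup, under the assumption that you may not have GNU sed: " \+" is a GNU extension in basic regular expressions, while "  *" (a space followed by zero or more spaces) means the same thing in any POSIX sed. The g flag squeezes every run of spaces, not only the first. The function name clean_text is hypothetical.

```shell
#!/bin/sh
# Step 1: delete everything except letters, spaces and '.
# Step 2: replace each run of one or more spaces with a single space.
clean_text() {
  sed -e "s/[^ a-zA-Z']//g" -e 's/  */ /g'
}

printf '%s\n' "don't ... do this ... or   that" | clean_text
```

With several runs of spaces in the input, every run is collapsed, giving "don't do this or that".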

EDIT2: an alternative that keeps single quotes while the whole sed script stays in single quotes, using GNU sed's \x27 escape for ' (courtesy of Sundeep):


sed 's/[^ a-zA-Z\x27]//g'
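GNU sed reads \x27 as the character with hexadecimal code 27, which is '. If the script might run under a sed without that escape, one workaround (my assumption, not part of the answer) is to keep the quote in a shell variable and splice it between two single-quoted pieces. The variable apos and the function name below are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper variable holding a single apostrophe.
apos="'"

# The shell joins 's/[^ a-zA-Z' + "$apos" + ']//g' into the sed script
# s/[^ a-zA-Z']//g before sed ever sees it.
keep_letters_spaces_apos() {
  sed 's/[^ a-zA-Z'"$apos"']//g'
}

printf '%s\n' "don't ... do this" | keep_letters_spaces_apos
```

Without a squeeze step, the spaces around the removed dots remain, so the result is "don't  do this" with two spaces.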

Note: I first tried to escape single quotes following the solutions tested here, and none of those using single quotes worked for me (the shell always prompted for a line continuation), so I came up with these alternatives.
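The line-continuation prompt has a simple cause: inside single quotes a POSIX shell treats every character literally, backslash included, so the ' in [^a-zA-Z'] or [^a-zA-Z\'] ends the quoted string and the shell keeps waiting for a closing quote. A common shell idiom is to close the quote, add an escaped \', and reopen it, written '\''. A minimal sketch with a hypothetical function name:

```shell
#!/bin/sh
# Replace every character that is not a letter or ' with a space, keeping
# the sed script in single quotes. The shell reads 's/[^a-zA-Z' then \'
# (a literal quote) then ']/ /g' and joins them into s/[^a-zA-Z']/ /g.
strip_keep_apostrophe() {
  sed 's/[^a-zA-Z'\'']/ /g'
}

printf '%s\n' "don't ... do this" | strip_keep_apostrophe
```

This behaves like the question's original command, except that don't now survives intact.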


Answered by Joel Griffiths

You can also use tr -cd "'[:alnum:] "


$ echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:]"

somestring''''withoutspecialcharsexcept'

If you want to keep the spaces:


echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:] "
some string '''' without special chars except '
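For reference: tr -c complements the given set and -d deletes, so tr -cd SET keeps only the characters in SET. Two side effects are worth knowing. [:alnum:] also keeps digits, which the question's [a-zA-Z] would drop, and the newline is deleted too unless it is part of the set. A sketch that keeps only letters, spaces, ' and line breaks (the function name is hypothetical):

```shell
#!/bin/sh
# Keep only ', letters ([:alpha:], no digits), spaces and newlines (\n);
# -c takes the complement of that set and -d deletes it.
keep_letters() {
  tr -cd "'[:alpha:] \n"
}

printf '%s\n' "R2-D2 isn't here" | keep_letters
```

Here the digits and the hyphen disappear, giving "RD isn't here", and the trailing newline is preserved.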