Removing all special characters from a string in Bash
Original question: http://stackoverflow.com/questions/36926999/
This page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by Marta Koprivnik
I have a lot of lowercase text. The only problem is that it contains many special characters, and I want to remove all of them, along with the numbers.
The following command is not strong enough:
tr -cd '[:alpha:]\n '
For characters such as é and some others, it outputs "?" instead, but I want to remove all of them. Is there a stronger command?
I use Linux Mint, 4.3.8(1)-release.
Answered by Inian
You can use tr to keep only the printable characters of a string. Run the command below on your input file.
tr -cd "[:print:]\n" < file1
The -d flag deletes from the input stream the characters in the set given as its argument, and -c complements that set (inverts what is provided). Without -c, the command would delete all printable characters from the input stream; with -c, it deletes the non-printable characters instead. We also keep the newline character \n in the set to preserve the line endings of the input file. Leaving it out would produce the output as one long line.
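As a rough illustration of the -c and -d behaviour described above, the sketch below wraps the command in a helper. The function name strip_nonprintable and the sample input are made up for this example, and LC_ALL=C is an added assumption so that every byte outside printable ASCII (including both bytes of a UTF-8 é) counts as non-printable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): keep only printable
# ASCII characters and newlines from standard input.
strip_nonprintable() {
    # LC_ALL=C is an assumption: it makes tr classify bytes by ASCII rules.
    LC_ALL=C tr -cd '[:print:]\n'
}

# '\303\251' is the UTF-8 encoding of é; both bytes are deleted.
printf 'caf\303\251 ok\n' | strip_nonprintable
# prints: caf ok
```

Note that a tab is not a printable character either, so this filter would delete tabs as well.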
[:print:] is a POSIX character class that combines [:alnum:], [:punct:], and the space character. In the C locale, [:alnum:] is the same as [0-9A-Za-z], and [:punct:] includes the characters ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
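Since [:print:] still keeps digits and punctuation, a narrower set may be closer to what the question asks for. The sketch below is one possible variant, not part of the original answer: it keeps only ASCII letters, spaces, and newlines. The function name keep_letters and the sample input are hypothetical, and LC_ALL=C is again an assumption to make the character classes ASCII-only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only ASCII letters, spaces and newlines.
keep_letters() {
    # Everything outside the listed set is deleted (-c complements, -d deletes).
    LC_ALL=C tr -cd '[:alpha:] \n'
}

# The é bytes, the comma, the digit and the exclamation mark are all removed.
printf 'h\303\251llo, w0rld!\n' | keep_letters
# prints: hllo wrld
```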
Answered by John Mark Mitchell
I am not exactly certain where the text in your question is coming from, but let's say that the "lot of text in lowercase" is in a file called special.txt. You could do something like the following, which focuses on the characters you want to keep:
sed 's/[^a-z A-Z]//g' special.txt
It is a bit like doing surgery with an axe though.
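One caveat with the sed approach: in some locales a range such as a-z can match accented letters too, so the result may vary between systems. The sketch below pins the locale with LC_ALL=C to avoid that; this is an assumption added here, not something the answer specifies, and the function name letters_only is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every character that is not an ASCII letter
# or a space, reading from standard input.
letters_only() {
    # LC_ALL=C makes a-z and A-Z plain ASCII ranges and treats each byte
    # of a multibyte character as a separate character to delete.
    LC_ALL=C sed 's/[^a-zA-Z ]//g'
}

printf 'h\303\251llo, w0rld!\n' | letters_only
# prints: hllo wrld
```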
Another possible solution is in the post "Remove non-ascii characters from ...".
If the above doesn't solve your problem, please provide a few more details and I might be able to give a more actionable answer.
Answered by Cheema Jaspreet Singh
Just wanted to add my bit. The command below replaces each punctuation character with a space, squeezes runs of spaces into one, and leaves your newline characters intact:
tr -s "[:punct:]" " "
From the manual entry for -s:
Squeeze multiple occurrences of the characters listed in the last operand (either string1 or string2) in the input into a single instance of the character. This occurs after all deletion and translation is completed.
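To see the translate-then-squeeze order in practice, here is a small sketch. The function name punct_to_space and the sample input are invented for illustration, and LC_ALL=C is an assumption to keep [:punct:] limited to ASCII punctuation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn every punctuation character into a space,
# then squeeze each run of spaces into a single space.
punct_to_space() {
    LC_ALL=C tr -s '[:punct:]' ' '
}

# ",,, " becomes four spaces, which -s then squeezes into one.
printf 'hello,,, world!!! 42\n' | punct_to_space
# prints: hello world 42
```

Digits pass through unchanged here, so this command alone does not remove numbers.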