Removing punctuation and tabs with sed in bash
Original URL: http://stackoverflow.com/questions/42108365/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by I0_ol
I am using the following to remove punctuation and tabs from a text file and convert its uppercase text to lowercase.
sed 's/[[:punct:]]//g' $HOME/file.txt | sed $'s/\t//g' | tr '[:upper:]' '[:lower:]'
Do I need to use these two separate sed commands to remove punctuation and tabs, or can this be done with a single sed command?
Also, could someone explain what the $ is doing in the second sed command? Without it, the command doesn't remove tabs. I looked in the man page but didn't see anything that mentioned this.
The input file looks like this:
Pochemu oni ne v shkole?
Kto tam?
Otkuda eto moloko?
Chei chai ona p'et?
Kogda vy chitaete?
Kogda ty chitaesh'?
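To make the question's pipeline concrete, here is a minimal sketch that recreates the sample input and runs the same three commands on it. It assumes bash (needed for $'...'); the temporary file created with mktemp is a stand-in for $HOME/file.txt and is not part of the original question.

```shell
#!/usr/bin/env bash
# Recreate the sample input in a temporary file (illustrative stand-in
# for $HOME/file.txt).
infile=$(mktemp)
printf '%s\n' \
  'Pochemu oni ne v shkole?' \
  'Kto tam?' \
  'Otkuda eto moloko?' \
  "Chei chai ona p'et?" \
  'Kogda vy chitaete?' \
  "Kogda ty chitaesh'?" > "$infile"

# The question's pipeline: delete punctuation, delete tabs, then lowercase.
result=$(sed 's/[[:punct:]]//g' "$infile" | sed $'s/\t//g' | tr '[:upper:]' '[:lower:]')
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$infile"
```

On the sample input this prints each line lowercased with the question marks and apostrophes gone, for example "chei chai ona pet" for the fourth line.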
Answer by Inian
Yes, a single sed invocation with multiple -e expressions is enough. With FreeBSD sed, it can be done as follows:
sed -e $'s/\t//g' -e 's/[[:punct:]]//g' -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' file
The y command is documented in the FreeBSD sed man page as:
[2addr]y/string1/string2/
Replace all occurrences of characters in string1 in the pattern
space with the corresponding characters from string2.
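As an alternative not taken from the original answer, the tab can also go inside the same bracket expression as [[:punct:]], so one s command removes both. This is a minimal sketch assuming bash (for $'...') and a sed that supports POSIX character classes; the sample line is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample line: "Chei", a tab, then "chai ona p'et?".
line="Chei"$'\t'"chai ona p'et?"

# $'...' turns \t into a real tab, so [[:punct:]<TAB>] matches any
# punctuation character or a tab; y/// then maps A-Z to a-z.
cleaned=$(printf '%s\n' "$line" |
  sed -e $'s/[[:punct:]\t]//g' \
      -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/')
printf '%s\n' "$cleaned"
```

This prints "cheichai ona pet": the tab is deleted rather than replaced with a space, matching the behavior of the original s/\t//g. To process a file, pass its name to sed instead of piping printf into it.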
With GNU sed, the \L escape in the replacement (which lowercases the rest of the replacement text) can be used instead of y:
sed -e $'s/\t//g' -e "s/[[:punct:]]\+//g" -e "s/./\L&/g" file
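$'' is a bash quoting mechanism that enables ANSI C-like escape sequences, so $'s/\t//g' hands sed a script containing an actual tab character. This is why it is documented in the bash manual (ANSI-C quoting) rather than in the sed man page. Without the $, sed receives a backslash followed by t; POSIX does not define \t in sed regular expressions, so a sed without the GNU extension for it (such as BSD sed) does not treat it as a tab.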
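A quick way to see what $'...' does is to compare string lengths and dump the bytes sed would receive. This is a minimal sketch assuming bash; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# With $'...', bash expands \t into one tab character before sed ever runs.
ansi=$'\t'
# Inside ordinary single quotes, the backslash and the t stay as two characters.
plain='\t'
printf 'ansi length: %d, plain length: %d\n' "${#ansi}" "${#plain}"

# Dump the script text sed actually receives in each case.
printf '%s' $'s/\t//g' | od -c
printf '%s' 's/\t//g' | od -c
```

The first line prints "ansi length: 1, plain length: 2". In the od -c output, the first script shows the tab as a single \t byte, while the second shows a separate \ and t.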