bash: Remove punctuation and tabs with sed

Question

Asked by I0_ol

I am using the following to remove punctuation and tabs from a text file and convert uppercase text to lowercase:

sed 's/[[:punct:]]//g' "$HOME/file.txt" | sed $'s/\t//g' | tr '[:upper:]' '[:lower:]'

Do I need to use these two separate sed commands to remove punctuation and tabs, or can this be done with a single sed command?

Also, could someone explain what the $ is doing in the second sed command? Without it, the command doesn't remove tabs. I looked in the man page but didn't see anything that mentions this.

The input file looks like this:

Pochemu oni ne v shkole?
Kto tam?
Otkuda eto moloko?
Chei chai ona p'et?
    Kogda vy chitaete?
    Kogda ty chitaesh'?

Answer 1

Answered by Inian

Use a single sed with multiple -e expressions. With FreeBSD sed, this can be done as follows:

sed -e $'s/\t//g' -e 's/[[:punct:]]//g' -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' file

The y command (a sed command, not a quantifier) is described in the sed man page as follows:

[2addr]y/string1/string2/
      Replace all occurrences of characters in string1 in the pattern 
      space with the corresponding characters from string2.
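
To answer the original question directly, here is a minimal sketch, assuming bash and a sed that supports POSIX character classes, that strips punctuation and tabs with one bracket expression and lowercases with y, all in one sed call. The sample file name sample.txt and its contents are hypothetical, taken from the lines shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample file; the second line starts with a tab.
printf '%s\n' "Kto tam?" $'\tKogda vy chitaete?' "Chei chai ona p'et?" > sample.txt

# One sed invocation:
#  - bash expands $'...' first, so \t becomes a real tab and the bracket
#    expression matches any punctuation character or a tab;
#  - y/// maps each uppercase ASCII letter to its lowercase counterpart.
sed -e $'s/[[:punct:]\t]//g' \
    -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' sample.txt
# prints:
# kto tam
# kogda vy chitaete
# chei chai ona pet
```

Because the tab and [[:punct:]] share one bracket expression, a single s command is enough; y is used instead of tr so that no second process is needed.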

With GNU sed, the \L escape in the replacement text (a GNU extension, not a quantifier) converts the matched text to lowercase, so the y command is not needed:

sed -e $'s/\t//g' -e 's/[[:punct:]]\+//g' -e 's/.*/\L&/' file
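
A minimal sketch of the GNU variant on a hypothetical sample file. It assumes GNU sed, since both \+ in a basic regular expression and \L in the replacement are GNU extensions. Here \L& lowercases the whole matched line, which has the same effect as applying \L to each character with s/./\L&/g.

```shell
#!/usr/bin/env bash
# Requires GNU sed: \+ (in a BRE) and \L are GNU extensions.
# Hypothetical sample file; the first line starts with a tab.
printf '%s\n' $'\tKogda ty chitaesh\'?' "Otkuda eto moloko?" > sample.txt

# Remove tabs, remove runs of punctuation, then lowercase the whole line.
sed -e $'s/\t//g' -e 's/[[:punct:]]\+//g' -e 's/.*/\L&/' sample.txt
# prints:
# kogda ty chitaesh
# otkuda eto moloko
```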

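$'...' is a bash quoting mechanism that expands ANSI C-like escape sequences; it is documented under QUOTING in the bash man page, not in the sed one. In $'s/\t//g', bash replaces \t with an actual tab character before sed runs, so sed only has to match a literal tab. Without the $, sed receives a backslash followed by t, and some sed implementations (for example, the BSD sed shipped with macOS) do not interpret \t in a regular expression as a tab.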
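
A small sketch (assuming bash) that shows what sed actually receives in each case, by comparing the length of the two strings and dumping the expanded one with od -c.

```shell
#!/usr/bin/env bash
plain='s/\t//g'   # single quotes: backslash and t stay two separate characters
ansi=$'s/\t//g'   # ANSI-C quoting: bash turns \t into one tab character

echo "plain: ${#plain} chars, ansi: ${#ansi} chars"
# prints: plain: 7 chars, ansi: 6 chars

# od -c displays the tab character as \t
printf '%s' "$ansi" | od -c
```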
