bash - Delete all comments in a file using sed

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/13548716/

Date: 2020-09-18 03:51:18  Source: igfitidea

Delete all comments in a file using sed

Tags: bash, sed

Question by Logick

How would you use sed to delete all comments (introduced by #) from a file, while taking into account that '#' can also appear inside a string?

This helped out a lot, except for the string portion.

Answer by beatgammit

If # always means a comment and can appear anywhere on a line (for example, after some code):

sed 's:#.*$::g' <file-name>

If you want to change the file in place, add the -i switch:

sed -i 's:#.*$::g' <file-name>

This deletes everything from the first # to the end of the line, ignoring any context. If you use # anywhere it is not a comment (for example, inside a string), that will be deleted too.
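
A quick way to see this behaviour is to run the pattern on two made-up lines. This is only an illustration; the g flag is dropped here because .*$ can match at most once per line anyway, and the function name naive_strip is hypothetical.

```bash
#!/usr/bin/env bash
# Naive approach from this answer: delete from the first '#' to end of line.
naive_strip() {
  sed 's:#.*$::'
}

# Made-up sample input: a real trailing comment, and a '#' inside a string.
printf '%s\n' \
  'x=1 # trailing comment' \
  'echo "issue #42"' \
  | naive_strip
# The first line becomes 'x=1 ' (as intended), but the second becomes
# 'echo "issue ' because the '#' inside the string is treated as a comment.
```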

If comments can only start at the beginning of a line, do something like this:

sed 's:^#.*$::g' <file-name>

If they may be preceded by whitespace, but nothing else, do:

sed 's:^\s*#.*$::g' <file-name>

These two are a little safer because they are less likely to delete valid uses of # in your code, such as in strings.
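
Note that these substitutions empty a comment line rather than remove it. If you want the lines gone, the same regular expression can serve as an address for sed's d command. A small sketch on made-up input; it uses the POSIX class [[:space:]] instead of \s, which is a GNU extension:

```bash
#!/usr/bin/env bash
# Made-up sample: a comment, an indented comment and a code line.
input=$'# full-line comment\n    # indented comment\necho "#not a comment"'

# Substitution: the two comment lines become empty but remain (3 lines out).
printf '%s\n' "$input" | sed 's:^[[:space:]]*#.*$::'

# Deletion: the same pattern used as an address drops them (1 line out).
printf '%s\n' "$input" | sed '/^[[:space:]]*#/d'
```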

Edit:

There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.

The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:

  • Strings can likely span lines
  • A regular expression can't tell the difference between apostrophes and single quotes
  • A regular expression can't match nested quotes (these cases will confuse the regex):

    # "hello there"
    # hello there"
    "# hello there"

If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:

sed 's:#[^"]*$::g' <file-name>

That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
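
Here is a sketch of that last command wrapped in a shell function (the name strip_dq_comments is made up), run on sample lines that satisfy the preconditions, plus one that violates them:

```bash
#!/usr/bin/env bash
# Assumes: strings use double quotes only, comments never contain '"',
# and strings never span lines.
strip_dq_comments() {
  # A '#' followed only by non-'"' characters up to end of line is a comment.
  sed 's:#[^"]*$::'
}

printf '%s\n' \
  'msg="issue #42" # report it' \
  'echo "a#b"' \
  'x=1 # say "hi"' \
  | strip_dq_comments
# 'msg="issue #42" # report it' -> 'msg="issue #42" '
# 'echo "a#b"'                  -> unchanged
# 'x=1 # say "hi"'              -> unchanged (its comment contains '"')
```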

Answer by potong

This might work for you (GNU sed):

sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
  • /#/!b if the line does not contain a #, bail out
  • s/^/\n/ insert a unique marker (\n)
  • ta;:a jump to a loop label (this resets the substitution true/false flag)
  • s/\n$//;t if the marker is at the end of the line, remove it and bail out
  • s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the text following the marker is a quoted string, bump the marker past it and loop
  • s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker past it and loop
  • s/\n.*// the remainder of the line is a comment, so remove the marker and the rest of the line

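To try the one-liner on a pipe, it can be wrapped in a function; the name strip_unquoted_comments and the sample lines are made up. GNU sed is assumed, as the answer says (the program relies on \n in the replacement and on \| alternation).

```bash
#!/usr/bin/env bash
# Requires GNU sed. Same program as in the answer, reading standard input.
strip_unquoted_comments() {
  sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//'
}

printf '%s\n' \
  'echo "a # b" # comment' \
  "echo 'c#d'" \
  '# whole-line comment' \
  'plain line' \
  | strip_unquoted_comments
# -> 'echo "a # b" ', "echo 'c#d'", an empty line, and 'plain line'
```
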
Answer by livibetter

Since the asker provided no sample input, I will assume a couple of cases, and assume that the input file is a Bash script, because bash is one of the question's tags.

Case 1: entire line is the comment

The following should be sufficient in most cases:

sed '/^\s*#/d' file

It matches any line that has zero or more leading whitespace characters (space, tab, or a few others; see man isspace) followed by a #, and then deletes the line with the d command.

Any lines like:

# comment started from beginning.
         # any number of white-space character before
    # or 'quote' in "here"

They will be deleted.

But

a="foobar in #comment"

will not be deleted, which is the desired result.

Case 2: comment after actual code

For example:

if [[ $foo == "#bar" ]]; then # comment here

The comment part can be removed with:

sed "s/\s*#[^\"']*$//" file

[^\"'] is used to avoid confusion with quoted strings; however, it also means that comments containing quotes (' or ") will not be removed.

Final sed

sed "/^\s*#/d;s/\s*#[^\"']*$//" file
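
A quick run of the final command on made-up input shows both its strengths and its limits. \s is a GNU sed extension, so GNU sed is assumed; the function name strip_bash_comments is hypothetical.

```bash
#!/usr/bin/env bash
# Requires GNU sed for \s.
strip_bash_comments() {
  sed "/^\s*#/d;s/\s*#[^\"']*$//"
}

printf '%s\n' \
  '#!/bin/bash' \
  '  # setup' \
  'if [[ $foo == "#bar" ]]; then # comment here' \
  "echo hi # don't do this" \
  | strip_bash_comments
# The shebang and '  # setup' are deleted (the first command also matches '#!'),
# the if-line loses only its trailing comment, and the last line is kept as-is
# because its comment contains a quote.
```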

Answer by jwfearn

To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):

sed '/^[[:space:]]*#[^!]/d; /#$/d' file

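The first argument to sed is a string containing a sed program made of two delete-line commands of the form /regex/d, separated by ;. The first command deletes comment lines but not shebang lines. Because #[^!] requires a character after the #, it misses a line containing only #, so the second command deletes any remaining line that ends in # (such as empty comment lines; note that it would also delete a code line ending in #). Neither command handles trailing comments.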

The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:

sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"

Example:

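# test.sh
# Quoting 'HERE' keeps the here-document text literal (no expansions).
S0=$(cat << 'HERE'
#!/usr/bin/env bash
# comment
  # indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
  #
HERE
)
# Pass the text as a %s argument so any % or \ in it is not treated as a format.
printf '\nBEFORE removal:\n\n%s\n\n' "${S0}"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf '\nAFTER removal:\n\n%s\n\n' "${S1}"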

Output:

$ bash test.sh

BEFORE removal:

#!/usr/bin/env bash
# comment
  # indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
  #    


AFTER removal:

#!/usr/bin/env bash
echo 'FOO' # trailing comment

Answer by tripleee

Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.

So we get something like [^\"'#] for the trivial case: a piece of string which is neither a comment sign, nor a backslash, nor an opening quote (inside a POSIX bracket expression the backslash is an ordinary character, so it is one of the excluded characters). Then we can accept a backslash followed by anything: \\. is not a literal dot; it is a literal backslash followed by a dot metacharacter, which matches any character.

Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string is defined as an opening quote, followed by zero or more of either a backslashed arbitrary character or any character except the closing quote: "\(\\.\|[^\"]\)*", or similarly for single-quoted strings, '\(\\.\|[^\']\)*'.

Piecing all of this together, your sed script could look something like this:

s/^\(\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/

But because the script needs to be quoted for the shell, and it contains both single and double quotes, there is one more complication. Recall that the shell lets you glue strings together: "foo"'bar' is replaced with foobar (foo in double quotes, bar in single quotes). You can therefore include a single quote by putting it in double quotes adjacent to your single-quoted string: '"foo"'"'" is "foo" in single quotes next to ' in double quotes, giving "foo"'. Likewise, "' can be expressed as '"' adjacent to "'". So a string containing both kinds of quote, such as foo"'bar, can be quoted as 'foo"' adjacent to "'bar", or, perhaps more realistically for this case, as 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.

sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' file
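
As a sanity check, here is that command wrapped in a hypothetical function and applied to a few made-up lines. This assumes GNU sed (\| inside a basic regular expression) and uses \1 as the replacement, which keeps everything before the first unquoted #:

```bash
#!/usr/bin/env bash
# Requires GNU sed. Keeps the text before the first '#' that is not inside
# a single- or double-quoted string (backslash escapes are honoured).
strip_after_unquoted_hash() {
  sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/'
}

printf '%s\n' \
  'echo "a # b" # comment' \
  "echo 'it#s' # note" \
  'no comment here' \
  | strip_after_unquoted_hash
# -> 'echo "a # b" ', "echo 'it#s' ", 'no comment here'
```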

This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.

Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.

Answer by Daniel Martí

As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or in the rather common $# and ${#param}.
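
To illustrate the point, here is a pattern like the trailing-comment regex from an earlier answer, run on two made-up lines that contain no comments at all ([[:space:]] is used in place of \s):

```bash
#!/usr/bin/env bash
# A purely textual pattern cannot tell parameter expansions such as $# or
# ${#arr[@]} apart from comments.
printf '%s\n' 'echo $#' 'echo ${#arr[@]}' | sed "s/[[:space:]]*#[^\"']*$//"
# Both lines get truncated: 'echo $' and 'echo ${'
```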

I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:

$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
[mvdan@carbon:12] [0] [/home/mvdan]
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'

The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.

Answer by Harshad Yeola

sed 's:^#\(.*\)$::g' filename

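Assuming comment lines start with a single #, the above command removes all comments from the file. Note that it empties those lines rather than deleting them, and that it also empties a #! shebang line.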
