bash: Remove trailing whitespace recursively only at end of file using grep/sed?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/4727268/

Remove trailing whitespace recursively only at end of file using grep/sed?

Tags: linux, bash, shell, unix, scripting

Asked by MALON

Basically, I've got about 1,500 files and the last character of any of these files should not be any type of white space.

How do I check a bunch of files to make sure that they don't end in some form of whitespace (newline, space, carriage return, tab, etc.)?

Answered by Paused until further notice.

awk '{if (flag) print line; line = $0; flag = 1} END {gsub("[[:space:]]+$","",line); printf "%s", line}'

Edit:

New version:

The sed command removes all the trailing lines that consist only of whitespace, then the awk command removes the final newline.

sed '/^[[:space:]]*$/{:a;$d;N;/\n[[:space:]]*$/ba}' inputfile |
    awk '{if (flag) print line; line = $0; flag = 1} END {printf "%s", line}'

The disadvantage is that it reads the file twice.
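
As a quick sanity check of the sed-plus-awk pipeline, the sketch below wraps it in a shell function and feeds it a small sample. The function name strip_trailing and the sample text are illustrative assumptions, not part of the original answer, and the awk step uses printf "%s" so that a percent sign in the last line is not treated as a format. The `ba}` form is assumed to be accepted by the local sed, as it is by GNU sed.

```shell
#!/bin/sh
# strip_trailing: hypothetical wrapper name; reads stdin, writes stdout.
# sed deletes the trailing run of whitespace-only lines, then awk re-emits
# the remaining text without the final newline.
strip_trailing() {
    sed '/^[[:space:]]*$/{:a;$d;N;/\n[[:space:]]*$/ba}' |
        awk '{if (flag) print line; line = $0; flag = 1} END {printf "%s", line}'
}

# Sample input: two text lines followed by three whitespace-only lines.
# od -c makes the remaining bytes visible.
printf 'first\nsecond\n  \n\t\n\n' | strip_trailing | od -c
```

Note that this variant only drops whitespace-only lines and the final newline; spaces at the end of the last non-blank line are left in place.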

Edit 2:

Here's an all-awk solution that only reads the file once. It accumulates white-space-only lines in a manner similar to the sed command above.

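#!/usr/bin/awk -f

# accumulate a run of white-space-only lines so they can be printed or discarded
/^[[:space:]]*$/ {
    accumlines = accumlines nl $0
    nl = "\n"
    accum = 1
    next
}

# print the previous line and any accumulated lines, store the current line for the next pass
{
    if (flag) print line
    if (accum) {
        print accumlines
        accum = 0
    }
    accumlines = nl = ""
    line = $0
    flag = 1
}

# print the last line without a trailing newline after removing all trailing whitespace
# the resulting output could be null (nothing rather than 0x00)
# note that we're not printing the accumulated lines since they're part of the
# trailing white-space we're trying to get rid of
END {
    gsub("[[:space:]]+$","",line)
    printf "%s", line
}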

Edit 3:

  • removed unnecessary BEGIN clause
  • changed lines to accumlines so it's easier to distinguish from line (singular)
  • added comments

Answered by j_random_hacker

This will strip all trailing whitespace:

perl -e '$s = ""; while (defined($_ = getc)) { if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; } }' < infile > outfile

There's probably an equivalent in sed, but I'm much more familiar with Perl; I hope this works for you. Basic idea: if the next character is whitespace, save it; otherwise, print any saved characters followed by the character just read. If we hit EOF after reading one or more whitespace characters, they won't be printed.

This will simply detect trailing whitespace, giving an exit code of 1 if so:

perl -e 'while (defined($_ = getc)) { $last = $_; } exit($last =~ /\s/);' < infile > outfile
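
To run a detection check like this over a whole directory tree, one possible sketch in plain shell follows. The helper names ends_in_whitespace and list_offenders are hypothetical, the status convention is inverted relative to the Perl one-liner (0 here means "ends in whitespace"), and the loop assumes file names do not contain newlines.

```shell
#!/bin/sh
# ends_in_whitespace FILE: succeed (status 0) if the last byte of FILE is whitespace.
ends_in_whitespace() {
    [ -s "$1" ] || return 1          # empty or missing file: nothing to flag
    # Take the last byte, delete it if it is whitespace, count what is left.
    left=$(tail -c 1 "$1" | tr -d '[:space:]' | wc -c)
    [ "$left" -eq 0 ]                # nothing left => the byte was whitespace
}

# list_offenders DIR: print each regular file under DIR that ends in whitespace.
list_offenders() {
    find "$1" -type f | while IFS= read -r f; do
        if ends_in_whitespace "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling `list_offenders /top/dir` prints one path per line, which can be reviewed before any file is rewritten.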

[EDIT] The above describes how to detect or change a single file. If you have a large directory tree containing files that you want to apply the changes to, you can put the command in a separate script:

fix.pl

#!/usr/bin/perl
$s = "";
while (defined($_ = getc)) {
    if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; }
}

and use it in conjunction with the find command:

# the file name is passed as $1 rather than embedded in the sh -c string,
# so names containing quotes or spaces are handled safely
find /top/dir -type f -exec sh -c 'mv "$1" "$1.bak" && fix.pl < "$1.bak" > "$1"' sh {} ';'

This will move each original file to a backup file ending in ".bak". (It would be a good idea to test this on a small test fileset first.)

Answered by glenn jackman

Might be easier reading the file from the bottom to the top:

tac filename |
awk '
    /^[[:space:]]*$/ && !seen {next}
    /[^[:space:]]/   && !seen {gsub(/[[:space:]]+$/,""); seen=1}
    seen
' |
tac

Answered by mob

A Perl solution:

# command-line arguments are the names of the files to check.
# output is names of files that end with trailing whitespace
for (@ARGV) {
  open F, '<', $_;
  seek F, -1, 2;                # seek to before last char in file
  print "$_\n" if <F> =~ /\s/
}

Answered by ghostdog74

ruby -e 's=ARGF.read;s.rstrip!;print s' file

Basically, read the whole file, strip any trailing whitespace, and print out the contents. So this solution is not suitable for very large files.

Answered by yabt

You may also use man ed to delete trailing white space at the end of the file and man dd to delete a final newline (although keep in mind that ed reads the whole file into memory and performs an in-place edit without making any backup first):

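# tested on Mac OS X using Bash
while IFS= read -r -d $'\0' file; do
   # remove white space at end of (non-empty) file
   # note: ed will append final newline if missing
   printf '%s\n' H '$g/[[:space:]]\{1,\}$/s///g' wq | ed -s "${file}"
   printf "" | dd of="${file}" seek=$(($(stat -f "%z" "${file}") - 1)) bs=1 count=1
   #printf "" | dd of="${file}" seek=$(($(wc -c < "${file}") - 1)) bs=1 count=1
done < <(find -x "/path/to/dir" -type f -not -empty -print0)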
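
The dd step relies on dd truncating its output file at the seek offset when conv=notrunc is not given. A minimal sketch of just that step is below; the helper name chop_last_byte is hypothetical, and the size is taken with wc -c instead of the macOS-specific stat -f so the sketch is not tied to one platform.

```shell
#!/bin/sh
# chop_last_byte FILE: shorten FILE by exactly one byte (hypothetical helper).
chop_last_byte() {
    size=$(wc -c < "$1")
    [ "$size" -gt 0 ] || return 0    # nothing to remove from an empty file
    # dd seeks to size-1 in the output and, with no input to copy, writes
    # nothing; because conv=notrunc is absent, the file is cut at that offset.
    printf '' | dd of="$1" bs=1 seek=$((size - 1)) count=1 2>/dev/null
}
```

As with the original commands, this edits the file in place without a backup, so it is worth trying on a copy first.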

Answered by yabt

Using man dd without man ed:

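while IFS= read -r -d $'\0' file; do
   filesize="$(wc -c < "${file}")"
   while [[ $(tail -c 1 "${file}" | tr -dc '[[:space:]]' | wc -c) -eq 1 ]]; do
      printf "" | dd of="${file}" seek=$(($filesize - 1)) bs=1 count=1
      let filesize-=1
   done
done < <(find -x "/path/to/dir" -type f -not -empty -print0)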

Answered by akond

Version 2. Linux syntax. Proper command.

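find /directory/you/want -type f | \
xargs --verbose -L 1 sed -n --in-place -r \
':loop;/[^[:space:]\t]/ {p;b;}; N;b loop;'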

Version 1. Remove whitespace at the end of each line. FreeBSD syntax.

find /directory/that/holds/your/files -type f | xargs -L 1  sed  -i '' -E 's/[:         :]+$//'

where the white space in [: :] actually consists of one space and one tab character. The space is easy: just press the space bar. To insert the tab character, press Ctrl-V and then Tab in the shell.
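
If typing a literal tab is awkward, a POSIX character class can stand in for it. The sketch below is an alternative to the command above rather than the author's version: it assumes a sed that accepts -E (both GNU and BSD sed do), and the function name strip_line_ends is hypothetical.

```shell
#!/bin/sh
# strip_line_ends: remove trailing whitespace from every line of standard input.
# [[:space:]] matches spaces and tabs (and carriage returns), so no literal tab
# has to be typed into the command line.
strip_line_ends() {
    sed -E 's/[[:space:]]+$//'
}

# Sample run: each line loses its trailing blanks, interior spacing is kept.
printf 'one \t\ntwo  words\t\t\nthree\n' | strip_line_ends
```

Like Version 1, this works per line; it does not remove blank lines or the final newline at the end of the file.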

Answered by j_random_hacker

Just for fun, here's a plain C answer:

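#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int c, bufsize = 100, ns = 0;
    char *buf = malloc(bufsize);

    while ((c = getchar()) != EOF) {
        if (isspace(c)) {
            if (ns == bufsize) buf = realloc(buf, bufsize *= 2);
            buf[ns++] = c;
        } else {
            fwrite(buf, 1, ns, stdout);
            ns = 0;
            putchar(c);
        }
    }

    free(buf);
    return 0;
}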

Not much longer than Dennis's awk solution and, dare I say it, easier to understand! :-P
