bash: recursively remove trailing whitespace only at the end of files using grep/sed?

Question

Asked by MALON

Basically, I've got about 1,500 files, and the last character of each of these files should not be any kind of whitespace.


How do I check a bunch of files to make sure that they don't end in some form of whitespace (newline, space, carriage return, tab, etc.)?


Answer 1

Answer by Paused until further notice.

awk '{if (flag) print line; line = $0; flag = 1} END {printf "%s", line}'

Edit:

New version:

sed '/^[[:space:]]*$/{:a;$d;N;/\n[[:space:]]*$/ba}' inputfile |
    awk '{if (flag) print line; line = $0; flag = 1} END {gsub("[[:space:]]+$","",line); printf "%s", line}'

The sed command removes all the trailing lines that consist only of whitespace. The awk command then removes the final newline, along with any whitespace left at the end of the last line.

The disadvantage is that it makes two passes over the data (first sed, then awk).

Edit 2:

Here's an all-awk solution that reads the file only once. It accumulates whitespace-only lines in a manner similar to the sed command above.

#!/usr/bin/awk -f

# accumulate a run of white-space-only lines so they can be printed or discarded
/^[[:space:]]*$/ {
    accumlines = accumlines nl $0
    nl = "\n"
    accum = 1
    next
}

# print the previous line and any accumulated lines, store the current line for the next pass
{
    if (flag) print line
    if (accum) { print accumlines; accum = 0 }
    accumlines = nl = ""
    line = $0
    flag = 1
}

# print the last line without a trailing newline after removing all trailing whitespace
# the resulting output could be null (nothing rather than 0x00)
# note that we're not printing the accumulated lines since they're part of the
# trailing white-space we're trying to get rid of
END {
    gsub("[[:space:]]+$","",line)
    printf "%s", line
}

Edit 3:

removed unnecessary BEGIN clause
changed lines to accumlines so it's easier to distinguish from line (singular)
added comments
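With 1,500 files, a filter like this still has to be applied to each file in place. The sketch below is not from the answer above, and it assumes bash, mktemp and a POSIX awk. It condenses the same single-pass idea into a hypothetical function, trim_end_inplace: whitespace-only lines are held back until a later non-blank line shows they are not trailing. The whitespace set is spelled out as [ \t\r\f\v] because some older awks (for example mawk 1.3.3) may lack POSIX classes such as [[:space:]].

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip whitespace from the very end of a file, in place,
# using one awk pass. Assumes bash, mktemp and a POSIX awk.
trim_end_inplace() {
    local file=$1 tmp status
    tmp=$(mktemp) || return 1
    if awk '
        # hold whitespace-only lines until we know whether they are trailing
        /^[ \t\r\f\v]*$/ { held = held $0 "\n"; next }
        {
            if (have) printf "%s\n", line   # previous non-blank line is safe to print
            printf "%s", held                # held blank lines were not trailing after all
            line = $0; held = ""; have = 1
        }
        END {
            # last non-blank line: drop its trailing whitespace and the newline;
            # any lines still held are trailing and are discarded
            sub(/[ \t\r\f\v]+$/, "", line)
            printf "%s", line
        }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # overwrite contents, keeping inode and permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (try it on a copy of the tree first):
#   find /path/to/dir -type f -print0 | while IFS= read -r -d '' f; do trim_end_inplace "$f"; done
```

Copying the result back with cat rather than mv keeps the original file's permissions and hard links, because the temporary file is never renamed over it.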

Answer 2

Answer by j_random_hacker

This will strip all trailing whitespace:

perl -e '$s = ""; while (defined($_ = getc)) { if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; } }' < infile > outfile

There's probably an equivalent in sed, but I'm much more familiar with Perl; I hope that works for you. The basic idea: if the next character is whitespace, save it; otherwise, print any saved characters followed by the character just read. If we hit EOF after reading one or more whitespace characters, they won't be printed.
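The same hold-back idea can be sketched in pure bash by reading one character at a time with read -N 1 (bash 4 or later assumed). It is much slower than the Perl one-liner. Because bash variables cannot hold NUL bytes, it is only suitable for plain text. The function name strip_ws_stream is made up for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical filter: copy stdin to stdout, dropping whitespace at the very end.
# Requires bash >= 4 for `read -N`.
strip_ws_stream() {
    local c held=""
    # -N 1 reads exactly one character (newlines included);
    # IFS= and -r keep spaces, tabs and backslashes exactly as read
    while IFS= read -r -N 1 c; do
        if [[ $c == [[:space:]] ]]; then
            held+=$c                     # might be trailing: hold it back
        else
            printf '%s%s' "$held" "$c"   # not trailing after all: flush, then the char
            held=""
        fi
    done
    # at EOF anything still in $held was trailing whitespace and is dropped
}
```

For example, strip_ws_stream < infile > outfile should behave much like the Perl one-liner on ordinary ASCII text.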

This will simply detect trailing whitespace, giving an exit code of 1 if so:

perl -e 'while (defined($_ = getc)) { $last = $_; } exit($last =~ /\s/);' < infile
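Because the question is about checking roughly 1,500 files, here is a minimal bash sketch that is not part of the original answer. It applies the same last-character test to every non-empty regular file under a directory and prints the names of the files that end in whitespace. It uses tail -c 1 instead of Perl. The function name list_ws_enders is hypothetical, and find's -empty test is a GNU/BSD extension rather than POSIX.

```bash
#!/usr/bin/env bash
# Hypothetical checker: print every non-empty regular file under "$1"
# whose last byte is whitespace (space, tab, newline, CR, VT, FF).
list_ws_enders() {
    local f last
    # -print0 / read -d '' keep file names with spaces or newlines intact
    while IFS= read -r -d '' f; do
        # the trailing "x" stops $( ) from swallowing a final newline byte
        last=$(tail -c 1 -- "$f"; printf x)
        last=${last%x}
        if [[ $last == [[:space:]] ]]; then
            printf '%s\n' "$f"
        fi
    done < <(find "$1" -type f ! -empty -print0)
}
```

For example, list_ws_enders /path/to/dir | wc -l gives a rough count of the offending files.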

[EDIT] The above describes how to detect or change a single file. If you have a large directory tree containing files that you want to apply the changes to, you can put the command in a separate script:

fix.pl

#!/usr/bin/perl
$s = "";
while (defined($_ = getc)) {
    if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; }
}

and use it in conjunction with the find command (fix.pl must be executable and on your PATH):

find /top/dir -type f ! -name '*.bak' -exec sh -c 'mv "$1" "$1.bak" && fix.pl < "$1.bak" > "$1"' sh {} ';'

This will move each original file to a backup file ending in ".bak". (It would be a good idea to test this on a small test fileset first.)

Answer 3

Answer by glenn jackman

It might be easier to read the file from the bottom to the top:

tac filename |
awk '
    /^[[:space:]]*$/ && !seen {next}
    /[^[:space:]]/   && !seen {gsub(/[[:space:]]+$/,""); seen=1}
    seen
' |
tac

Answer 4

Answer by mob

A Perl solution:

# command-line arguments are the names of the files to check.
# output is names of files that end with trailing whitespace
for (@ARGV) {
  open F, '<', $_ or next;      # skip files that cannot be opened
  seek F, -1, 2 or next;        # seek to before last char in file (fails on empty files)
  print "$_\n" if <F> =~ /\s/
}

Answer 5

Answer by ghostdog74

ruby -e 's=ARGF.read;s.rstrip!;print s' file

Basically, it reads the whole file, strips any trailing whitespace, and prints the contents, so this solution is not meant for very large files.
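The same read-everything-then-strip approach can be sketched in bash without Ruby by using parameter expansion. Here, ${content##*[![:space:]]} isolates the run of trailing whitespace, and ${content%...} removes exactly that suffix. This is an illustrative sketch with a hypothetical function name (rstrip_file). Like the Ruby version, it holds the whole file in memory. Since bash strings cannot contain NUL bytes, it is only for ordinary text files, and the pattern match can get slow on large inputs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: same "read it all, rstrip, write it back" idea in bash.
rstrip_file() {
    local content
    [ -f "$1" ] || return 1
    # -d '' reads up to NUL/EOF; read returns 1 at EOF, which is expected here
    IFS= read -r -d '' content < "$1" || true
    # ${content##*[![:space:]]} is the run of trailing whitespace;
    # the outer ${content%...} removes exactly that suffix
    printf '%s' "${content%"${content##*[![:space:]]}"}" > "$1"
}
```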

Answer 6

Answer by yabt

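You may also use ed to delete trailing whitespace at the end of the file and dd to delete a final newline. Keep in mind, though, that ed reads the whole file into memory and performs an in-place edit without making any kind of backup first:

# tested on Mac OS X using Bash
while IFS= read -r -d $'\0' file; do
   # remove white space at end of (non-empty) file
   # note: ed will append final newline if missing
   printf '%s\n' H '$g/[[:space:]]\{1,\}$/s///g' wq | ed -s "${file}"
   printf "" | dd  of="${file}" seek=$(($(stat -f "%z" "${file}") - 1)) bs=1 count=1
   #printf "" | dd  of="${file}" seek=$(($(wc -c < "${file}") - 1)) bs=1 count=1
done < <(find -x "/path/to/dir" -type f -not -empty -print0)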

Answer 7

Answer by yabt

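Using dd without ed:

while IFS= read -r -d $'\0' file; do
   filesize="$(wc -c < "${file}")"
   while [[ $(tail -c 1 "${file}" | tr -dc '[:space:]' | wc -c) -eq 1 ]]; do
      printf "" | dd  of="${file}" seek=$(($filesize - 1)) bs=1 count=1
      let filesize-=1
   done
done < <(find -x "/path/to/dir" -type f -not -empty -print0)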
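The same byte-at-a-time idea as the dd loop can be sketched with truncate -s -1, which shrinks a file by one byte. This assumes GNU coreutils, or another truncate that accepts a relative size such as -1. The helper name chop_trailing_ws is hypothetical. It never reads the whole file, but it runs tail and truncate once per trailing whitespace byte. That cost should be acceptable for the short runs typically found at the end of a file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: chop whitespace bytes off the end of a file in place,
# one byte at a time, using `truncate -s -1` (shrink by one byte).
chop_trailing_ws() {
    local f=$1 last
    [ -f "$f" ] || return 1
    while [ -s "$f" ]; do                     # stop if the file becomes empty
        last=$(tail -c 1 -- "$f"; printf x)   # "x" protects a final newline from $( )
        last=${last%x}
        [[ $last == [[:space:]] ]] || break   # last byte is not whitespace: done
        truncate -s -1 -- "$f"                # drop exactly one byte from the end
    done
}
```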

Answer 8

Answer by akond

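Version 2. Linux syntax. Proper command.

find /directory/you/want -type f |
xargs --verbose -L 1 sed -n --in-place -r \
':loop;/[^[:space:]\t]/ {p;b;}; N;b loop;'

Version 1. Remove whitespace at the end of each line. FreeBSD syntax.

find /directory/that/holds/your/files -type f | xargs -L 1  sed  -i '' -E 's/[: 	:]+$//'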

where the whitespace in [: :] actually consists of one space and one tab character. The space is easy: you just hit the space bar. To insert a tab character in the shell, press Ctrl-V and then Tab.
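Instead of typing Ctrl-V Tab, bash's ANSI-C quoting ($'...') can put a real tab into the sed script. The sketch below uses a hypothetical function name, strip_line_ends. Like Version 1, it strips trailing spaces and tabs from every line, but it reads stdin or file arguments instead of editing in place. Under POSIX bracket-expression rules, the colons in [: :] are ordinary members of the set, so that form would strip trailing colons as well. The bracket here lists only a space and a tab; [[:blank:]] should be an equivalent choice in both GNU and BSD sed.

```bash
#!/usr/bin/env bash
# Hypothetical filter: remove spaces and tabs at the end of every line.
# $'...' turns \t into a literal tab before sed ever sees the script.
strip_line_ends() {
    sed -e $'s/[ \t]*$//' "$@"
}
```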

Answer 9

Answer by j_random_hacker

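Just for fun, here's a plain C answer:

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int c, bufsize = 100, ns = 0;
    char *buf = malloc(bufsize);

    if (buf == NULL) return 1;

    while ((c = getchar()) != EOF) {
        if (isspace(c)) {
            /* hold whitespace back; grow the buffer if needed */
            if (ns == bufsize) {
                char *tmp = realloc(buf, bufsize *= 2);
                if (tmp == NULL) { free(buf); return 1; }
                buf = tmp;
            }
            buf[ns++] = c;
        } else {
            /* non-whitespace: flush held whitespace, then this char */
            fwrite(buf, 1, ns, stdout);
            ns = 0;
            putchar(c);
        }
    }

    free(buf);
    return 0;
}

Not much longer than Dennis's awk solution and, dare I say it, easier to understand! :-P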
