Remove trailing whitespace recursively only at end of file using grep/sed?
Original question: http://stackoverflow.com/questions/4727268/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by MALON
Basically, I've got about 1,500 files, and the last character of each of these files should not be any type of whitespace.
How do I check a bunch of files to make sure that they don't end in some form of whitespace (newline, space, carriage return, tab, etc.)?
Answer by Paused until further notice.
awk '{if (flag) print line; line = $0; flag = 1} END {gsub("[[:space:]]+$","",line); printf "%s", line}'
Edit:
New version:
sed '/^[[:space:]]*$/{:a;$d;N;/\n[[:space:]]*$/ba}' inputfile |
awk '{if (flag) print line; line = $0; flag = 1} END {printf "%s", line}'
The sed command removes all the trailing lines that consist only of whitespace, and then the awk command removes the final newline. (awk prints each line one record late, so the last line is still held in line when the END block runs and can be printed without a newline.)
The disadvantage is that it reads the file twice.
Edit 2:
Here's an all-awk solution that reads the file only once. It accumulates whitespace-only lines in a way similar to the sed command above.
#!/usr/bin/awk -f
# accumulate a run of whitespace-only lines so they can be printed or discarded
/^[[:space:]]*$/ {
    accumlines = accumlines nl $0
    nl = "\n"
    accum = 1
    next
}
# print the previous line and any accumulated lines, then store the current line for the next pass
{
    if (flag) print line
    if (accum) { print accumlines; accum = 0 }
    accumlines = nl = ""
    line = $0
    flag = 1
}
# print the last line without a trailing newline after removing all trailing whitespace;
# the resulting output could be empty (nothing at all, rather than a 0x00 byte)
# note that we're not printing the accumulated lines, since they're part of the
# trailing whitespace we're trying to get rid of
END {
    gsub("[[:space:]]+$", "", line)
    printf "%s", line
}
Edit 3:
- removed an unnecessary BEGIN clause
- renamed lines to accumlines so it's easier to distinguish from line (singular)
- added comments
Answer by j_random_hacker
This will strip all trailing whitespace:
perl -e '$s = ""; while (defined($_ = getc)) { if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; } }' < infile > outfile
There's probably an equivalent in sed, but I'm much more familiar with Perl. I hope that works for you. The basic idea is this: if the next character is whitespace, save it. Otherwise, print any saved characters followed by the character just read. If we hit EOF after reading one or more whitespace characters, they won't be printed.
This will simply detect trailing whitespace, returning an exit code of 1 if it is present:
perl -e 'while (defined($_ = getc)) { $last = $_; } exit($last =~ /\s/);' < infile
[EDIT] The above describes how to detect or change a single file. If you have a large directory tree containing files that you want to apply the changes to, you can put the command in a separate script, fix.pl:
#!/usr/bin/perl
$s = "";
while (defined($_ = getc)) {
    if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; }
}
and use it together with the find command (fix.pl must be executable and on your PATH):
find /top/dir -type f -exec sh -c 'mv "{}" "{}.bak" && fix.pl < "{}.bak" > "{}"' ';'
This will move each original file to a backup file ending in ".bak". (It would be a good idea to test this on a small set of test files first.)
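The find command above pastes {} into the text of the sh -c script. POSIX leaves it implementation-defined whether find substitutes {} when it is only part of an argument. In addition, a file name containing a double quote, $ or a backquote would be interpreted by the inner shell. The sketch below passes the names to the inner shell as positional parameters instead. It assumes a hypothetical helper name, backup_and_filter, and a filter that is an executable, such as fix.pl, that reads stdin and writes stdout:

```shell
#!/bin/sh
# backup_and_filter DIR FILTER (hypothetical helper):
# for each regular file under DIR, rename it to FILE.bak and regenerate FILE
# by running FILTER (an executable reading stdin, writing stdout) on the backup.
backup_and_filter() {
    # Skip existing *.bak files so backups are never filtered again.
    # File names reach the inner shell as arguments, never as script text.
    find "$1" -type f ! -name '*.bak' -exec sh -c '
        filter=$1; shift          # first argument is the filter command
        for f do                  # remaining arguments are the file names
            mv "$f" "$f.bak" && "$filter" < "$f.bak" > "$f"
        done
    ' sh "$2" {} +
}
```

For example, you could run backup_and_filter /top/dir "$HOME/bin/fix.pl", where the path is only illustrative. As with the original command, try it on a copy of the tree first.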
Answer by glenn jackman
It might be easier to read the file from the bottom to the top:
tac filename |
awk '
    /^[[:space:]]*$/ && !seen {next}
    /[^[:space:]]/ && !seen {gsub(/[[:space:]]+$/,""); seen=1}
    seen
' |
tac
Answer by mob
A Perl solution:
# command-line arguments are the names of the files to check
# output is the names of files that end with trailing whitespace
for (@ARGV) {
    open F, '<', $_ or next;   # skip files that cannot be opened
    seek F, -1, 2;             # seek to the last byte (whence 2 = end of file)
    print "$_\n" if <F> =~ /\s/;
    close F;
}
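Here is a minimal POSIX shell sketch of a last-byte check that does not need Perl. ends_in_whitespace is a hypothetical helper name. It assumes ordinary text files: shell command substitution cannot hold NUL bytes, so a file whose last byte is NUL could be misreported.

```shell
#!/bin/sh
# ends_in_whitespace FILE (hypothetical helper): exit status 0 if FILE is
# non-empty and its final byte is ASCII whitespace (space, tab, newline,
# carriage return, vertical tab or form feed), 1 otherwise.
ends_in_whitespace() {
    [ -s "$1" ] || return 1        # empty or missing file: nothing to report
    # Delete whitespace from the final byte; if nothing survives, it was whitespace.
    # (Command substitution would strip a trailing newline anyway.)
    last=$(tail -c 1 "$1" | LC_ALL=C tr -d '[:space:]')
    [ -z "$last" ]
}
```

For example, find /top/dir -type f | while IFS= read -r f; do ends_in_whitespace "$f" && printf '%s\n' "$f"; done lists the offending files. This assumes that no file name contains a newline.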
Answer by ghostdog74
ruby -e 's=ARGF.read;s.rstrip!;print s' file
Basically, this reads the whole file, strips any trailing whitespace, and prints the contents. So this solution is not suitable for VERY large files.
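Answer by yabt
You can also use ed to delete trailing whitespace at the end of the file and dd to delete the final newline. Keep in mind, though, that ed reads the whole file into memory and edits it in place without making any backup:
# tested on Mac OS X using Bash
while IFS= read -r -d $'\0' file; do
    # remove whitespace at the end of a (non-empty) file
    # note: ed will append a final newline if it is missing
    printf '%s\n' H '$g/[[:space:]]\{1,\}$/s///g' wq | ed -s "${file}"
    # cut off the last byte (the final newline)
    printf "" | dd of="${file}" seek=$(($(stat -f "%z" "${file}") - 1)) bs=1 count=1
    #printf "" | dd of="${file}" seek=$(($(wc -c < "${file}") - 1)) bs=1 count=1
done < <(find -x "/path/to/dir" -type f -not -empty -print0)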
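Answer by yabt
Using dd without ed:
while IFS= read -r -d $'\0' file; do
    filesize="$(wc -c < "${file}")"
    # while the last byte is whitespace, cut the file one byte shorter
    while [[ $(tail -c 1 "${file}" | tr -dc '[:space:]' | wc -c) -eq 1 ]]; do
        printf "" | dd of="${file}" seek=$(($filesize - 1)) bs=1 count=1
        let filesize-=1
    done
done < <(find -x "/path/to/dir" -type f -not -empty -print0)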
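The printf "" | dd of=... seek=... bs=1 count=1 idiom relies on a dd detail that is easy to miss. When dd is given seek= without conv=notrunc, it preserves the part of the output file it seeks over and discards everything after it. Piping in an empty string therefore just cuts the file at that offset. The minimal sketch below isolates that step. truncate_to is a hypothetical helper name, and bs=1 makes the offset count bytes:

```shell
#!/bin/sh
# truncate_to FILE N (hypothetical helper): keep only the first N bytes of FILE.
# dd copies nothing (empty input), but because seek= is given without
# conv=notrunc, it keeps the N bytes it seeks over and cuts the file there.
truncate_to() {
    printf '' | dd of="$1" bs=1 seek="$2" 2>/dev/null
}
```

On systems with GNU coreutils, truncate -s -1 FILE removes one trailing byte directly, but that utility is not part of POSIX.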
Answer by akond
Version 2 (the proper command), Linux syntax:
find /directory/you/want -type f | \
xargs --verbose -L 1 sed -n --in-place -r \
':loop;/[^[:space:]\t]/ {p;b;}; N;b loop;'
Version 1. Removes whitespace at the end of each line. FreeBSD syntax:
find /directory/that/holds/your/files -type f | xargs -L 1 sed -i '' -E 's/[: :]+$//'
Here the whitespace in [: :] actually consists of one space character and one tab character. The space is easy: you just press the space bar. To insert a tab character in the shell, press Ctrl-V and then Tab.
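If typing a literal tab is awkward, the POSIX character class [[:blank:]] (space and tab) expresses the same set without Ctrl-V. Also note that in a POSIX bracket expression the colons in [: :] are ordinary characters, so that pattern would strip trailing colons as well. Below is a minimal sketch of the per-line version using the class. It is written as a plain sed filter, so it needs neither GNU -i nor BSD -i ''. strip_line_blanks is a hypothetical helper name:

```shell
#!/bin/sh
# strip_line_blanks FILE (hypothetical helper): print FILE to stdout with any
# spaces and tabs removed from the end of every line.
# In the POSIX locale, [[:blank:]] matches exactly space and tab.
strip_line_blanks() {
    sed 's/[[:blank:]]*$//' "$1"
}
```

To rewrite a file in place, send the output to a temporary file and move it back, for example strip_line_blanks f > f.tmp && mv f.tmp f.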
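Answer by j_random_hacker
Just for fun, here's a plain C answer:
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int c, bufsize = 100, ns = 0;
    char *buf = malloc(bufsize);    /* whitespace read but not yet written */
    if (!buf) return 1;
    while ((c = getchar()) != EOF) {
        if (isspace(c)) {
            /* hold whitespace back, growing the buffer when it is full */
            if (ns == bufsize) {
                char *tmp = realloc(buf, bufsize *= 2);
                if (!tmp) { free(buf); return 1; }
                buf = tmp;
            }
            buf[ns++] = c;
        } else {
            /* non-whitespace: flush the held whitespace, then this character */
            fwrite(buf, 1, ns, stdout);
            ns = 0;
            putchar(c);
        }
    }
    /* whitespace still held at EOF is trailing, so it is never written */
    free(buf);
    return 0;
}
It's not much longer than Dennis's awk solution, and, dare I say it, easier to understand! :-P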

