How can I convert tabs to spaces in every file of a directory?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/11094383/
Asked by cnd
How can I convert tabs to spaces in every file of a directory (possibly recursively)?
Also, is there a way of setting the number of spaces per tab?
Accepted answer by Martin Beckett
Warning: This will break your repo.
This will corrupt binary files, including those under svn and .git! Read the comments before using!
find . -iname '*.java' -type f -exec sed -i.orig 's/\t/ /g' {} +
The original file is saved as [filename].orig.
Replace '*.java' with the extension of the file type you are targeting. This way you can prevent accidental corruption of binary files.
Downsides:
- Will replace tabs everywhere in a file.
- Will take a long time if you happen to have a 5GB SQL dump in this directory.
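For readers who would rather not rely on sed's -i flag, the same idea can be sketched in Python. This is only an illustration under stated assumptions, not part of the original answer: the function name replace_tabs_in_tree and its parameters are hypothetical, and like the sed command it replaces every tab, not only leading ones. The spaces argument sets how many spaces each tab becomes.

```python
from pathlib import Path
import shutil


def replace_tabs_in_tree(root, suffix=".java", spaces=4):
    """Replace every tab with `spaces` spaces in files under `root` ending in `suffix`.

    Hypothetical helper mirroring the sed command above.
    Returns the list of files that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        # Match the extension case-insensitively, like find -iname.
        if not path.is_file() or path.suffix.lower() != suffix.lower():
            continue
        data = path.read_bytes()
        if b"\t" not in data:
            continue
        # Keep a copy next to the file, similar to sed -i.orig
        # (sed backs up every file it processes; this sketch backs up
        # only the files it actually changes).
        shutil.copy2(path, path.with_name(path.name + ".orig"))
        path.write_bytes(data.replace(b"\t", b" " * spaces))
        changed.append(path)
    return changed
```

Calling replace_tabs_in_tree(".", ".java", 2) would turn each tab into two spaces in every Java file below the current directory.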
Answer by Gene
Simple replacement with sed is okay but not the best possible solution. If there are "extra" spaces between the tabs they will still be there after substitution, so the margins will be ragged. Tabs in the middle of lines will also not be expanded correctly. In bash, we can say instead
find . -name '*.java' ! -type d -exec bash -c 'expand -t 4 "$0" > /tmp/e && mv /tmp/e "$0"' {} \;
to apply expand to every Java file in the current directory tree. Remove or replace the -name argument if you're targeting some other file types. As one of the comments mentions, be very careful when removing -name or using a weak wildcard. You can easily clobber repository and other hidden files without intending to. This is why the original answer included this:
You should always make a backup copy of the tree before trying something like this in case something goes wrong.
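The difference between a plain substitution and tab-stop expansion can be seen in a small Python sketch. Python's built-in str.expandtabs pads to the next tab stop, much as expand does; the two helper names are illustrative only.

```python
def sed_style(line, width=4):
    # Plain substitution: each tab becomes exactly `width` spaces.
    return line.replace("\t", " " * width)


def expand_style(line, width=4):
    # Tab-stop expansion: each tab pads to the next multiple of `width`.
    return line.expandtabs(width)


# A line indented with two stray spaces followed by a tab.
line = "  \tint x;"
print(repr(sed_style(line)))     # '      int x;'  (6 spaces: ragged margin)
print(repr(expand_style(line)))  # '    int x;'    (4 spaces: aligned)
```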
Answer by kev
Try the command line tool expand.
expand -i -t 4 input | sponge output
where
- -i is used to expand only leading tabs on each line;
- -t 4 means that each tab will be converted to 4 whitespace chars (8 by default).
- sponge is from the moreutils package, and avoids clearing the input file.
Finally, you can use gexpand on OSX, after installing coreutils with Homebrew (brew install coreutils).
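What the -i option does can be sketched in Python: only the leading run of blanks is expanded, and tabs after the first non-blank character are left alone. The function name expand_leading_tabs is hypothetical, chosen for this illustration.

```python
def expand_leading_tabs(line, tabsize=4):
    """Expand tabs in the leading whitespace only, leaving later tabs alone."""
    # Split the line into its leading blanks and the remainder.
    stripped = line.lstrip(" \t")
    indent = line[: len(line) - len(stripped)]
    # Only the indent is expanded to tab stops; the rest is untouched.
    return indent.expandtabs(tabsize) + stripped
```

A tab inside a string literal therefore survives: expand_leading_tabs('\tx = "a\tb"') keeps the tab between a and b while the indentation becomes four spaces.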
Answer by not2qubit
Collecting the best comments from Gene's answer, the best solution by far is to use sponge from moreutils.
sudo apt-get install moreutils
# The complete one-liner:
find ./ -iname '*.java' -type f -exec bash -c 'expand -t 4 "$0" | sponge "$0"' {} \;
Explanation:
- ./ searches recursively from the current directory
- -iname is a case-insensitive match (for both *.java and *.JAVA)
- -type f finds only regular files (no directories or symlinks); binary files are still regular files, so it is the -iname filter that keeps them out
- -exec bash -c executes the following commands in a subshell for each file name, {}
- expand -t 4 expands all TABs, using tab stops every 4 columns
- sponge soaks up standard input (from expand) and writes it to a file (the same one)*.
NOTE: * A simple file redirection (> "$0") won't work here because it would overwrite the file too soon.
Advantage: All original file permissions are retained and no intermediate tmp files are used.
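Why the plain redirection fails while sponge works can be sketched in Python: the whole input has to be read before the output file is opened for writing. The function below is a hypothetical stand-in for the expand | sponge pipeline, assuming the file is UTF-8 text and fits in memory.

```python
def expand_in_place(path, tabsize=4):
    """Expand tabs in `path`, writing back only after the whole file is read."""
    # Step 1: soak up the entire input first (what sponge does with stdin).
    with open(path, "r", encoding="utf-8", newline="") as src:
        text = src.read()
    # Step 2: only now open the same file for writing. Opening it earlier,
    # as `> "$0"` does, would truncate it before anything had been read.
    with open(path, "w", encoding="utf-8", newline="") as dst:
        # str.expandtabs restarts its column count after each newline.
        dst.write(text.expandtabs(tabsize))
```

Because the existing file is rewritten rather than replaced by a new one, its permissions stay as they were.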
Answer by e9t
Use backslash-escaped sed.
On Linux:
Replace all tabs with 1 hyphen in place, in all *.txt files:
sed -i $'s/\t/-/g' *.txt
Replace all tabs with 1 space in place, in all *.txt files:
sed -i $'s/\t/ /g' *.txt
Replace all tabs with 4 spaces in place, in all *.txt files:
sed -i $'s/\t/    /g' *.txt
On a Mac:
Replace all tabs with 4 spaces in place, in all *.txt files:
sed -i '' $'s/\t/    /g' *.txt
Answer by codeforester
You can use the generally available pr command (see its man page). For example, to convert tabs to four spaces, do this:
pr -t -e4 file > file.expanded
- -t suppresses headers
- -enum expands tabs to tab stops num columns apart (write the number directly after -e; a non-digit character there, as in -e=4, is taken as the input tab character)
To convert all files in a directory tree recursively, while skipping binary files:
#!/bin/bash
num=4
shopt -s globstar nullglob
for f in **/*; do
  [[ -f "$f" ]] || continue      # skip if not a regular file
  grep -qI . "$f" || continue    # skip binary (and empty) files
  pr -t -e"$num" "$f" > "$f.expanded.$$" && mv "$f.expanded.$$" "$f"
done
The logic for skipping binary files comes from another post.
NOTE:
- Doing this could be dangerous in a git or svn repo
- This is not the right solution if you have code files that have tabs embedded in string literals
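Answer by Martin Tournoij
"How can I convert tabs to spaces in every file of a directory (possibly recursively)?"
This is usually not what you want.
Do you want to do this for png images? PDF files? The .git directory? Your Makefile (which requires tabs)? A 5GB SQL dump?
You could, in theory, pass a whole lot of exclude options to find or whatever else you're using; but this is fragile, and will break as soon as you add other binary files.
What you want, at the very least, is to:
- Skip files over a certain size.
- Detect if a file is binary by checking for the presence of a NUL byte.
- Only replace tabs at the start of a line (expand -i does this, sed doesn't).
As far as I know, there is no "standard" Unix utility that can do this, and it's not very easy to do with a shell one-liner, so a script is needed.
A while ago I created a little script called sanitize_files which does exactly that. It also fixes some other common stuff like replacing \r\n with \n, adding a trailing \n, etc.
You can find a simplified script without the extra features and command-line arguments below, but I recommend you use the above script as it's more likely to receive bugfixes and other updates than this post.
I would also like to point out, in response to some of the other answers here, that using shell globbing is not a robust way of doing this, because sooner or later you'll end up with more files than will fit in ARG_MAX (on modern Linux systems it's 128k, which may seem a lot, but sooner or later it's not enough).
#!/usr/bin/env python
#
# http://code.arp242.net/sanitize_files
#

import os

# Number of spaces per leading tab (assumed value; the excerpt does not define it)
indent_width = 4


def is_binary(data):
    # Treat any file containing a NUL byte as binary
    return data.find(b'\000') >= 0


def should_ignore(path):
    keep = [
        # VCS systems
        '.git/', '.hg/', '.svn/', 'CVS/',

        # These files have significant whitespace/tabs, and cannot be edited
        # safely
        # TODO: there are probably more of these files..
        'Makefile', 'BSDmakefile', 'GNUmakefile', 'Gemfile.lock'
    ]

    for k in keep:
        if '/%s' % k in path:
            return True
    return False


def run(files):
    indent_find = b'\t'
    indent_replace = b' ' * indent_width

    for f in files:
        if should_ignore(f):
            print('Ignoring %s' % f)
            continue

        try:
            size = os.stat(f).st_size
        # Unresolvable symlink, just ignore those
        except FileNotFoundError as exc:
            print('%s is unresolvable, skipping (%s)' % (f, exc))
            continue

        if size == 0:
            continue
        if size > 1024 ** 2:
            print("Skipping `%s' because it's over 1MiB" % f)
            continue

        try:
            data = open(f, 'rb').read()
        except (OSError, PermissionError) as exc:
            print("Error: Unable to read `%s': %s" % (f, exc))
            continue

        if is_binary(data):
            print("Skipping `%s' because it looks binary" % f)
            continue

        data = data.split(b'\n')

        for i, line in enumerate(data):
            # Fix indentation: count the leading tabs and strip them
            repl_count = 0
            while line.startswith(indent_find):
                repl_count += 1
                line = line.replace(indent_find, b'', 1)

            # Store the re-indented line back into the list
            if repl_count > 0:
                data[i] = indent_replace * repl_count + line

        try:
            open(f, 'wb').write(b'\n'.join(data))
        except (OSError, PermissionError) as exc:
            print("Error: Unable to write to `%s': %s" % (f, exc))


if __name__ == '__main__':
    allfiles = []
    for root, dirs, files in os.walk(os.getcwd()):
        for f in files:
            allfiles.append('%s/%s' % (root, f))

    run(allfiles)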
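To see the ARG_MAX limit discussed above on a given machine, Python exposes it through os.sysconf. This is a small sketch for Unix-like systems only; the helper name arg_max is made up for the example, and the value differs between systems, so no expected output is shown.

```python
import os


def arg_max():
    """Return the limit, in bytes, on the combined size of exec() arguments and environment."""
    # SC_ARG_MAX is the POSIX sysconf name; os.sysconf is available on Unix only.
    return os.sysconf("SC_ARG_MAX")


print(arg_max())
```

From a shell, getconf ARG_MAX reports the same limit.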
Answer by drchuck
I like the "find" example above for the recursive application. To adapt it to be non-recursive, only changing files in the current directory that match a wildcard, the shell glob expansion can be sufficient for small numbers of files:
ls *.java | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh -v
If you want it silent once you trust that it works, just drop the -v on the sh command at the end.
Of course you can pick any set of files in the first command. For example, list only a particular subdirectory (or directories) in a controlled manner like this:
ls mod/*/*.php | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
Or in turn run find(1) with some combination of depth parameters etc:
find mod/ -name '*.php' -mindepth 1 -maxdepth 2 | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
Answer by x-yuri
One can use vim for that:
find -type f \( -name '*.css' -o -name '*.html' -o -name '*.js' -o -name '*.php' \) -execdir vim -c retab -c wq {} \;
As Carpetsmoker stated, it will retab according to your vim settings, and according to modelines in the files, if any. Also, it will replace tabs not only at the beginning of lines, which is not what you generally want. E.g., you might have literals containing tabs.