bash: How can I convert tabs to spaces in every file of a directory?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/11094383/

Time: 2020-09-09 22:16:34  Source: igfitidea

How can I convert tabs to spaces in every file of a directory?

Tags: bash, shell, unix, spaces, in-place

Asked by cnd

How can I convert tabs to spaces in every file of a directory (possibly recursively)?

Also, is there a way of setting the number of spaces per tab?

Accepted answer by Martin Beckett

Warning: This will break your repo.

This will corrupt binary files, including those under svn, .git! Read the comments before using!

find . -iname '*.java' -type f -exec sed -i.orig 's/\t/    /g' {} +

The original file is saved as [filename].orig.

Replace '*.java' with the file ending of the file type you are looking for. This way you can prevent accidental corruption of binary files.
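
A minimal sketch of how the file selection could be previewed before any in-place edit is run. The helper name list_targets and the set of pruned directory names are assumptions made for illustration; they are not part of the original answer.

```shell
#!/usr/bin/env bash
# list_targets is a hypothetical helper: it prints the regular files under
# DIR whose names match PATTERN (case-insensitively), without descending
# into common version-control directories.
list_targets() {
    local dir=$1 pattern=$2
    find "$dir" \( -name .git -o -name .svn -o -name .hg \) -prune -o \
        -type f -iname "$pattern" -print
}

# Preview what a later "-exec sed -i.orig ..." would touch:
# list_targets . '*.java'
```

Checking this list first makes it easier to spot a pattern that is too broad before anything is rewritten.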

Downsides:

  • Will replace tabs everywhere in a file.
  • Will take a long time if you happen to have a 5GB SQL dump in this directory.

Answer by Gene

Simple replacement with sed is okay but not the best possible solution. If there are "extra" spaces between the tabs, they will still be there after substitution, so the margins will be ragged. Tabs expanded in the middle of lines will also not work correctly. In bash, we can say instead

find . -name '*.java' ! -type d -exec bash -c 'expand -t 4 "$0" > /tmp/e && mv /tmp/e "$0"' {} \;

to apply expand to every Java file in the current directory tree. Remove or replace the -name argument if you're targeting some other file types. As one of the comments mentions, be very careful when removing -name or using a weak wildcard. You can easily clobber repository and other hidden files without intent. This is why the original answer included this:

You should always make a backup copy of the tree before trying something like this in case something goes wrong.
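
The difference between a fixed sed substitution and expand can be seen on a single line. This is only an illustrative sketch; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# One line with a tab after a single character.
line=$'x\ty'

# sed swaps the tab for a fixed run of four spaces, wherever it occurs.
with_sed=$(printf '%s\n' "$line" | sed $'s/\t/    /g')

# expand pads only as far as the next tab stop (every 4 columns here).
with_expand=$(printf '%s\n' "$line" | expand -t 4)

printf '[%s]\n[%s]\n' "$with_sed" "$with_expand"
```

With sed the y ends up after four spaces; with expand it lands on column 4, after three spaces, which is where an editor with 4-column tab stops would have shown it.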

Answer by kev

Try the command line tool expand.

expand -i -t 4 input | sponge output

where

  • -i is used to expand only leading tabs on each line;
  • -t 4 means that each tab will be converted to 4 whitespace chars (8 by default).
  • sponge is from the moreutils package, and avoids clearing the input file.

Finally, you can use gexpand on OSX, after installing coreutils with Homebrew (brew install coreutils).
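
A short sketch of what -i changes, assuming GNU expand (the -i option is not available in every expand implementation, which is why gexpand is suggested on OSX). The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# A line with one leading tab and one tab in the middle.
sample=$'\tkey\tvalue'

# With -i, only the leading tab becomes spaces; the inner tab is kept.
leading_only=$(printf '%s\n' "$sample" | expand -i -t 4)

# Without -i, every tab is expanded to the next tab stop.
all_tabs=$(printf '%s\n' "$sample" | expand -t 4)

printf '[%s]\n[%s]\n' "$leading_only" "$all_tabs"
```

Keeping inner tabs untouched is usually the safer choice for source code, since tabs inside string literals or aligned data are left alone.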

Answer by not2qubit

Collecting the best comments from Gene's answer, the best solution by far is to use sponge from moreutils.

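sudo apt-get install moreutils
# The complete one-liner:
find ./ -iname '*.java' -type f -exec bash -c 'expand -t 4 "$0" | sponge "$0"' {} \;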

Explanation:

  • ./ searches recursively from the current directory
  • -iname is a case-insensitive match (for both *.java and *.JAVA)
  • -type f finds only regular files (no directories or symlinks; binary files are still regular files, so this does not filter them out)
  • -exec bash -c executes the following command in a subshell for each file name, {}
  • expand -t 4 expands all TABs to spaces, with tab stops every 4 columns
  • sponge soaks up standard input (from expand) and writes it to a file (the same one)*.

NOTE: * A simple file redirection (> "$0") won't work here because it would overwrite the file too soon.

Advantage: All original file permissions are retained and no intermediate tmp files are used.
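
Where sponge is not installed, one possible fallback is sketched here. It does use a temporary file, but it copies the contents back over the original instead of moving the file, so the original permissions are kept. The function name expand_in_place is hypothetical.

```shell
#!/usr/bin/env bash
# expand_in_place FILE [WIDTH]: hypothetical helper, WIDTH defaults to 4.
expand_in_place() {
    local file=$1 width=${2:-4} tmp status=1
    tmp=$(mktemp) || return 1
    # Write the expanded text elsewhere first; redirecting straight to
    # "$file" would truncate the input before expand had read it.
    if expand -t "$width" -- "$file" > "$tmp"; then
        # Copy the bytes back rather than mv, so mode and owner are unchanged.
        cat -- "$tmp" > "$file" && status=0
    fi
    rm -f -- "$tmp"
    return "$status"
}

# Usage: expand_in_place Foo.java 4
```

It could be dropped into the find command in place of the expand and sponge pipeline, for example by exporting the function with export -f before calling bash -c.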

Answer by e9t

Use backslash-escaped sed.

On Linux:

  • Replace all tabs with 1 hyphen inplace, in all *.txt files:

    sed -i $'s/\t/-/g' *.txt

  • Replace all tabs with 1 space inplace, in all *.txt files:

    sed -i $'s/\t/ /g' *.txt

  • Replace all tabs with 4 spaces inplace, in all *.txt files:

    sed -i $'s/\t/    /g' *.txt

On a mac:

  • Replace all tabs with 4 spaces inplace, in all *.txt files:

    sed -i '' $'s/\t/    /g' *.txt
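
The Linux and Mac variants differ only in how -i takes its backup suffix. A sketch that should run unchanged with both GNU and BSD sed attaches a suffix directly to -i and deletes the backups afterwards. The function name tabs_to_spaces and the .bak suffix are assumptions made for the example.

```shell
#!/usr/bin/env bash
# tabs_to_spaces WIDTH FILE...: hypothetical helper.
tabs_to_spaces() {
    local width=$1 spaces f
    shift
    # Build a string of WIDTH spaces.
    spaces=$(printf '%*s' "$width" '')
    # $'\t' is a literal tab produced by bash, so sed itself does not need
    # to understand the \t escape.
    sed -i.bak -e $'s/\t/'"$spaces"'/g' -- "$@" || return 1
    # Remove the backups that -i.bak left behind.
    for f in "$@"; do
        rm -f -- "$f.bak"
    done
}

# Usage: tabs_to_spaces 4 *.txt
```

Keeping the .bak files until the result has been checked is a reasonable alternative to deleting them straight away.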

Answer by codeforester

You can use the generally available pr command (see its man page). For example, to convert tabs to four spaces, do this:

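pr -t -e4 file > file.expanded

  • -t suppresses headers
  • -enum expands tabs to spaces, with tab stops every num columns (write the number directly after -e; a non-digit character there, such as =, is taken as the input tab character)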
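
A quick way to see what these options do is to feed pr a line from standard input. This is a small sketch with made-up variable names, assuming a pr that follows the POSIX -e syntax.

```shell
#!/usr/bin/env bash
# A leading tab becomes four spaces; -t drops the page header and trailer.
leading=$(printf '\tx\n' | pr -t -e4)

# A tab in the middle is padded only up to the next 4-column tab stop.
middle=$(printf 'ab\tc\n' | pr -t -e4)

printf '[%s]\n[%s]\n' "$leading" "$middle"
```

Like expand, pr works with tab stops rather than a fixed number of spaces per tab.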

To convert all files in a directory tree recursively, while skipping binary files:

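#!/bin/bash
num=4
shopt -s globstar nullglob
for f in **/*; do
  [[ -f "$f" ]]   || continue # skip if not a regular file
  grep -qI . "$f" || continue # skip binary (and empty) files
  pr -t -e$num "$f" > "$f.expanded.$$" && mv "$f.expanded.$$" "$f"
done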

The logic for skipping binary files is from this post.

NOTE:

  1. Doing this could be dangerous in a git or svn repo
  2. This is not the right solution if you have code files that have tabs embedded in string literals

Answer by Martin Tournoij

How can I convert tabs to spaces in every file of a directory (possibly recursively)?

This is usually not what you want.

Do you want to do this for png images? PDF files? The .git directory? Your Makefile (which requires tabs)? A 5GB SQL dump?

You could, in theory, pass a whole lot of exclude options to find or whatever else you're using; but this is fragile, and will break as soon as you add other binary files.

What you want is, at least:

  1. Skip files over a certain size.
  2. Detect if a file is binary by checking for the presence of a NULL byte.
  3. Only replace tabs at the start of a line (expand -i does this, a plain sed substitution doesn't).
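
The three requirements above can be sketched in shell for a single file. This is not the sanitize_files script; the function name safe_expand, the 1 MiB default limit and the use of grep -I as the binary check are assumptions made for this example, and -i requires GNU expand.

```shell
#!/usr/bin/env bash
# safe_expand FILE [WIDTH] [MAX_BYTES]: hypothetical helper.
# Returns non-zero when the file is skipped or cannot be rewritten.
safe_expand() {
    local file=$1 width=${2:-4} max_bytes=${3:-1048576} size tmp status=1
    [ -f "$file" ] || return 1
    # 1. Skip files over a certain size.
    size=$(wc -c < "$file")
    [ "$size" -le "$max_bytes" ] || return 1
    # 2. Skip files that grep -I classifies as binary (for example because
    #    they contain NUL bytes); empty files are skipped as well.
    grep -qI . -- "$file" || return 1
    # 3. Expand only the leading tabs of each line.
    tmp=$(mktemp) || return 1
    if expand -i -t "$width" -- "$file" > "$tmp"; then
        cat -- "$tmp" > "$file" && status=0
    fi
    rm -f -- "$tmp"
    return "$status"
}

# Usage: safe_expand src/main.c 4
```

A list of file names that must never be touched, such as Makefile, would still have to be added on top of this.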

As far as I know, there is no "standard" Unix utility that can do this, and it's not very easy to do with a shell one-liner, so a script is needed.

A while ago I created a little script called sanitize_files which does exactly that. It also fixes some other common stuff like replacing \r\n with \n, adding a trailing \n, etc.

You can find a simplified script without the extra features and command-line arguments below, but I recommend you use the above script as it's more likely to receive bugfixes and other updates than this post.

I would also like to point out, in response to some of the other answers here, that using shell globbing is not a robust way of doing this, because sooner or later you'll end up with more files than will fit in ARG_MAX (on Linux it was long 128 KiB and is typically about 2 MiB on modern kernels, which may seem a lot, but sooner or later it's not enough).
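
One way around the limit is to let find build the command lines itself, as in this minimal sketch. The helper name list_tabbed is hypothetical; it only reports files that contain tabs and changes nothing.

```shell
#!/usr/bin/env bash
# list_tabbed DIR PATTERN: print the files under DIR matching PATTERN that
# contain at least one tab character.
list_tabbed() {
    local dir=$1 pattern=$2
    # With "{} +", find splits the file list into as many grep invocations
    # as needed, so no single command line exceeds ARG_MAX.
    find "$dir" -type f -name "$pattern" -exec grep -l $'\t' {} +
}

# The limit on the current system can be printed with: getconf ARG_MAX
# Usage: list_tabbed . '*.txt'
```

The same "{} +" form works for the conversion commands themselves, whereas a glob such as *.txt is expanded by the shell into one command line.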



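#!/usr/bin/env python
#
# http://code.arp242.net/sanitize_files
#

import os

# Assumption: the post never defines indent_width; 4 spaces per tab is used here.
indent_width = 4


def is_binary(data):
    return data.find(b'\000') >= 0


def should_ignore(path):
    keep = [
        # VCS systems
        '.git/', '.hg/', '.svn/', 'CVS/',

        # These files have significant whitespace/tabs, and cannot be edited
        # safely
        # TODO: there are probably more of these files..
        'Makefile', 'BSDmakefile', 'GNUmakefile', 'Gemfile.lock'
    ]

    for k in keep:
        if '/%s' % k in path:
            return True
    return False


def run(files):
    indent_find = b'\t'
    indent_replace = b' ' * indent_width

    for f in files:
        if should_ignore(f):
            print('Ignoring %s' % f)
            continue

        try:
            size = os.stat(f).st_size
        # Unresolvable symlink, just ignore those
        except FileNotFoundError as exc:
            print('%s is unresolvable, skipping (%s)' % (f, exc))
            continue

        if size == 0:
            continue
        if size > 1024 ** 2:
            print("Skipping `%s' because it's over 1MiB" % f)
            continue

        try:
            with open(f, 'rb') as fp:
                data = fp.read()
        except (OSError, PermissionError) as exc:
            print("Error: Unable to read `%s': %s" % (f, exc))
            continue

        if is_binary(data):
            print("Skipping `%s' because it looks binary" % f)
            continue

        data = data.split(b'\n')

        for i, line in enumerate(data):
            # Fix indentation: count the leading tabs, then swap each for spaces
            repl_count = 0
            while line.startswith(indent_find):
                repl_count += 1
                line = line.replace(indent_find, b'', 1)

            if repl_count > 0:
                # Store the fixed line back so that it is actually written out
                data[i] = indent_replace * repl_count + line

        try:
            with open(f, 'wb') as fp:
                fp.write(b'\n'.join(data))
        except (OSError, PermissionError) as exc:
            print("Error: Unable to write to `%s': %s" % (f, exc))


if __name__ == '__main__':
    allfiles = []
    for root, dirs, files in os.walk(os.getcwd()):
        for f in files:
            allfiles.append('%s/%s' % (root, f))

    run(allfiles)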

Answer by drchuck

I like the "find" example above for the recursive application. To adapt it to be non-recursive, only changing files in the current directory that match a wildcard, the shell glob expansion can be sufficient for small numbers of files:

ls *.java | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh -v

If you want it silent after you trust that it works, just drop the -v on the sh command at the end.

Of course you can pick any set of files in the first command. For example, list only a particular subdirectory (or directories) in a controlled manner like this:

ls mod/*/*.php | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh

Or in turn run find(1) with some combination of depth parameters etc:

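find mod/ -name '*.php' -mindepth 1 -maxdepth 2 | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh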

Answer by x-yuri

One can use vim for that:

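find -type f \( -name '*.css' -o -name '*.html' -o -name '*.js' -o -name '*.php' \) -execdir vim -c retab -c wq {} \;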

As Carpetsmoker stated, it will retab according to your vim settings, and according to modelines in the files, if any. Also, it will replace tabs not only at the beginning of the lines, which is not what you generally want. E.g., you might have literals containing tabs.

Answer by Theo Belaire

I used astyle to re-indent all my C/C++ code after finding mixed tabs and spaces. It also has options to force a particular brace style if you'd like.