Bash "map" equivalent: run a command on each file

Question

Asked by Claudiu

I often have a command that processes one file, and I want to run it on every file in a directory. Is there any built-in way to do this?

For example, say I have a program data which outputs an important number about a file:

./data foo
137
./data bar
42

I want to run it on every file in the directory in some manner like this:

map data `ls *`
ls * | map data

to yield output like this:

foo: 137
bar: 42

Answer 1

Answered by Daniel Haley

If you are just trying to execute your data program on a bunch of files, the easiest and least complicated way is to use -exec in find.

Say you wanted to execute data on all txt files in the current directory (and subdirectories). This is all you'd need:

find . -name "*.txt" -exec data {} \;

If you wanted to restrict it to the current directory, you could do this:

find . -maxdepth 1 -name "*.txt" -exec data {} \;

There are lots of options with find.
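
The find commands above print only what data prints. To get the "name: value" lines from the question, one option is to let -exec start a small sh snippet that prints the label itself. This is a minimal sketch: map_find is a hypothetical helper name, and the command is assumed to be a single executable that takes one file argument.

```shell
# map_find DIR CMD: run CMD on each regular file directly inside DIR and
# print "name: output". map_find is a hypothetical name, not a standard tool.
map_find() {
    local dir="$1" cmd="$2"
    # "{} +" hands many file names to one sh invocation; inside sh, the
    # first argument is the command and the rest are the files.
    find "$dir" -maxdepth 1 -type f -exec sh -c '
        cmd=$1
        shift
        for f in "$@"; do
            printf "%s: " "${f##*/}"   # label with the base name
            "$cmd" "$f"
        done
    ' _ "$cmd" {} +
}
```

With the data program from the question, this would be called as map_find . ./data. The order of the lines follows whatever order find returns, which is not sorted.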

Answer 2

Answered by Mark Byers

If you just want to run a command on every file you can do this:

for i in *; do data "$i"; done

If you also wish to display the filename that it is currently working on then you could use this:

for i in *; do echo -n "$i: "; data "$i"; done
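
One edge case with the loops above: * also matches directories, and in an empty directory bash by default leaves the pattern unexpanded, so the command would be run on a file literally named *. A minimal sketch that guards against both, with map_here as a hypothetical name and printf used in place of echo -n (whose behavior varies between shells):

```shell
# map_here CMD: run CMD on every regular file in the current directory,
# printing "name: output". map_here is a hypothetical helper name.
map_here() {
    local i
    for i in *; do
        # Skip directories, and skip the literal "*" left behind when
        # the glob matches nothing.
        [ -f "$i" ] || continue
        printf '%s: ' "$i"
        "$1" "$i"
    done
}
```

It would be used as map_here ./data from inside the directory of interest.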

Answer 3

Answered by Stephen

It looks like you want xargs:

find . -maxdepth 1 -type f | xargs -d'\n' -n 1 data

To print each command first, it gets a little more complex:

find . -maxdepth 1 -type f | xargs -d'\n' -I {} bash -c 'echo "$1"; data "$1"' _ {}
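
Newline-delimited input still breaks on file names that contain a newline. Where find -print0 and xargs -0 are available (GNU and BSD both provide them), NUL-delimited names avoid that. A minimal sketch, with map_xargs as a hypothetical name; because xargs starts a new process for each call, the command has to be an executable rather than a shell function:

```shell
# map_xargs DIR CMD: run executable CMD once per regular file in DIR,
# printing "path: output". map_xargs is a hypothetical helper name.
map_xargs() {
    local dir="$1" cmd="$2"
    # -print0 / -0 separate names with NUL bytes, so spaces and newlines
    # in file names are passed through intact. With -n 1, sh sees the
    # command as $1 and a single file as $2.
    find "$dir" -maxdepth 1 -type f -print0 |
        xargs -0 -n 1 sh -c 'printf "%s: " "$2"; "$1" "$2"' _ "$cmd"
}
```

Note that GNU xargs still runs the command once with no file name when the input is empty, unless -r is added.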

Answer 4

Answered by Paused until further notice.

You should avoid parsing ls:

find . -maxdepth 1 | while read -r file; do do_something_with "$file"; done

or

while read -r file; do do_something_with "$file"; done < <(find . -maxdepth 1)

The latter doesn't run the while loop in a subshell, so variables set inside the loop are still visible after it finishes.
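
The difference shows up as soon as the loop needs to report something back, such as a count. A minimal sketch for bash, where printf stands in for find so that the input is fixed (an assumption made only to keep the example deterministic):

```shell
#!/bin/bash
# Count the lines read by each loop form and compare the results afterwards.
piped_count=0
printf '%s\n' foo bar baz | while IFS= read -r file; do
    piped_count=$((piped_count + 1))    # runs in a subshell
done

subst_count=0
while IFS= read -r file; do
    subst_count=$((subst_count + 1))    # runs in the current shell
done < <(printf '%s\n' foo bar baz)

echo "after pipeline: $piped_count"
echo "after process substitution: $subst_count"
```

In bash with default options, the first count is still 0 after the loop, while the second is 3.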

Answer 5

Answered by Cascabel

The common methods are:

ls * | while read file; do data "$file"; done

for file in *; do data "$file"; done

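The second is safe with whitespace in filenames as long as "$file" stays quoted, because the results of a glob are not word-split. The first is the more fragile one: it parses the output of ls, and read strips leading and trailing whitespace and interprets backslashes unless you write IFS= read -r. If you do need to change IFS for a loop, run it in a subshell so the change doesn't leak into the rest of your shell: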

( IFS=$'\n'; for file in *; do data "$file"; done )

You can easily wrap the first one up in a script:

#!/bin/bash
# map.bash

while IFS= read -r file; do
    "$1" "$file"
done

which can be executed as you requested - just be careful never to accidentally execute anything dumb with it. The benefit of using a looping construct is that you can easily place multiple commands inside it as part of a one-liner, unlike xargs where you'll have to place them in an executable script for it to run.

Of course, you can also just use the utility xargs:

find * -maxdepth 0 | xargs -n 1 data

Note that when piping from ls, as in the first loop above, you should make sure indicators are turned off (ls --indicator-style=none) if you normally use them, or the @ appended to symlinks will turn them into nonexistent filenames.

Answer 6

Answered by Ole Tange

GNU Parallel specializes in making this kind of mapping:

parallel data ::: *

It will run one job on each CPU core in parallel.
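
To also label each result with its file name, GNU Parallel has a --tag option that prefixes every output line with the argument and a tab, and -k keeps the output in argument order. A minimal sketch, assuming GNU Parallel is installed; map_parallel is a hypothetical wrapper name:

```shell
# map_parallel CMD FILE...: run CMD on each FILE in parallel.
# map_parallel is a hypothetical name; it only wraps the parallel call.
map_parallel() {
    local cmd="$1"
    shift
    # --tag prefixes each output line with its argument and a tab;
    # -k prints results in the order the arguments were given.
    parallel -k --tag "$cmd" ::: "$@"
}
```

Called as map_parallel ./data *, each output line would carry the file name, a tab, and then whatever data printed for that file.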

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: simple scheduling]

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Answer 7

Answered by camh

Since you specifically asked about this in terms of "map", I thought I'd share this function I have in my personal shell library:

# map_lines: evaluate a command for each line of input
map_lines()
{
        while IFS= read -r line ; do
                "$1" "$line"
        done
}

I use this in the manner you asked for:

$ ls | map_lines ./data

I named it map_lines instead of map as I assumed some day I may implement a map_args where you would use it like this:

$ map_args ./data *

That function would look like this:

map_args()
{
    cmd="$1" ; shift
    for arg ; do
        $cmd "$arg"
    done
}
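
Neither function prints the file name, which the question asked for. A small variation could add it. This is a sketch rather than part of the library above: map_labeled is a hypothetical name, and it captures the command's output with $(...), so trailing newlines are trimmed.

```shell
# map_labeled CMD FILE...: like map_args, but prints "file: output" per file.
# The name map_labeled is hypothetical, not part of the original library.
map_labeled() {
    local cmd="$1" arg
    shift
    for arg in "$@"; do
        printf '%s: %s\n' "$arg" "$("$cmd" "$arg")"
    done
}
```

It would be called the same way as map_args, for example map_labeled ./data *.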

Answer 8

Answered by Juha Syrjälä

Try this:

for i in *; do echo "${i}: $(data "$i")"; done

Answer 9

Answered by Banjer

You can create a shell script like so:

#!/bin/bash
cd /path/to/your/dir || exit 1
for file in * ; do
  ./data "$file"
done

That loops through every file in /path/to/your/dir and runs your data program on it. Be sure to make the script executable with chmod +x.

Answer 10

Answered by raspi

You could also use PRLL.
