bash: How to select random files from a directory in bash?

Question

Asked by Marlo Guthrie

I have a directory with about 2000 files. How can I select a random sample of N files, using either a bash script or a list of piped commands?

Answer 1

Answered by Josh Lee

Here's a script that uses GNU sort's random option:

ls | sort -R | tail -n "$N" | while IFS= read -r file; do
    # Something involving $file, or you can leave
    # off the while to just get the filenames
    printf '%s\n' "$file"
done

Answer 2

Answered by Nordic Mainframe

You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:

ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..

Adjust the -n, --head-count=COUNT value to return the number of wanted lines. For example, to return 5 random filenames you would use:

find dirname -type f | shuf -n 5
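
The commands above pass names around one per line, which breaks on file names that contain newlines. A variation that keeps names NUL-delimited might look like the sketch below. It assumes GNU find and shuf, and the function name pick_random_files is made up for illustration.

```shell
# Hypothetical helper: print up to N random regular files from a directory,
# NUL-delimited. Assumes GNU find and GNU shuf (for the -z option).
pick_random_files() {
    local dir=$1 count=$2
    # -print0 and -z keep every name NUL-terminated, so spaces and
    # newlines inside file names survive the pipeline.
    find "$dir" -maxdepth 1 -type f -print0 | shuf -z -n "$count"
}

# Usage (mapfile -d '' needs bash 4.4 or newer):
# mapfile -d '' -t picked < <(pick_random_files /some/dir 5)
# printf '%s\n' "${picked[@]}"
```

Reading the result with mapfile -d '' stores each name as one array element, so later commands can use "${picked[@]}" without word splitting.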

Answer 3

Answered by gniourf_gniourf

Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their names. All of them populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.

This one may output the same file several times, and N needs to be known in advance. Here I chose N=42.
```
a=( * )
randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )
```
This feature is not very well documented.
If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must make absolutely sure that N doesn't come directly from user input without being thoroughly checked!
```
N=42
a=( * )
eval randf=( \"${a[RANDOM%${#a[@]}]\"\{1..$N\}\"}\" )
```
I personally dislike eval, hence this answer!

The same using a more straightforward method (a loop):

N=42
a=( * )
randf=()
for((i=0;i<N;++i)); do
    randf+=( "${a[RANDOM%${#a[@]}]}" )
done

If you don't want the same file to appear more than once:

N=42
a=( * )
randf=()
for((i=0;i<N && ${#a[@]};++i)); do
    ((j=RANDOM%${#a[@]}))
    randf+=( "${a[j]}" )
    a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done
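
The loop above rebuilds the whole array after every pick. One alternative, which is not part of the original answer, is a partial Fisher-Yates shuffle that swaps each chosen element to the front instead. The sketch below uses a hypothetical function name sample_files, fills the same randf array, and enables nullglob so that an empty directory yields an empty result rather than a literal *. Like the original, it relies on RANDOM, so it only suits directories with at most 32768 entries.

```shell
# Hypothetical helper: fill the global array randf with up to N distinct
# entries of a directory, using a partial Fisher-Yates shuffle.
sample_files() {
    local dir=$1 n=$2 i j tmp had_nullglob=0

    # Enable nullglob only for the glob below, then restore the old setting.
    shopt -q nullglob && had_nullglob=1
    shopt -s nullglob
    local -a a=( "$dir"/* )
    (( had_nullglob )) || shopt -u nullglob

    randf=()
    (( n > ${#a[@]} )) && n=${#a[@]}   # cannot pick more than exist
    for (( i = 0; i < n; ++i )); do
        # Choose j among the not-yet-picked positions i .. len-1.
        j=$(( i + RANDOM % (${#a[@]} - i) ))
        # Swap the chosen element into position i and record it.
        tmp=${a[i]}; a[i]=${a[j]}; a[j]=$tmp
        randf+=( "${a[i]}" )
    done
}

# Usage:
# sample_files /some/dir 42
# printf '%s\n' "${randf[@]}"
```

Each iteration does a constant amount of work, so the cost grows with N rather than with N times the directory size.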

Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment on the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.

Answer 4

Answered by silgon

ls | shuf -n 10 # ten random files

Answer 5

Answered by scai

A simple solution for selecting 5 random files without parsing ls. It also works with files containing spaces, newlines and other special characters:

shuf -ezn 5 * | xargs -0 -n1 echo

Replace echo with the command you want to execute for your files.

Answer 6

Answered by Mark

If you have Python installed (works with either Python 2 or Python 3):

To select one file (or line from an arbitrary command), use

ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"

To select N files/lines, use the following (note that N is at the end of the command; replace it with a number):

ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N

Answer 7

Answered by Ken

This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)

But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.

Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.

Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.

That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random number modulo N. We want a random index between 0 and one less than the length of our array. Here's the approach, broken into two lines for clarity's sake:

LENGTH=${#a[@]}
RANDOM_FILE=${a[RANDOM%$LENGTH]}
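
To see the modulo indexing in isolation, here is a small sketch with a made-up three-element array. The names fruits, index and pick are illustrative only and do not come from the answer.

```shell
fruits=( "apple pie" banana cherry )   # illustrative data, one name has a space
length=${#fruits[@]}                   # number of elements, here 3
index=$(( RANDOM % length ))           # 0, 1 or 2, so always a valid subscript
pick=${fruits[index]}                  # the randomly chosen element
printf 'index=%d pick=%s\n' "$index" "$pick"
```

Because RANDOM % length is never negative and never reaches length, the subscript cannot fall outside the array.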

But this solution does it in a single line, removing the unnecessary variable assignment.

Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 file names filename1.txt, filename2.txt, etc.: echo "filename"{1..25}".txt".

The expression inside the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a number between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)

The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
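
The ordering of the two expansions can be observed directly. In this small sketch the variable names are made up for illustration. A range with a variable bound is left untouched by brace expansion, and a C-style loop is one way to build the sequence instead.

```shell
n=3
literal=$(echo {1..$n})   # brace expansion runs first and sees no valid range
fixed=$(echo {1..3})      # a range with literal bounds expands normally

# With a variable bound, generate the sequence in a loop instead.
looped=()
for (( i = 1; i <= n; ++i )); do
    looped+=( "$i" )
done

echo "$literal"       # prints {1..3}
echo "$fixed"         # prints 1 2 3
echo "${looped[*]}"   # prints 1 2 3
```

By the time $n is replaced with 3, brace expansion has already finished, which is why the first echo prints the braces literally.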

Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:

shopt -s globstar
a=( ** )

This turns on a shell option that causes ** to match recursively. Now your $a array contains every file and directory in the entire hierarchy.
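
Since ** also matches directories, a sample meant to contain only regular files needs a filter. The following sketch shows one way to do that. The function name list_files_recursive is hypothetical, and globstar requires bash 4.0 or newer.

```shell
# Hypothetical helper: print every regular file below a directory as a
# NUL-delimited list of paths relative to that directory.
# The body runs in a subshell, so the shopt and cd changes do not leak out.
list_files_recursive() (
    shopt -s globstar nullglob
    cd -- "$1" || exit 1
    for path in **; do
        # Skip directories and anything else that is not a regular file.
        if [[ -f $path ]]; then
            printf '%s\0' "$path"
        fi
    done
)

# Usage (mapfile -d '' needs bash 4.4 or newer):
# mapfile -d '' -t a < <(list_files_recursive /some/dir)
```

Hidden files are skipped, as with the plain a=( ** ) form. Adding dotglob to the shopt line would include them.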

Answer 8

Answered by Bhaskar Chakradhar

If your folder contains a large number of files, you can use the piped command below, which I found on Unix & Linux Stack Exchange.

find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/

Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
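
One caveat with the pipeline above: if the file list is too long for a single command line, xargs splits it across several shuf invocations, and each of them emits its own sample. A sketch that sidesteps this by letting shuf read the NUL-delimited list from standard input is shown below. It assumes GNU find, shuf, xargs and cp, and the function name copy_random_files is hypothetical.

```shell
# Hypothetical helper: copy up to N random regular files found below src
# into dest. Assumes GNU tools: shuf -z, xargs -r and cp -t are GNU options.
copy_random_files() {
    local src=$1 dest=$2 count=$3
    find "$src" -type f -print0 |
        shuf -z -n "$count" |          # one shuf sees the complete list
        xargs -0 -r cp -t "$dest" --   # -r: do nothing if the list is empty
}

# Usage:
# copy_random_files /some/dir /target/dir 8
```

Files from different subdirectories that share a base name would overwrite each other in the target directory, so this fits best when names are unique.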

Answer 9

Answered by cat

macOS does not have the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates, and did not find one here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.

The script should be easy to modify to stop after N samples, using a counter with if or gniourf_gniourf's for loop with N. $RANDOM only yields values from 0 to 32767, so this is limited to about 32000 files, but that should do for most cases.

#!/bin/bash

array=(*)  # this is the array of files to shuffle
# echo ${array[@]}
for dummy in "${array[@]}"; do  # do loop length(array) times; once for each file
    length=${#array[@]}
    randomi=$(( $RANDOM % $length ))  # select a random index

    filename=${array[$randomi]}
    echo "Processing: '$filename'"  # do something with the file

    unset -v "array[$randomi]"  # remove the element at index $randomi, leaving a gap
    array=("${array[@]}")  # copy the array to re-index it and close the gap
done
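
For directories larger than the range of RANDOM, one possible workaround is to combine two draws into a single 30-bit value. The sketch below uses a hypothetical function rand_below and returns its result in REPLY rather than through a command substitution. As in the script above, the modulo step introduces a slight bias, which is assumed to be acceptable here.

```shell
# Hypothetical helper: set REPLY to a pseudo-random integer in [0, $1).
# Two 15-bit RANDOM draws are combined into one 30-bit value.
rand_below() {
    local limit=$1
    REPLY=$(( ((RANDOM << 15) | RANDOM) % limit ))
}

# Usage with an array of files:
# array=( * )
# rand_below "${#array[@]}"
# echo "Picked: ${array[REPLY]}"
```

This raises the ceiling from 32768 entries to about a billion, which is more than a shell array is likely to hold in practice.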

Answer 10

Answered by benmarbles

This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:

ls command: how can I get a recursive full-path listing, one line per file?

http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/

#!/bin/bash

# Reads a given directory and picks a random file.

# The directory you want to use. You could use "" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR=""

# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'

if [[ -d "${DIR}" ]]
then
  # Runs ls on the given dir, and dumps the output into a matrix,
  # it uses the new lines character as a field delimiter, as explained above.
  #  file_matrix=($(ls -LR "${DIR}"))

  file_matrix=($(ls -R "$DIR" | awk '/:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
  num_files=${#file_matrix[*]}

  # This is the command you want to run on a random file.
  # Change "ls -l" by anything you want, it's just an example.
  ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi

exit 0
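
The ls -R and awk step can be sidestepped with find. The following is a sketch, not benmarbles' script: it collects paths with a NUL-delimited read loop, which should also work on the bash 3.2 shipped with macOS, where mapfile is unavailable. The function name random_file_under is made up for illustration.

```shell
# Hypothetical helper: print one random regular file found below a directory.
# Avoids mapfile so that it should also run on bash 3.2.
random_file_under() {
    local dir=$1 path
    local files=()
    # Read NUL-delimited paths so spaces and newlines in names are preserved.
    while IFS= read -r -d '' path; do
        files+=( "$path" )
    done < <(find "$dir" -type f -print0)

    (( ${#files[@]} > 0 )) || return 1   # nothing to pick from
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}

# Usage:
# ls -l "$(random_file_under /path/to/)"
```

Unlike the ls -R version, this does not need IFS to be changed and only ever sees regular files, not directory names.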