bash: How to select a random file from a directory in bash?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/414164/
How can I select random files from a directory in bash?
Asked by Marlo Guthrie
I have a directory with about 2000 files. How can I select a random sample of N files using either a bash script or a list of piped commands?
Answer by Josh Lee
Here's a script that uses GNU sort's random option:
ls | sort -R | tail -n "$N" | while IFS= read -r file; do
    # Something involving $file, or you can leave
    # off the while to just get the filenames
    printf '%s\n' "$file"
done
Answer by Nordic Mainframe
You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:
ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..
Adjust the -n, --head-count=COUNT value to return the number of wanted lines. For example, to return 5 random filenames you would use:
find dirname -type f | shuf -n 5
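The find | shuf pipeline above is newline-delimited, so a file name that itself contains a newline would be split in two. A minimal sketch of a NUL-delimited variant follows. It assumes GNU find, GNU shuf (for -z) and bash 4.4 or later (for mapfile -d ''). The scratch directory, the sample file names and the picked array are made up for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory with sample files, so the sketch runs anywhere.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/c.txt" "$dir/d.txt"

n=2
# -print0 and -z keep the names NUL-delimited from end to end;
# mapfile -d '' reads the NUL-delimited result into the array "picked".
mapfile -d '' -t picked < <(find "$dir" -type f -print0 | shuf -z -n "$n")

printf 'picked: %s\n' "${picked[@]}"
rm -r "$dir"
```

Once the names are in an array they can be passed to any command as "${picked[@]}" without further quoting concerns.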
Answer by gniourf_gniourf
Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their name. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.
This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.
a=( * )
randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )
This feature is not very well documented.
If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!
N=42
a=( * )
eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )
I personally dislike eval and hence this answer!
The same using a more straightforward method (a loop):
N=42
a=( * )
randf=()
for((i=0;i<N;++i)); do
    randf+=( "${a[RANDOM%${#a[@]}]}" )
done
If you don't want the same file to possibly appear several times:
N=42
a=( * )
randf=()
for((i=0;i<N && ${#a[@]};++i)); do
    ((j=RANDOM%${#a[@]}))
    randf+=( "${a[j]}" )
    a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done
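The no-duplicates loop above can be wrapped in a reusable function. This is only a sketch: the function name pick_random and the pool variable are made up for illustration, and nullglob is switched on as an extra precaution so that an empty directory yields an empty list rather than a literal *:

```bash
#!/usr/bin/env bash
# pick_random N FILE... : hypothetical helper that stores up to N distinct
# arguments, chosen at random, in the global array randf.
pick_random() {
    local n=$1 j
    shift
    local pool=( "$@" )
    randf=()
    while (( n-- > 0 && ${#pool[@]} > 0 )); do
        j=$(( RANDOM % ${#pool[@]} ))
        randf+=( "${pool[j]}" )
        # Drop the chosen element so it cannot be picked again.
        pool=( "${pool[@]:0:j}" "${pool[@]:j+1}" )
    done
}

shopt -s nullglob      # an empty directory gives an empty list, not a literal '*'
pick_random 3 ./*      # './*' also protects names that start with '-'
printf '%s\n' "${randf[@]}"
```

If fewer than N files are available, the function simply returns all of them in random order.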
Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment to the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.
Answer by silgon
ls | shuf -n 10 # ten random files
Answer by scai
A simple solution for selecting 5 random files without parsing ls. It also works with files containing spaces, newlines and other special characters:
shuf -ezn 5 * | xargs -0 -n1 echo
Replace echo with the command you want to execute for your files.
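As an illustration of swapping echo for another command, the sketch below copies five random files into a second directory. The scratch directories src and dest and the sample file names are assumptions made up for this example. It relies on GNU shuf for -e and -z:

```bash
#!/usr/bin/env bash
# Hypothetical scratch directories so the sketch is safe to run anywhere.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src"/file{1..10}.txt "$src/with space.txt"

cd "$src" || exit 1
# ./* rather than * so that a name starting with '-' is not read as an option.
# xargs -I{} runs cp once per selected file.
shuf -ezn 5 ./* | xargs -0 -I{} cp -- {} "$dest"/

ls "$dest"
```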
Answer by Mark
If you have Python installed (works with either Python 2 or Python 3):
To select one file (or line from an arbitrary command), use
ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"
To select N files/lines, use the following (note that N is at the end of the command; replace it with a number):
ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N
Answer by Ken
This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)
But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.
Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.
Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.
That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random number modulo N, and that is exactly the range we need: a random index between 0 and the length of our array minus one. Here's the approach, broken into two lines for clarity's sake:
LENGTH=${#a[@]}
RANDOM_FILE=${a[RANDOM%$LENGTH]}   # not RANDOM=...: assigning to RANDOM would reseed the generator
But this solution does it in a single line, removing the unnecessary variable assignment.
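To make the index arithmetic concrete, here is a minimal sketch with a hypothetical four-element array standing in for the file list. The names length and index are chosen for this example only:

```bash
#!/usr/bin/env bash
a=( alpha beta gamma delta )   # stand-in for a=( * )
length=${#a[@]}                # 4 elements, valid indices are 0..3
index=$(( RANDOM % length ))   # the remainder is always in the range 0..3
echo "index=$index file=${a[index]}"
```

Because the remainder can never reach the array length, the subscript is always valid as long as the array is not empty.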
Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc: echo "filename"{1..25}".txt".
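A small sketch of plain brace expansion, shortened to three names so the output is easy to check. The array name names is an assumption for this example:

```bash
#!/usr/bin/env bash
# Brace expansion is purely textual: each number is spliced between
# the preamble "filename" and the postscript ".txt".
names=( "filename"{1..3}".txt" )
printf '%s\n' "${names[@]}"
# prints:
# filename1.txt
# filename2.txt
# filename3.txt
```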
The expression inside the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a number in between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)
The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
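The ordering can be seen directly in a short sketch. The variable names literal and variable are made up for this example:

```bash
#!/usr/bin/env bash
N=3
literal=$(echo {1..3})     # brace expansion sees a valid sequence expression
variable=$(echo {1..$N})   # brace expansion sees "{1..$N}", which is not a valid
                           # sequence, so it is left alone; $N is expanded afterwards
echo "$literal"            # prints: 1 2 3
echo "$variable"           # prints: {1..3}
```

This is why the brace-expansion variant needs either a literal count or eval, while the loop variants can use $N directly.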
Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:
shopt -s globstar
a=( ** )
This turns on a shell option that causes ** to match recursively. Now your $a array contains every file and directory in the entire hierarchy.
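Since ** matches directories as well as files, a filter may be wanted before picking. A minimal sketch, assuming bash 4 or later (for globstar). The scratch tree and the names top, entry and files are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical scratch tree so the sketch runs anywhere.
top=$(mktemp -d)
mkdir -p "$top/sub/deeper"
touch "$top/one.txt" "$top/sub/two.txt" "$top/sub/deeper/three.txt"

shopt -s globstar nullglob
cd "$top" || exit 1

files=()
for entry in **; do
    # ** also matches directories, so keep regular files only.
    [[ -f $entry ]] && files+=( "$entry" )
done

echo "found ${#files[@]} files"
echo "random pick: ${files[RANDOM % ${#files[@]}]}"
```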
Answer by Bhaskar Chakradhar
If you have many files in your folder, you can use the piped command below, which I found on Unix Stack Exchange.
find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/
Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
Answer by cat
MacOS does not have the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates, and I did not find that here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.
The script should be easy to modify to stop after N samples, using either a counter with if or gniourf_gniourf's for loop with N. $RANDOM is limited to ~32000 files, but that should do for most cases.
#!/bin/bash
array=(*) # this is the array of files to shuffle
# echo ${array[@]}
for dummy in "${array[@]}"; do # do loop length(array) times; once for each file
length=${#array[@]}
randomi=$(( $RANDOM % $length )) # select a random index
filename=${array[$randomi]}
echo "Processing: '$filename'" # do something with the file
unset -v "array[$randomi]" # set the element at index $randomi to NULL
array=("${array[@]}") # remove NULL elements introduced by unset; copy array
done
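One way to stop after N samples is to loop until enough files have been collected. A minimal sketch, assuming a placeholder list of names in place of array=(*) and a hypothetical picked array for the results. It also combines two $RANDOM values into a 30-bit number, which is one possible way around the 32768 limit:

```bash
#!/usr/bin/env bash
# Placeholder names; use array=( * ) to work on real files.
array=( a.txt b.txt c.txt d.txt e.txt )
n=3
picked=()

while (( ${#picked[@]} < n && ${#array[@]} > 0 )); do
    length=${#array[@]}
    # Each reference to RANDOM yields a fresh 15-bit value; shifting one of
    # them left by 15 bits and OR-ing in the other gives a 30-bit value.
    randomi=$(( ((RANDOM << 15) | RANDOM) % length ))
    picked+=( "${array[randomi]}" )
    unset -v "array[$randomi]"   # remove the chosen element
    array=( "${array[@]}" )      # re-pack the array to close the gap
done

printf 'Processing: %s\n' "${picked[@]}"
```

The loop ends early if the directory holds fewer than n files, so picked never contains duplicates.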
Answer by benmarbles
This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:
ls command: how can I get a recursive full-path listing, one line per file?
#!/bin/bash
# Reads a given directory and picks a random file.
# The directory you want to use. You could use "" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR=""
# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'
if [[ -d "${DIR}" ]]
then
# Runs ls on the given dir, and dumps the output into a matrix,
# it uses the new lines character as a field delimiter, as explained above.
# file_matrix=($(ls -LR "${DIR}"))
file_matrix=($(ls -R "$DIR" | awk '/:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
num_files=${#file_matrix[*]}
# This is the command you want to run on a random file.
# Change "ls -l" by anything you want, it's just an example.
ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi
exit 0