bash: Best way to choose a random file from a directory in a shell script
Original URL: http://stackoverflow.com/questions/701505/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Best way to choose a random file from a directory in a shell script
Asked by JasonSmith
What is the best way to choose a random file from a directory in a shell script?
Here is my solution in Bash, but I would be very interested in a more portable (non-GNU) version for use on Unix proper.
dir='some/directory'
file=`/bin/ls -1 "$dir" | sort --random-sort | head -1`
path=`readlink --canonicalize "$dir/$file"` # Converts to full path
echo "The randomly-selected file is: $path"
Anybody have any other ideas?
Edit: lhunath makes a good point about parsing ls. I guess it comes down to whether you want to be portable or not. If you have the GNU findutils and coreutils then you can do:
# GNU sed: keep only the text before the first NUL byte (\d000)
find "$dir" -maxdepth 1 -mindepth 1 -type f -print0 \
  | sort --zero-terminated --random-sort \
  | sed 's/\d000.*//g'
Whew, that was fun! Also it matches my question better since I said "random file". Honestly though, these days it's hard to imagine a Unix system deployed out there having GNU installed but not Perl 5.
Answered by lhunath
files=(/my/dir/*)
printf "%s\n" "${files[RANDOM % ${#files[@]}]}"
And don't parse ls. Read http://mywiki.wooledge.org/ParsingLs
Edit: Good luck finding a non-bash solution that's reliable. Most will break for certain types of filenames, such as filenames with spaces or newlines or dashes (it's pretty much impossible in pure sh). To do it right without bash, you'd need to fully migrate to awk/perl/python/... without piping that output for further processing or such.
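As a minimal sketch of how the array approach above might be wrapped for reuse, the function below also covers the empty-directory case, where an unmatched glob would otherwise stay as a literal `*`. The name `pick_random_entry` is hypothetical and not part of the original answer; the code assumes bash.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustrative name, not from the answer above).
# Prints one randomly chosen entry of the directory given as $1.
pick_random_entry() {
    local dir=$1 restore files
    # Remember the current nullglob setting so it can be restored afterwards.
    restore=$(shopt -p nullglob || true)
    shopt -s nullglob
    files=("$dir"/*)
    $restore
    # With nullglob an empty directory gives an empty array, not a literal '*'.
    if (( ${#files[@]} == 0 )); then
        echo "no files in $dir" >&2
        return 1
    fi
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}
```

It could be called as `pick_random_entry /my/dir`. Like the original one-liner, it skips dot files and returns directories as well as regular files.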
Answered by johnnyB
Is "shuf" not portable?
shuf -n1 -e /path/to/files/*
Or use find if the files are nested more than one directory deep:
find /path/to/files/ -type f | shuf -n1
It's part of coreutils, but you'll need 6.4 or newer to get it... so RH/CentOS does not include it.
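The find pipeline above splits on newlines, so a file name containing a newline would be cut in two. A possible variant, assuming GNU find and GNU shuf (both accept NUL-delimited records), is sketched below; the function name `pick_random_file_recursive` is made up for illustration.

```bash
# Hypothetical helper: pick one regular file anywhere below "$1".
# Assumes GNU find (-print0) and GNU shuf (-z); neither option is POSIX.
pick_random_file_recursive() {
    # NUL separates the records, so spaces and newlines in names are safe.
    find "$1" -type f -print0 | shuf -z -n 1 | tr -d '\0'
    echo    # terminate the output with a newline
}
```

Usage would look like `pick_random_file_recursive /path/to/files`.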
Answered by fido
Something like:
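files=("$dir"/*)                 # entries of $dir (from the question) as an array
x=$((RANDOM % ${#files[@]}))     # random index in 0..(count - 1)
echo "The randomly-selected file is ${files[$x]}"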
$RANDOM in bash is a special variable that returns a random number; I use modulus division to get a valid index, then reference that index in the array.
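One caveat with the modulus approach: bash's $RANDOM only yields values from 0 to 32767, so a plain `RANDOM % n` cannot reach every index in a larger directory and is slightly biased when n does not divide 32768. The sketch below shows one way this might be handled, by combining two draws and rejecting out-of-range values. The name `random_below` is hypothetical, and the code assumes bash and 1 <= n <= 2^30.

```bash
# Hypothetical helper: print a random integer in [0, n) for 1 <= n <= 2^30.
# $RANDOM supplies 15 bits per draw, so two draws are combined into 30 bits.
random_below() {
    local n=$1 r limit
    # Largest multiple of n that fits in 30 bits; draws at or above it are
    # rejected so that every remainder is equally likely.
    limit=$(( (1 << 30) / n * n ))
    while :; do
        r=$(( (RANDOM << 15) | RANDOM ))
        if (( r < limit )); then
            break
        fi
    done
    echo $(( r % n ))
}
```

With an array named `files`, the index could then be obtained with `x=$(random_below "${#files[@]}")`.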
Answered by Pipo
# ******************************************************************
# Play a randomly chosen file from below the current directory
# ******************************************************************
function randomFile {
    local tmpFile total randomNumber i line
    tmpFile=$(mktemp)
    find . -type f > "$tmpFile"
    total=$(wc -l < "$tmpFile")
    if [ "$total" -eq 0 ]; then rm "$tmpFile"; return 1; fi
    randomNumber=$((RANDOM % total))
    i=0
    while IFS= read -r line; do
        if [ "$i" -eq "$randomNumber" ]; then
            # Do stuff with file
            amarok "$line"
            break
        fi
        i=$((i + 1))
    done < "$tmpFile"
    rm "$tmpFile"
}
Answered by Aaron Digulla
This boils down to: How can I create a random number in a Unix script in a portable way?
Because if you have a random number r between 1 and N, you can use head -n $r | tail -n 1 to cut out a line somewhere in the middle. Unfortunately, I know of no portable way to do this with the shell alone. If you have Python or Perl, you can easily use their random support but AFAIK, there is no standard rand(1) command.
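To make the head/tail idea concrete, here is a rough sketch that takes the random number from awk, which POSIX does specify. The function name `random_line` is invented for this example, mktemp is assumed to be available (it is common but not POSIX), and the variables are global because plain sh has no `local`.

```sh
# Hypothetical helper: print one random line of standard input.
# awk's srand() seeds from the clock, so two calls within the same second
# will usually choose the same line.
random_line() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    # Arithmetic expansion strips the padding some wc versions print.
    n=$(($(wc -l < "$tmp")))
    if [ "$n" -gt 0 ]; then
        r=$(awk -v n="$n" 'BEGIN { srand(); print int(rand() * n) + 1 }')
        head -n "$r" "$tmp" | tail -n 1
    fi
    rm -f "$tmp"
}
```

For example, `ls "$dir" | random_line` would print one entry, with the usual caveat that file names containing newlines are not handled.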
Answered by ashawley
I think Awk is a good tool to get a random number. According to the Advanced Bash Guide, Awk is a good random number replacement for $RANDOM.
Here's a version of your script that avoids Bash-isms and GNU tools.
#!/bin/sh
dir='some/directory'
# Count the entries, then let awk pick a 1-based line number.
n_files=`/bin/ls -1 "$dir" | wc -l`
rand_num=`awk -v n="$n_files" 'BEGIN { srand(); print int(n * rand()) + 1 }'`
file=`/bin/ls -1 "$dir" | sed -ne "${rand_num}p"`
path=`cd "$dir" && echo "$PWD/$file"` # Converts to full path.
echo "The randomly-selected file is: $path"
It inherits the problems other answers have mentioned should files contain newlines.
Answered by gsbabil
Problems with spaces in file names can be avoided by doing the following in Bash, which splits the output of ls on newlines rather than on spaces (names that themselves contain newlines are still split):
#!/bin/bash
OLDIFS=$IFS
# Split on newline (and backspace) only; the \b keeps $(...) from stripping the newline.
IFS=$(echo -en "\n\b")
DIR="/home/user"
for file in $(ls -1 "$DIR")
do
    echo "$file"
done
IFS=$OLDIFS
Answered by Gilles 'SO- stop being evil'
Here's a shell snippet that relies only on POSIX features and copes with arbitrary file names (but omits dot files from the selection). The random selection uses awk, because that's all you get in POSIX. It's a very poor random number generator, since awk's RNG is seeded with the current time in seconds (so it's easily predictable, and returns the same choice if you call it multiple times per second).
set -- *
n=$(echo $# | awk '{srand(); print int(rand()*$0) + 1}')
# Braces are needed so that positional parameters above 9 expand correctly.
eval "file=\${$n}"
echo "Processing $file"
If you don't want to ignore dot files, the file name generation code (set -- *) needs to be replaced by something more complicated.
set -- *; [ -e "$1" ] || shift
set .[!.]* "$@"; [ -e "$1" ] || shift
set ..?* "$@"; [ -e "$1" ] || shift
if [ $# -eq 0 ]; then echo 1>&2 "empty directory"; exit 1; fi
If you have OpenSSL available, you can use it to generate random bytes. If you don't but your system has /dev/urandom, replace the call to openssl by dd if=/dev/urandom bs=3 count=1 2>/dev/null. Here's a snippet that sets n to a random value between 1 and $#, taking care not to introduce a bias. This snippet assumes that $# is at most 2^23-1.
while
  n=$(($(openssl rand 3 | od -An -t u4) + 1))
  [ $n -gt $((16777216 / $# * $#)) ]
do :; done
# The remainder is in 0..$#-1, so add 1 to get a positional parameter index.
n=$((n % $# + 1))
Answered by Robert Calhoun
BusyBox (used on embedded devices) is usually configured to support $RANDOM but it doesn't have bash-style arrays or sort --random-sort or shuf. Hence the following:
#!/bin/sh
FILES="/usr/bin/*"
for f in $FILES; do echo "$RANDOM $f" ; done | sort -n | head -n1 | cut -d' ' -f2-
Note the trailing "-" in cut -f2-; this is required to avoid truncating file names that contain spaces (or whatever separator you want to use).
It won't handle filenames with embedded newlines correctly.
Answered by kapu
Put each line of output from the command 'ls' into an associative array named line and then choose one of those like so...
# srand() seeds the generator; without it many awk implementations pick the same line on every run.
ls | awk 'BEGIN { srand() } { line[NR]=$0 } END { print line[int(rand()*NR) + 1] }'
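For shells without arrays, such as the BusyBox case mentioned above, another option is to choose while iterating over a glob, which avoids parsing ls altogether. The sketch below uses reservoir sampling; it is only an illustration, the name `pick_by_reservoir` is hypothetical, and it assumes the shell provides $RANDOM (bash and BusyBox ash usually do, plain POSIX sh does not). Because $RANDOM is at most 32767, the choice is only approximately uniform for very large directories.

```sh
# Hypothetical helper: choose one entry of "$1" without arrays or ls.
# Assumes $RANDOM is available; variables are global since sh has no 'local'.
pick_by_reservoir() {
    chosen=
    i=0
    for f in "$1"/*; do
        [ -e "$f" ] || continue    # an unmatched glob stays literal; skip it
        i=$((i + 1))
        # Keep the i-th entry with probability 1/i (reservoir sampling).
        if [ $((RANDOM % i)) -eq 0 ]; then
            chosen=$f
        fi
    done
    [ -n "$chosen" ] && printf '%s\n' "$chosen"
}
```

Calling `pick_by_reservoir /usr/bin` prints one path; the function returns a non-zero status when the directory has no visible entries.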