Linux: What's an easy way to read a random line from a file on the Unix command line?

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/448005/

Time: 2020-08-03 16:52:46  Source: igfitidea

linux unix random command-line

Asked by

What's an easy way to read random line from a file in Unix command line?

Answer by Tracker1

perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:

perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file

This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
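For anyone who finds the Perl one-liner hard to read, here is a minimal Python sketch of the same single-pass idea. The function name random_line and the rng parameter are my own choices for illustration, not part of perlfaq5. The n-th line replaces the current pick with probability 1/n, so only one line is held in memory at a time.

```python
import random

def random_line(lines, rng=random):
    """Return one item chosen uniformly at random from an iterable, in one pass.

    Returns None if the iterable is empty.
    """
    chosen = None
    for n, line in enumerate(lines, start=1):
        # Keep the n-th line with probability 1/n (always true for n == 1)
        if rng.random() * n < 1:
            chosen = line
    return chosen
```

It should work on an open file object as well, for example print(random_line(open(path)), end=""), where path is whatever file you want to sample from.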

Answer by Adam Rosenfield

Here's a simple Python script that will do the job:

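import random, sys
# Read every line into memory, then print one chosen uniformly at random
with open(sys.argv[1]) as f:
    lines = f.readlines()
# Each line keeps its own newline, so suppress the one print() would add
print(random.choice(lines), end="")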

Usage:

python randline.py file_to_get_random_line_from

Answer by Paolo Tedesco

Using a bash script:

#!/bin/bash
# replace with file to read
FILE=tmp.txt
# count number of lines
NUM=$(wc -l < "${FILE}")
# generate random number in range 1..NUM (note: RANDOM never exceeds 32767)
X=$(( RANDOM % NUM + 1 ))
# extract X-th line
sed -n "${X}p" "${FILE}"

Answer by asalamon74

Single bash line:

sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt

Slight problem: the filename has to be given twice.

Answer by asalamon74

You can use shuf:

shuf -n 1 $FILE

There is also a utility called rl. In Debian it's in the randomize-lines package, and it does exactly what you want, though it is not available in all distros. Its home page actually recommends using shuf instead (which didn't exist when rl was created, I believe). shuf is part of the GNU coreutils, rl is not.

rl -c 1 $FILE

Answer by PolyThinker

Another alternative:

head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1

Answer by Thomas Vander Stichele

sort --random-sort $FILE | head -n 1

(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)

Answer by Baskar

Another way, using awk:

awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name

Answer by jrjc

A solution that also works on Mac OS X, and should also work on Linux(?):

N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < $file)) $file

Where:

  • N is the number of random lines you want

  • NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2 --> save the line numbers listed in file1, then print the corresponding lines of file2

  • jot -r $N 1 $(wc -l < $file) --> draw N numbers randomly (-r) in the range (1, number_of_lines_in_file) with jot. The process substitution <() makes the output look like a file to the interpreter, so it plays the role of file1 in the previous example.
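The two-file awk idiom above can be hard to follow, so here is a rough Python sketch of the same logic. The name sample_lines is hypothetical, and the sketch assumes the line numbers are drawn independently, so duplicates are possible and fewer than N lines may come back. Unlike the awk version, it loads the lines into a list first to count them.

```python
import random

def sample_lines(lines, n, rng=random):
    """Return the lines whose 1-based numbers were drawn at random, in file order."""
    lines = list(lines)
    if not lines:
        return []
    # Draw n line numbers in 1..len(lines); a set drops any repeated draws,
    # much like the lineN array keys in the awk program
    wanted = {rng.randint(1, len(lines)) for _ in range(n)}
    # Second pass: keep the lines whose number was drawn
    return [line for number, line in enumerate(lines, start=1) if number in wanted]
```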

Answer by peak

Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:

sed -n $(awk 'END {srand(); r=rand()*NR; if (r<NR) {sub(/\..*/,"",r); r++;}; print r}' FILENAME)p FILENAME

(This works even if FILENAME is empty, in which case no line is emitted.)

One possible advantage of this approach is that it only calls rand() once.
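To make the single rand() call concrete, here is a small Python sketch of how one random number is turned into a line number, as I read the awk program. The name pick_line_number is made up for this illustration. It returns 0 for an empty file, which is what the awk program prints when NR is 0.

```python
import random

def pick_line_number(total_lines, rng=random):
    """Map a single random draw onto a 1-based line number in 1..total_lines."""
    if total_lines == 0:
        return 0  # nothing to select
    # rng.random() is in [0, 1), so the product is in [0, total_lines);
    # truncating and adding 1 gives an integer in 1..total_lines
    return int(rng.random() * total_lines) + 1
```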

As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:

awk 'rand() * NR < 1 { line = $0 } END { print line }' FILENAME

(A simple proof of correctness can be given based on induction.)
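The induction is short enough to spell out. What follows is my own sketch, not a quote from the comments. It assumes rand() returns independent values uniform on [0, 1), and writes L_n for the line number held in the variable after n lines have been read.

```latex
\begin{align*}
&\textbf{Claim: } \Pr[L_n = i] = \tfrac{1}{n} \quad \text{for every } 1 \le i \le n. \\
&\textbf{Base } (n = 1)\text{: } \texttt{rand()} \cdot 1 < 1 \text{ always holds, so } \Pr[L_1 = 1] = 1. \\
&\textbf{Step: } \Pr[L_{n+1} = n+1] = \Pr[\texttt{rand()} \cdot (n+1) < 1] = \tfrac{1}{n+1}, \\
&\qquad \Pr[L_{n+1} = i] = \Pr[L_n = i]\left(1 - \tfrac{1}{n+1}\right)
   = \tfrac{1}{n} \cdot \tfrac{n}{n+1} = \tfrac{1}{n+1} \quad (1 \le i \le n).
\end{align*}
```

So after the last line, every line of the file has the same chance of being the one printed.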

Caveat about rand()

"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."

-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html
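To see why the fixed seed matters, here is a small Python illustration. It is only an analogy: awk's rand() without srand() behaves like a generator that is given the same seed on every run. The function name seeded_pick and the seed values are arbitrary choices of mine.

```python
import random

def seeded_pick(n_lines, seed=None):
    """Pick a 1-based line number; seed=None lets Python choose a fresh seed."""
    rng = random.Random(seed)
    return rng.randrange(n_lines) + 1

# With a fixed seed, the "random" pick is identical on every call
print(seeded_pick(1000, seed=0) == seeded_pick(1000, seed=0))  # True
```

In awk, calling srand() in a BEGIN block is the usual way to get a different sequence from run to run.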