Linux: What's an easy way to read a random line from a file on the Unix command line?

Question

Asked by

What's an easy way to read a random line from a file on the Unix command line?

Answer 1

Answered by Tracker1

perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:

perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file

This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
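
For readers who don't read Perl, here is a minimal Python sketch of the same one-item reservoir idea. The function name random_line and the rng parameter are made up for illustration; they are not part of perlfaq5.

```python
import random

def random_line(lines, rng=random):
    """Return one uniformly chosen item from an iterable, or None if it is empty."""
    chosen = None
    for n, line in enumerate(lines, start=1):
        # Keep the n-th line with probability 1/n, mirroring rand($.) < 1 in the Perl one-liner.
        if rng.random() * n < 1:
            chosen = line
    return chosen

# Demo on an in-memory list; an open file object can be passed the same way.
print(random_line(["first\n", "second\n", "third\n"]), end="")
```

Only the current candidate is held in memory, which is where the space advantage over reading the whole file comes from.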

Answer 2

Answered by Adam Rosenfield

Here's a simple Python script that will do the job:

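import random, sys
# Read every line into memory, then print one chosen uniformly at random.
with open(sys.argv[1]) as f:
    lines = f.readlines()
# end="" avoids doubling the line's own trailing newline.
print(random.choice(lines), end="")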

Usage:

python randline.py file_to_get_random_line_from

Answer 3

Answered by Paolo Tedesco

Using a bash script:

#!/bin/bash
# replace with file to read
FILE=tmp.txt
# count number of lines
NUM=$(wc -l < "${FILE}")
# generate random number in range 1..NUM
X=$(( RANDOM % NUM + 1 ))
# extract X-th line
sed -n "${X}p" "${FILE}"
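
The count-then-extract steps of this script can also be sketched in Python. This is an illustrative sketch, not the answerer's code: the function name random_line_two_pass and the throwaway demo file are assumptions. One practical difference is that bash's $RANDOM only yields values from 0 to 32767, so the shell version cannot reach lines past the first 32768, while random.randint has no such limit.

```python
import os
import random
import tempfile

def random_line_two_pass(path):
    """Return a random line of the file at `path`, or None if the file is empty."""
    # First pass: count the lines (roughly the `wc -l` step of the script).
    with open(path) as f:
        total = sum(1 for _ in f)
    if total == 0:
        return None
    # Pick a 1-based line number in [1, total].
    target = random.randint(1, total)
    # Second pass: stop at the chosen line (the `sed -n Xp` step).
    with open(path) as f:
        for number, line in enumerate(f, start=1):
            if number == target:
                return line

# Demo on a throwaway file so the sketch runs as-is.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
print(random_line_two_pass(tmp.name), end="")
os.remove(tmp.name)
```

Like the shell script, this reads the file twice but never holds more than one line in memory.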

Answer 4

Answered by asalamon74

Single bash line:

sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt

Slight problem: duplicate filename.

Answer 5

Answered by asalamon74

You can use shuf:

shuf -n 1 $FILE

There is also a utility called rl. In Debian it's in the randomize-lines package, and it does exactly what you want, though it is not available in all distros. Its home page actually recommends using shuf instead (which didn't exist when rl was created, I believe). shuf is part of GNU coreutils; rl is not.

rl -c 1 $FILE

Answer 6

Answered by PolyThinker

Another alternative:

head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1

Answer 7

Answered by Thomas Vander Stichele

sort --random-sort $FILE | head -n 1

(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)

Answer 8

Answered by Baskar

Another way, using awk:

awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name

Answer 9

Answered by jrjc

A solution that also works on Mac OS X, and should also work on Linux(?):

N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < "$file")) "$file"

Where:

N is the number of random lines you want.
NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2 --> save the line numbers listed in file1, then print the corresponding lines of file2.
jot -r $N 1 $(wc -l < $file) --> draw N numbers at random (-r) in the range (1, number_of_lines_in_file) with jot. The process substitution <() makes the output look like a file to the interpreter, so it plays the role of file1 in the previous example.
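
The same "pick line numbers first, then filter by line number" idea can be sketched in Python. The function name sample_lines is hypothetical, and this sketch makes two assumptions that differ from the one-liner above: it reads all lines into memory, and it draws the line numbers without replacement so that exactly N distinct lines come back.

```python
import random

def sample_lines(lines, count, rng=random):
    """Return `count` distinct randomly chosen lines, kept in their original order."""
    lines = list(lines)
    count = min(count, len(lines))
    # Choose which 1-based line numbers to keep (the role jot plays above).
    wanted = set(rng.sample(range(1, len(lines) + 1), count))
    # Keep a line when its number was chosen (the awk `FNR in lineN` test).
    return [line for number, line in enumerate(lines, start=1) if number in wanted]

# Demo on an in-memory list; the lines of an open file work the same way.
print(sample_lines(["a\n", "b\n", "c\n", "d\n", "e\n"], 3))
```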

Answer 10

Answered by peak

Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:

sed -n $(awk 'END {srand(); r=rand()*NR; if (r<NR) {sub(/\..*/,"",r); r++;}; print r}' FILENAME)p FILENAME

(This works even if FILENAME is empty, in which case no line is emitted.)

One possible advantage of this approach is that it only calls rand() once.

As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:

awk 'rand() * NR < 1 { line = $0 } END { print line }' FILENAME

(A simple proof of correctness can be given based on induction.)
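
One way to spell that induction out, as a sketch: write P_n(k) for the probability that line k is the one being held after n lines have been read (this notation is introduced here, not taken from the answer). The test rand() * NR < 1 succeeds with probability 1/n on line n, so line n replaces the held line with that probability, and an earlier line survives with probability 1 - 1/n. With the base case P_1(1) = 1 and the hypothesis P_{n-1}(k) = 1/(n-1):

```latex
P_n(n) = \frac{1}{n}, \qquad
P_n(k) = P_{n-1}(k)\left(1 - \frac{1}{n}\right)
       = \frac{1}{n-1}\cdot\frac{n-1}{n}
       = \frac{1}{n} \quad (1 \le k < n).
```

So after the last line every line has been kept with the same probability 1/n.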

Caveat about `rand()`

"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."

-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html
