How can I randomize the lines in a file using standard tools on Red Hat Linux?

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/886237/

Time: 2020-08-03 17:20:16  Source: igfitidea

linux file random redhat shuffle

Asked by Stuart Woodward

How can I randomize the lines in a file using standard tools on Red Hat Linux?

I don't have the shuf command, so I am looking for something like a perl or awk one-liner that accomplishes the same task.

Accepted answer by Chris Lutz

And a Perl one-liner you get!

perl -MList::Util -e 'print List::Util::shuffle <>'

It uses a module, but the module is part of the Perl core distribution. If that's not good enough, you may consider rolling your own.

I tried using this with the -i flag ("edit-in-place") to have it edit the file. The documentation suggests it should work, but it doesn't. It still prints the shuffled file to stdout, but this time it deletes the original. I suggest you don't use it.

Consider a shell script:

#!/bin/bash

if [[ $# -eq 0 ]]
then
  echo "Usage: $0 [file ...]"
  exit 1
fi

for i in "$@"
do
  perl -MList::Util -e 'print List::Util::shuffle <>' "$i" > "$i.new"
  # Replace the original only if the shuffled copy has the same byte count.
  if [[ $(wc -c < "$i") -eq $(wc -c < "$i.new") ]]
  then
    mv "$i.new" "$i"
  else
    echo "Error for file $i!"
  fi
done

Untested, but hopefully works.

Answer by ChristopheD

cat yourfile.txt | while IFS= read -r f; do printf "%05d %s\n" "$RANDOM" "$f"; done | sort -n | cut -c7-

Read the file, prepend every line with a random number, sort the file on those random prefixes, cut the prefixes afterwards. One-liner which should work in any semi-modern shell.

EDIT: incorporated Richard Hansen's remarks.
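
The decorate, sort, undecorate steps described in this answer can also be sketched with awk generating the random key. This is only an illustrative variant, not part of the original answer: the function name shuffle_dsu is made up, and it assumes an awk whose srand() seeds from the clock (so two runs within the same second may give the same order). It avoids $RANDOM, which is a bash/ksh feature limited to 0 through 32767.

```shell
# shuffle_dsu: hypothetical helper name, not from the original answers.
# Decorate each line with a random key, sort on the key, strip the key.
shuffle_dsu() {
    # LC_ALL=C keeps "." as the decimal point for both awk and sort.
    LC_ALL=C awk 'BEGIN { srand() } { printf "%.17f\t%s\n", rand(), $0 }' "$@" |
        LC_ALL=C sort -k1,1n |
        cut -f2-    # drop the key; tabs inside the original line are kept
}
```

Usage would look like `shuffle_dsu yourfile.txt`, or `some_command | shuffle_dsu` to read from standard input.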

Answer by Jim T

Um, let's not forget

sort --random-sort

Answer by ephemient

Related to Jim's answer:

My ~/.bashrc contains the following:

unsort ()
{
    LC_ALL=C sort -R "$@"
}

With GNU coreutils's sort, -R = --random-sort, which generates a random hash of each line and sorts by it. In some older (buggy) versions the randomized hash wouldn't actually be used in some locales, causing it to return normal sorted output, which is why I set LC_ALL=C.
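
One consequence of sorting by a per-line hash is that identical lines get identical hashes and therefore end up next to each other, so sort -R is not a uniform shuffle when the input contains duplicates. A small sketch to illustrate, assuming GNU sort is installed (the sample data is made up):

```shell
# Four lines with only two distinct values, ordered by random hash.
printf 'a\nb\na\nb\n' | LC_ALL=C sort -R
# Identical lines hash identically, so they always come out adjacent:
# the two "a" lines together and the two "b" lines together,
# never an interleaving such as a b a b.
```

If duplicates must be spread out like any other line, shuf or the Perl one-liner is the safer choice.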



Related to Chris's answer:

perl -MList::Util=shuffle -e'print shuffle<>'

is a slightly shorter one-liner. (-Mmodule=a,b,c is shorthand for -e 'use module qw(a b c);'.)

The reason giving it a simple -i doesn't work for shuffling in-place is that Perl expects the print to happen in the same loop in which the file is being read, and print shuffle <> doesn't output until after all input files have been read and closed.

As a shorter workaround,

perl -MList::Util=shuffle -i -ne'BEGIN{undef$/}print shuffle split/^/m'

will shuffle files in-place. (-n means "wrap the code in a while (<>) {...} loop"; BEGIN{undef$/} makes Perl operate on files-at-a-time instead of lines-at-a-time, and split/^/m is needed because $_=<> has been implicitly done with an entire file instead of lines.)

Answer by Dan Brickley

On OSX, grabbing the latest from http://ftp.gnu.org/gnu/coreutils/ and something like

./configure
make
sudo make install

...should give you /usr/local/bin/sort --random-sort

without messing up /usr/bin/sort

Answer by Chadwick Boggs

Or get it from MacPorts:

$ sudo port install coreutils

and/or

$ /opt/local//libexec/gnubin/sort --random-sort

Answer by Michal Illich

shuf is the best way.

sort -R is painfully slow. I just tried to sort a 5GB file. I gave up after 2.5 hours. Then shuf sorted it in a minute.
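
For reference, a few common ways to call shuf, assuming GNU coreutils is installed. The file name lines.txt is a placeholder created here purely for illustration and is not from the original answer.

```shell
# Create a small sample file (placeholder data).
printf '%s\n' one two three four five > lines.txt

# Print all lines in random order.
shuf lines.txt

# Print a single random line.
shuf -n 1 lines.txt

# Shuffle the file in place. shuf reads all of its input before it
# opens the output file, so naming the same file for both is safe.
shuf -o lines.txt < lines.txt
```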

Answer by John McDonnell

When I install coreutils with homebrew

brew install coreutils

shuf becomes available as gshuf.

Answer by scai

A one-liner for python:

python -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); sys.stdout.write(''.join(lines))" myFile

And for printing just a single random line:

python -c "import random, sys; sys.stdout.write(random.choice(open(sys.argv[1]).readlines()))" myFile

But see this post for the drawbacks of Python's random.shuffle(). With many (more than 2080) elements, the number of possible permutations exceeds the period of Python's Mersenne Twister generator, so not every ordering can be produced.
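
Picking a single random line does not require holding the whole file in memory. A minimal sketch using awk, not taken from the original answer (random_line is a hypothetical name): it keeps the current line with probability 1/NR, which leaves every line equally likely to be the one printed. It assumes an awk whose srand() seeds from the clock, so repeated calls within the same second may return the same line.

```shell
# random_line: hypothetical helper, not from the original answers.
# Reservoir sampling with a reservoir of one line.
random_line() {
    awk 'BEGIN { srand() }
         rand() * NR < 1 { pick = $0 }   # keep this line with probability 1/NR
         END { if (NR > 0) print pick }' "$@"
}
```

Usage would look like `random_line myFile`.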

Answer by Coroos

Mac OS X with DarwinPorts:

sudo port install unsort
cat $file | unsort | ...
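
Pulling the answers in this thread together, a wrapper could use whichever tool happens to be installed. This is a rough sketch under assumptions, not something any of the answers proposed: the name shuffle_lines is made up, and sort -R is deliberately left out because it groups duplicate lines and was reported above as slow.

```shell
# shuffle_lines: hypothetical wrapper, not from the original answers.
# Tries shuf, then gshuf, then falls back to Perl's List::Util.
shuffle_lines() {
    if command -v shuf >/dev/null 2>&1; then
        cat -- "$@" | shuf      # cat lets several files be given at once
    elif command -v gshuf >/dev/null 2>&1; then
        cat -- "$@" | gshuf
    else
        perl -MList::Util=shuffle -e 'print shuffle <>' "$@"
    fi
}
```

With no arguments the function reads standard input, so `cat $file | shuffle_lines | ...` works the same way as the unsort pipeline.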