Python: how to read a random line from a file
Source: http://stackoverflow.com/questions/3540288/
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
How do I read a random line from one file?
Asked by Shane
Is there a built-in method to do it? If not, how can I do this without too much overhead?
Accepted answer by Alex Martelli
Not built-in, but algorithm R(3.4.2) (Waterman's "Reservoir Algorithm") from Knuth's "The Art of Computer Programming" is good (in a very simplified version):
import random

def random_line(afile):
    line = next(afile)
    for num, aline in enumerate(afile, 2):
        if random.randrange(num):
            continue
        line = aline
    return line
The num, ... in enumerate(..., 2) iterator produces the sequence 2, 3, 4... The randrange will therefore be 0 with a probability of 1.0/num -- and that's the probability with which we must replace the currently selected line (the special case of sample size 1 of the referenced algorithm -- see Knuth's book for a proof of correctness -- and of course we're also in the case of a small-enough "reservoir" to fit in memory ;-))... and exactly the probability with which we do so.
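For reference, the general form of the reservoir idea (a sample of k items rather than 1) might look like the sketch below. The function name reservoir_sample and the parameter k are illustrative and do not come from the answer; this is one common way to write Algorithm R, not necessarily Knuth's exact presentation.

```python
import random

def reservoir_sample(iterable, k):
    """Return up to k items chosen uniformly at random from iterable."""
    reservoir = []
    for n, item in enumerate(iterable, 1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k/n by overwriting
            # a uniformly chosen slot.
            j = random.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

With k=1 this reduces to the single-line case above: the n-th line replaces the current choice with probability 1/n.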
Answer by Ignacio Vazquez-Abrams
Seek to a random position, read a line and discard it, then read another line. The distribution of lines won't be uniform (a line that follows a long line is more likely to be picked), but that doesn't always matter.
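A minimal sketch of this seek-and-discard idea follows. The function name random_line_by_seek is hypothetical, and wrapping around to the first line when the seek lands in the last line is an assumption added here, since the answer does not say what to do in that case.

```python
import os
import random

def random_line_by_seek(path):
    """Pick a line by seeking to a random byte offset (not uniform over lines)."""
    size = os.path.getsize(path)
    if size == 0:
        return None
    with open(path, 'rb') as f:
        f.seek(random.randrange(size))
        f.readline()          # discard the (probably partial) line we landed in
        line = f.readline()   # the next full line
        if not line:
            # Assumption: we landed in the last line, so wrap to the first one.
            f.seek(0)
            line = f.readline()
        return line
```

The file is opened in binary mode so that arbitrary byte offsets are valid seek targets; the result is a bytes object that the caller can decode.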
Answer by cji
It depends on what you mean by "too much" overhead. If storing the whole file in memory is possible, then something like
import random
random_lines = random.choice(open("file").readlines())
would do the trick.
Answer by Tony Veijalainen
import random
lines = open('file.txt').read().splitlines()
myline = random.choice(lines)
print(myline)
For a very long file: seek to a random place in the file based on its length and find two newline characters after that position (or a newline and the end of the file). If we ended up inside the last line, do this again from 100 characters earlier, or from the beginning of the file if the original seek position was < 100.
However, this is overcomplicated, as a file is an iterator. So make it a list and take random.choice (if you need many lines, use random.sample):
import random
print(random.choice(list(open('file.txt'))))
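For the "if you need many" case, a small sketch using random.sample might look like this. The helper name random_lines and the choice to cap k at the number of lines are assumptions for illustration.

```python
import random

def random_lines(path, k):
    """Return up to k distinct lines from the file, in random order."""
    with open(path) as f:
        lines = f.read().splitlines()
    # random.sample raises ValueError if k exceeds the population size,
    # so cap k at the number of lines available.
    return random.sample(lines, min(k, len(lines)))
```

Like the one-liner above, this reads the whole file into memory, so it only suits files that fit comfortably in RAM.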
Answer by Nick Pandolfi
Although I am four years late, I think I have the fastest solution. Recently I wrote a Python package called linereader, which allows you to manipulate the pointers of file handles.
Here is the simple solution to getting a random line with this package:
from random import randint
from linereader import dopen

length = ...    # number of lines in the file
filename = ...  # path to the file

file = dopen(filename)
random_line = file.getline(randint(1, length))
The first time this is done is the worst, as linereader has to compile the output file in a special format. After this is done, linereader can then access any line from the file quickly, whatever size the file is.
If your file is very small (small enough to fit into a megabyte), then you can replace dopen with copen, which makes a cached entry of the file within memory. Not only is this faster, but you also get the number of lines in the file as it is loaded into memory; it is done for you. All you need to do is generate the random line number. Here is some example code for this.
from random import randint
from linereader import copen
file = copen(filename)
lines = file.count('\n')
random_line = file.getline(randint(1, lines))
I just got really happy because I saw someone who could benefit from my package! Sorry for the dead answer, but the package could definitely be applied to many other problems.
Answer by ideasman42
If you don't want to read over the entire file, you can seek into the middle of the file, then seek backwards for the newline, and call readline.
Here is a Python 3 script which does just this.
One disadvantage of this method is that short lines have a lower likelihood of showing up.
def read_random_line(f, chunk_size=16):
    import os
    import random
    with open(f, 'rb') as f_handle:
        f_handle.seek(0, os.SEEK_END)
        size = f_handle.tell()
        i = random.randint(0, size)
        while True:
            i -= chunk_size
            if i < 0:
                chunk_size += i
                i = 0
            f_handle.seek(i, os.SEEK_SET)
            chunk = f_handle.read(chunk_size)
            i_newline = chunk.rfind(b'\n')
            if i_newline != -1:
                i += i_newline + 1
                break
            if i == 0:
                break
        f_handle.seek(i, os.SEEK_SET)
        return f_handle.readline()
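Answer by GoTrained
You can add the lines to a set(), which changes their order arbitrarily. Note that this order is a side effect of hashing rather than a uniform shuffle, and that duplicate lines are collapsed into one.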
filename=open("lines.txt",'r')
f=set(filename.readlines())
filename.close()
To find the 1st line:
print(next(iter(f)))
To find the 3rd line:
print(list(f)[2])
To list all the lines in the set:
for line in f:
    print(line)
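If a genuinely random order is wanted, a different technique from the set() approach is to shuffle a list with random.shuffle. The sketch below assumes the file fits in memory; the function name shuffled_lines is illustrative and not part of the answer.

```python
import random

def shuffled_lines(path):
    """Return all lines of the file in a uniformly random order."""
    with open(path) as f:
        lines = f.readlines()
    random.shuffle(lines)  # in-place shuffle; duplicate lines are kept
    return lines
```

The first element of the returned list is then a uniformly chosen line, and unlike a set, repeated lines keep their multiplicity.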
Answer by Philip Hughes
This may be bulky, but it works, I guess? (At least for txt files.)
import random
choicefile = open("yourfile.txt", "r")
linelist = []
for line in choicefile:
    linelist.append(line)
choice = random.choice(linelist)
print(choice)
It reads each line of a file and appends it to a list. It then chooses a random line from the list. If you want to remove the line once it's chosen, just do
linelist.remove(choice)
Hope this helps; at least it needs no extra modules or imports (apart from random) and is relatively lightweight.
Answer by HCLivess
import random
with open("file.txt", "r") as f:
    lines = f.readlines()
print(random.choice(lines))
Answer by Eugene Yarmash
A slightly improved version of Alex Martelli's answer, which handles empty files (by returning a default value):
from random import randrange

def random_line(afile, default=None):
    line = default
    for i, aline in enumerate(afile, start=1):
        if randrange(i) == 0:  # random int [0..i)
            line = aline
    return line
This approach can be used to get a random item from any iterator using O(n) time and O(1) space.
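To illustrate the "any iterator" claim, here is a rough empirical check. The name random_item and the trial count are assumptions made for this sketch; the selection logic mirrors the function above.

```python
from collections import Counter
from random import randrange

def random_item(iterable, default=None):
    """Same selection rule as random_line, applied to any iterable."""
    choice = default
    for i, item in enumerate(iterable, start=1):
        if randrange(i) == 0:  # true with probability 1/i
            choice = item
    return choice

# Draw many times from a four-item iterator and tally the results.
counts = Counter(random_item(iter("abcd")) for _ in range(20000))
print(counts)
```

Each of the four items should come up in roughly a quarter of the draws, and an empty iterator yields the default.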

