Python read stream

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/26127889/

Time: 2020-08-19 00:07:31  Source: igfitidea

Tags: python, file, stream

Asked by dylnmc

I need a very inexpensive way of reading a buffer with no terminating string (a stream) in Python. This is what I have, but it wastes a lot of CPU time and effort because it is constantly "trying and catching." I really need a new approach.

Here is a reduced working version of my code:

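#!/usr/bin/env python
# Python 2 (print statement), as in the original question.
import fcntl, os, sys

if __name__ == "__main__":
    f = open("/dev/urandom", "r")
    fd = f.fileno()
    # Switch the descriptor to non-blocking mode.
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

    ready = False
    line = ""
    while True:
        try:
            char = f.read(1)  # one character at a time
            if char == '\r':
                continue
            elif char == '\n':
                ready = True
            else:
                line += char
        except IOError:
            # No data available yet: retry immediately (this is the busy loop).
            continue
        if ready:
            print line
            line = ""
            ready = False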

Don't run this in the terminal. It's simply for illustration. "urandom" will break your terminal because it spits out a lot of random characters that the terminal emulator interprets no matter what (which can change your current shell's settings, title, etc.). I was reading from a GPS connected via USB.

The problem: this uses 100% of the CPU whenever it can. I have tried this:

#!/usr/bin/env python
import fcntl, os, sys

if __name__ == "__main__":
    f = open("/dev/urandom", "r")
    fd = f.fileno()
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

    for line in f.readlines():
        print line

However, I get IOError: [Errno 11] Resource temporarily unavailable. I have tried to use Popen, amongst other things. I am at a loss. Can someone please provide a solution (and please explain everything, as I am not a pro, per se)? Also, I should note that this is for Unix (particularly Linux, but it must be portable across all versions of Linux).
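For context, Errno 11 is EAGAIN: a read on a non-blocking descriptor that has no data ready fails immediately instead of waiting. The sketch below reproduces this in Python 3 (3.5+ on Unix, for os.set_blocking). It assumes an empty os.pipe() as a stand-in for the GPS device, and the function name is made up for illustration.

```python
import errno
import os


def read_nonblocking_empty_pipe():
    """Return the errno raised by reading an empty pipe in non-blocking mode."""
    # Assumption: an empty pipe stands in for a device with no data ready.
    read_fd, write_fd = os.pipe()
    try:
        # Sets O_NONBLOCK on the descriptor, like the fcntl calls in the question.
        os.set_blocking(read_fd, False)
        try:
            os.read(read_fd, 1024)
        except BlockingIOError as exc:
            # Python 3 reports EAGAIN as BlockingIOError (an OSError subclass).
            return exc.errno
        return None
    finally:
        os.close(read_fd)
        os.close(write_fd)


err = read_nonblocking_empty_pipe()
print(err == errno.EAGAIN, os.strerror(err))
```

In Python 2 the same condition surfaces as the IOError quoted in the question.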

Accepted answer by Anoyz

When opening the file stream, you will want to set the buffering mode to the size of the chunk you want to read. From the Python documentation:

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)

"buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer."
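As a rough illustration of the quoted policy, the following Python 3 sketch opens the same file with different buffering values and records which stream class comes back. The temporary file and the helper name are assumptions made only for this example.

```python
import os
import tempfile


def describe_buffering(path):
    """Open `path` with different `buffering` values and report what happens."""
    results = {}
    # buffering=0 switches buffering off; only allowed in binary mode.
    with open(path, "rb", buffering=0) as raw:
        results["unbuffered"] = type(raw).__name__
    # An integer > 1 selects a fixed-size chunk buffer of that many bytes.
    with open(path, "rb", buffering=4096) as buffered:
        results["chunked"] = type(buffered).__name__
    # buffering=0 in text mode is rejected with ValueError.
    try:
        open(path, "r", buffering=0)
    except ValueError:
        results["text_unbuffered"] = "ValueError"
    return results


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"line one\nline two\n")
    sample_path = tmp.name

report = describe_buffering(sample_path)
os.unlink(sample_path)
print(report)
```

The unbuffered binary stream is a raw FileIO object, while a buffer size greater than 1 yields an io.BufferedReader wrapped around it.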

You also want to use the readable() method in the while loop, to avoid unnecessary resource consumption.

However, I advise you to use buffered streams such as io.BytesIO or io.BufferedReader.
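One possible way to apply the io.BufferedReader suggestion in Python 3 is sketched below. It assumes a pipe as a stand-in for the real device, and both the sample sentences and the helper name are invented for illustration.

```python
import io
import os


def read_lines_buffered(payload, buffer_size=64):
    """Write `payload` into a pipe, then read it back line by line."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, payload)
    os.close(write_fd)  # closing the writer lets the reader see end-of-file

    raw = io.FileIO(read_fd, "r")  # unbuffered raw stream over the descriptor
    lines = []
    # BufferedReader pulls data from the raw stream in chunks and splits lines.
    with io.BufferedReader(raw, buffer_size=buffer_size) as reader:
        for line in reader:
            lines.append(line.rstrip(b"\r\n"))
    return lines


# Hypothetical sample sentences, not real device output.
sample = b"$GPGGA,first\r\n$GPGGA,second\r\n"
print(read_lines_buffered(sample))
```

The descriptor stays in blocking mode here, so the reader waits for data instead of raising EAGAIN.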

More info in the docs.

Answer by Rob?

The simple solutions are the best:

with open('/dev/urandom', 'r') as f:
    for line in f:
        print line.encode('hex')  # Don't mess up my terminal

Or, alternatively:

with open('/dev/urandom', 'r') as f:
    for line in iter(f.readline, ''):
        print line.encode('hex')  # Don't mess up my terminal

Notes:

  • Leave the file descriptor in blocking mode, so the OS can block your process (and save CPU time) when there is no data available.

  • It is important to use an iterator in the loop. Consider for line in f.readlines():. f.readlines() reads all of the data, puts it all in a list, and returns that list. Since we have infinite data, f.readlines() will never return successfully. In contrast, f returns an iterator: it only gets as much data as it needs to satisfy the next loop iteration (and just a little more for a performance buffer).

  • The first version reads ahead and buffers enough data to print several lines. The second version returns each line immediately. Use the first version if conserving CPU is your primary concern. Use the second if interactive response time is your primary concern.
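The notes above can be tried out with the following Python 3 sketch. It assumes a pipe fed by a background thread as a stand-in for the endless stream, uses bytes.hex() in place of the Python 2 only encode('hex'), and the function names are made up for the example.

```python
import os
import threading
import time


def produce(write_fd, lines, delay):
    """Write each line with a pause, imitating a slow device."""
    with os.fdopen(write_fd, "wb", buffering=0) as writer:
        for line in lines:
            writer.write(line)
            time.sleep(delay)


def consume(lines, delay=0.05):
    """Read lines from a blocking pipe as they arrive; return them hex-encoded."""
    read_fd, write_fd = os.pipe()
    producer = threading.Thread(target=produce, args=(write_fd, lines, delay))
    producer.start()
    received = []
    with os.fdopen(read_fd, "rb") as reader:
        # iter(readline, b"") hands over each line once it is complete;
        # the loop ends when the writer closes its end of the pipe.
        for line in iter(reader.readline, b""):
            received.append(line.rstrip(b"\n").hex())
    producer.join()
    return received


print(consume([b"\x01\x02\n", b"\xff\n"]))
```

Calling reader.readlines() instead would only return after the writer closes the pipe, which never happens with a real endless stream.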

Demonstration:

$ python x.py  | head -2l
eb99f1b3bf74eead42750c63cb7c16160fa7e21c94b176dc6fd2d6796a1428dc8c5d15f13e3c1d5969cb59317eaba37a97f4719bb3de87919009da013fa06ae738408478bc15c750850744a4edcc27d155749d840680bf3a827aafbe9be84e7c8e2fe5785d2305cbedd76454573ca9261ac9a480f71242baa94e8d4bdf761705a6a0fea1ba2b1502066b2538a62776e9165043e5b7337d45773d009fd06d15ca0d9b51af499c1c9d7684472272a7361751d220848874215bc494456b08910e9815fc533d3545129aad4f3f126dc5341266ca4b85ea949794cacaf16409bcd02263b08613190b3f69caa68a47758345dafb10121cfe6ed6c8098142682aef47d1080bd2e218b571824bf2fa5d0bb5297278be8a9a2f55b554631c99e5f1d9040c5bc2bde9a40c8b6e95fc47be6ea9235243582f2367893d15a1494f732d0346ec6184a366f8035aef9141c638128444b1549a64937697b1a170e648d20f336e352076893fa7265c8fa0f4e2207e87410e53b43a51aa146ac6c2decf274a45a58c4e442aececf28879a3e0b4a1278eac7a4f969b3f74e2f2a2064a55ff112c4c49092366dbaa125703962ec5083d09cdb750c0e1dbe34cadda66709f98ff63faccf0045993137bfaca949686bc395bbafb7cf9b5b3475a0c91bdea8cec4e9ac1a9c96e0b81c1c5f242ae72cdea4c073db0351322f9da31203ea34d1b6f298128435797f4846a53b0733069060680dbc2b44c662c4b685ced5419b65c01df41cc2dd9877dc2a97a965174d508a3c9275d8aee7f2991bbb06ca7e0010b0e5b9468aed12f5d2c9a65091223547b8655211df435ffbf24768d48c7e7cf3cb7225f2c116e94a8602078f2b34dab6852f57708e760f88f4085ec7dade19ed558a539f830adea1b81f46303789224802f1f090ec0ff59e291246f1287672b0035df07c359d2ada48e674622f61c0f456c36d130fb6cf7f529e7c4dfceccc594ba5e812a3250e022eca9576a5a8b31c0be13969841d5a4d52b10a7dc8ddd1cac279500cb66e3b244e7d1e042249fd8adf2a90fa8bee74378d79a3d55c6fcf6cc19aa85ffb078dba23ca88ea6810d4a1c5d98b3b33e68ddd41c881df167c36ab2e1b081849781e08e3a026fbd3755acf9f215e0402cbf1a021300f5c883f86a05d467479172109a8f20f93c8c255915a264463eb113c3e8d07d0cec31aa8c0f978a0e7e65c142e85383befd6679c69edd2c56599f15580bbb356d98cfdf012dbc6d1dd6c0dbcfe6f8235d3d5c015fb94d8cc29afdf5d69e33d0e5078d651782546bc2acccab9f35e595f0951a139526ae5651a3ebbec353e99f9ddd1615ed25529500dabe8bf6f12ee6b21a437caca12a6d9688986d94fb7c103dca1572350900e56276b857630a
9d024ef4454dcd5e35dd605a2d49c26ce44fae87ab33e7a158d328521c7d77969908ec5b67f01bf8e2c330dcb70b5f3def8e6d4b010c6d31e4cbe7478657782f10b6fc2d77e8ff7a2f1e590827827e1037b33b0a
Traceback (most recent call last):
  File "x.py", line 4, in <module>
    print line.encode('hex')  # Don't mess up my terminal
IOError: [Errno 32] Broken pipe

Answer by dylnmc

I decided to use io. I noticed that this gives much more accurate timing than even a while True: loop. The GPS that I am reading from is supposed to spit out info every second, but I noticed the interval was really anywhere from 0.95 to 1.05 seconds. That was when I was doing what I posted in my question.

However, when I simply do

#!/usr/bin/env python

import io

if __name__ == "__main__":
    f = io.open("/dev/ttyUSB0")
    while True:
        print f.readline().strip()

It not only blocks temporarily (which saves CPU time and does all sorts of good), but it also apparently keeps the buffer extremely up to date, because it seems to produce results almost exactly one second apart (which is how often my GPS, like most, updates).
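To get a feel for why blocking saves CPU time, the Python 3 sketch below times a single blocking readline() and compares wall-clock time with the CPU time the process used. It assumes a pipe and a delayed writer thread in place of /dev/ttyUSB0, and the "fix 1" payload is invented.

```python
import os
import threading
import time


def timed_blocking_readline(delay=0.3):
    """Return (line, wall_seconds, cpu_seconds) for one blocking readline()."""
    read_fd, write_fd = os.pipe()

    def late_writer():
        time.sleep(delay)  # the stand-in "device" stays silent for a while
        os.write(write_fd, b"fix 1\n")
        os.close(write_fd)

    writer = threading.Thread(target=late_writer)
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    writer.start()
    with os.fdopen(read_fd, "rb") as reader:
        line = reader.readline()  # waits in the kernel until data arrives
    wall = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start
    writer.join()
    return line.strip(), wall, cpu


line, wall, cpu = timed_blocking_readline()
print(line, round(wall, 2), round(cpu, 3))
```

The wall-clock time should be close to the writer's delay, while the CPU time should stay far below it because the process sleeps during the wait.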

That module would be a true miracle, if it were the only way to do this. One could just use open(file, "r"), and it works fine (which angers me, because I spent an entire day on this).