open() and codecs.open() in Python 2.7 behave strangely differently

Question

Asked by Kriattiffer

I have a text file whose first line contains Unicode characters and whose remaining lines are all ASCII. I am trying to read the first line into one variable and all the other lines into another. However, when I use the following code:

# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()

I get the following output:

<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77

If I don't call readline() first, readlines() reads the whole file rather than only the first 7 lines, with both codecs.open() and open().

Why does this happen? And why does codecs.open() open the file in binary mode even though I passed the 'r' mode?

Update: this is the original file: http://www1.datafilehost.com/d/0792d687

Answer 1

Accepted answer by Martijn Pieters

Because you used .readline() first, the codecs.open() file has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.

If you call .readlines() again, the rest of the lines are returned:

>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71

The work-around is to not mix .readline() and .readlines():

f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ')  # take the first line.

This behaviour is really a bug; the Python devs are aware of it, see issue 8260.

The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
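As a rough illustration of the io.open() route, here is a minimal sketch. The helper name read_names_and_data, the temporary sample file, and its contents are made up for this example (they are not the asker's 1.txt); the point is only that .readline() followed by .readlines() on an io.open() file should hand back all remaining lines.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_names_and_data(path):
    """Return (names, data): the split first line and all remaining lines.

    Hypothetical helper for illustration; it mixes readline() and
    readlines() on a file opened with io.open().
    """
    with io.open(path, 'r', encoding='utf-8') as f:
        names = f.readline().split(u' ')
        data = f.readlines()
    return names, data


# Assumed sample data: one non-ASCII header line (three Cyrillic words,
# written as escapes), followed by 77 plain ASCII lines.
header = (u'\u0438\u043c\u044f '
          u'\u0432\u043e\u0437\u0440\u0430\u0441\u0442 '
          u'\u0433\u043e\u0440\u043e\u0434\n')
body = u''.join(u'row %d of plain ascii data\n' % i for i in range(77))

fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-8') as out:
    out.write(header + body)

names, data = read_names_and_data(sample_path)
os.remove(sample_path)

print(len(names))  # 3 words in the header line
print(len(data))   # 77 remaining lines
```

io.open() returns unicode strings in Python 2.7, just as codecs.open() does with an encoding, so the rest of the asker's code should not need to change.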
