How to cleanly loop over two files in parallel in Python

Question

Asked by Yin Zhu

I frequently write code like:

lines = open('wordprob.txt','r').readlines()
words = open('StdWord.txt','r').readlines()
i = 0
for line in lines:
    v = [eval(s) for s in line.split()]
    if v[0] > v[1]:
        print words[i].strip(),
    i += 1

Is it possible to avoid using the variable i and make the program shorter?

Thanks.

Answer 1

Answered by steveha

It looks like you don't care what the value of i is. You are only using it to pair up the lines and the words. Therefore, I recommend you read one line at a time, and at the same time read one word. Then they will match.

Also, when you use .readlines() you read all the input into memory at once. For large inputs, this wastes memory and will be slow. For this simple code, one line at a time is all you need. The file object returned by open() can act as an iterator that returns one line at a time.
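
As a minimal sketch of that point (written for Python 3; count_lines is a hypothetical helper, and io.StringIO stands in for a real file returned by open()), iterating over the file object directly looks like this:

```python
import io

def count_lines(handle):
    """Count lines by iterating over the file object, one line at a time."""
    total = 0
    for _line in handle:  # no .readlines(): the whole input is never loaded at once
        total += 1
    return total

# io.StringIO iterates line by line, just like a file opened with open().
sample = io.StringIO("0.9 0.1\n0.2 0.8\n0.7 0.3\n")
print(count_lines(sample))  # 3
```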

If you can, you should avoid the use of eval(). In a simple exercise where you know what the input data will be, it is pretty safe, but if you get data from outside sources, the use of eval() could allow your computer to be attacked. See this page for more info. My example code assumes that you are using eval() to turn text into a float value. float() works on an integer string value, too: float('3') returns 3.0.
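
A short sketch of the safer conversion, in Python 3. The helper name parse_number is made up for illustration; ast.literal_eval is shown only as one standard-library option for cases where the input may be a literal other than a number, and is not something the original question used.

```python
import ast

def parse_number(text):
    """Convert a numeric string to float without evaluating arbitrary code."""
    return float(text)

print(parse_number("3"))     # 3.0
print(parse_number("0.25"))  # 0.25

# ast.literal_eval accepts only Python literals, so a function call is rejected
# instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```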

Also, it appears that the input lines can only have two values. If a line ever has extra values, your code will not detect this condition. We can change the code to explicitly unpack two values from the split line, and then if there are more than two values, Python will raise an exception. Plus, the code will be slightly nicer to read.
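
To illustrate the unpacking check in isolation (Python 3; parse_pair is a hypothetical helper name), a line with too many or too few values raises ValueError instead of being silently accepted:

```python
def parse_pair(line):
    """Return the two floats on a line; raise ValueError unless there are exactly two."""
    a, b = (float(s) for s in line.split())
    return a, b

print(parse_pair("0.9 0.1"))  # (0.9, 0.1)

try:
    parse_pair("0.9 0.1 0.5")  # three values: unpacking into a, b fails
except ValueError as exc:
    print("bad line:", exc)
```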

So here is my suggested rewrite of this example:

lines = open('wordprob.txt','rt')
words = open('StdWord.txt','rt')

for line in lines:
    word = words.next().strip()  # in Python 3: word = next(words).strip()
    a, b = [float(s) for s in line.split()]
    if a > b:
        print word,  # in Python 3: print(word + ' ', end='')

EDIT: And here is the same solution, but using izip().

import itertools
lines = open('wordprob.txt','rt')
words = open('StdWord.txt','rt')

# in Python 3, just use zip() instead of izip()
for line, word in itertools.izip(lines, words):
    word = word.strip()
    a, b = [float(s) for s in line.split()]
    if a > b:
        print word,  # in Python 3: print(word + ' ', end='')

In Python 3, the built-in zip() returns an iterator, so you can just use that and do not need to import itertools.
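
One behaviour worth knowing when pairing two files this way: zip() stops at the shorter input without complaint. The sketch below (Python 3, with io.StringIO objects and invented sample data standing in for the two files) shows that, and one possible way to detect a length mismatch using itertools.zip_longest:

```python
import io
from itertools import zip_longest

lines = io.StringIO("0.9 0.1\n0.2 0.8\n0.7 0.3\n")
words = io.StringIO("apple\nbanana\n")  # one entry short, on purpose

# zip() stops silently at the shorter input.
pairs = list(zip(lines, words))
print(len(pairs))  # 2

# zip_longest() pads the shorter input, which makes a mismatch detectable.
lines.seek(0)
words.seek(0)
missing = object()  # sentinel that cannot collide with a real line
mismatched = any(
    line is missing or word is missing
    for line, word in zip_longest(lines, words, fillvalue=missing)
)
print(mismatched)  # True
```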

EDIT: It is best practice to use a with statement to make sure the files are properly closed, no matter what. In recent versions of Python (2.7 and 3.1 onward), a single with statement can manage several files at once, and I'll do that in my solution. Also, we can unpack a generator expression just as easily as we can unpack a list, so I've changed the line that sets a, b to use a generator expression; that should be slightly faster. And we don't need to strip word unless we are going to use it. Put the changes together to get:

from itertools import izip

with open('wordprob.txt','rt') as lines, open('StdWord.txt','rt') as words:
    # in Python 3, just use zip() instead of izip()
    for line, word in izip(lines, words):
        a, b = (float(s) for s in line.split())
        if a > b:
            print word.strip(),  # in Python 3: print(word.strip() + ' ', end='')
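
For reference, here is roughly how the same approach might look as a self-contained Python 3 function. The name select_words and the sample file contents are assumptions made for this demo; the file names match the question.

```python
import os
import tempfile

def select_words(prob_path, word_path):
    """Return the words whose matching line has its first value greater than its second."""
    selected = []
    with open(prob_path, "rt") as lines, open(word_path, "rt") as words:
        for line, word in zip(lines, words):
            a, b = (float(s) for s in line.split())
            if a > b:
                selected.append(word.strip())
    return selected

# Demo with throwaway files; the contents are invented sample data.
with tempfile.TemporaryDirectory() as tmp:
    prob_path = os.path.join(tmp, "wordprob.txt")
    word_path = os.path.join(tmp, "StdWord.txt")
    with open(prob_path, "w") as f:
        f.write("0.9 0.1\n0.2 0.8\n0.7 0.3\n")
    with open(word_path, "w") as f:
        f.write("apple\nbanana\ncherry\n")
    print(" ".join(select_words(prob_path, word_path)))  # apple cherry
```

Returning a list instead of printing inside the loop keeps the pairing logic easy to reuse; joining with spaces reproduces the single-line output of the original.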

Answer 2

Answered by YOU

You can try using enumerate:

http://docs.python.org/tutorial/datastructures.html#looping-techniques

lines = open('wordprob.txt','r').readlines()
words = open('StdWord.txt','r').readlines()
for i,line in enumerate(lines):
    v = [eval(s) for s in line.split()]
    if v[0] > v[1]:
        print words[i].strip()

Answer 3

Answered by Bill Lynch

In general, enumerate is a good solution. In this case, you could do something like:

lines = open('wordprob.txt','r').readlines()
words = open('StdWord.txt','r').readlines()
for word, line in zip(words, lines):
    v = [eval(s) for s in line.split()]
    if v[0] > v[1]:
        print word.strip(),

Answer 4

Answered by tosh

Take a look at enumerate:

>>> for i, season in enumerate(['Spring', 'Summer', 'Fall', 'Winter']):
...     print i, season
0 Spring
1 Summer
2 Fall
3 Winter
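
If an index is still wanted alongside paired items, enumerate and zip can be combined. A small Python 3 sketch (the lengths list is invented sample data):

```python
seasons = ["Spring", "Summer", "Fall", "Winter"]
lengths = [6, 6, 4, 6]  # invented sample data: number of letters in each name

# enumerate() supplies the counter; zip() supplies the paired items.
for i, (season, length) in enumerate(zip(seasons, lengths), start=1):
    print(i, season, length)
```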
