Python: how to break out of only one nested loop

Question

Asked by biohazard

I have two tab-delimited files, and I need to test every row in the first file against all the rows in the other file. For instance,

file1:

row1    c1    36    345   A
row2    c3    36    9949  B
row3    c4    36    858   C

file2:

row1    c1    3455  3800
row2    c3    6784  7843
row3    c3    10564 99302
row4    c5    1405  1563

Let's say I would like to output all the rows in file1 for which col[3] of file1 is smaller than any (not every) col[2] of file2, given that col[1] is the same in both rows.

Expected output:

row1    c1    36    345   A
row2    c3    36    9949  B

Since I am working in Ubuntu, I would like the input command to look like this:
python code.py [file1] [file2] > [output]

I wrote the following code:

import sys

filename1 = sys.argv[1]
filename2 = sys.argv[2]

file1 = open(filename1, 'r')
file2 = open(filename2, 'r')

done = False

for x in file1.readlines():
    col = x.strip().split()
    for y in file2.readlines():
        col2 = y.strip().split()
        if col[1] == col2[1] and col[3] < col2[2]:
            done = True
            break
        else: continue
print x

However, the output looks like this:

row2    c3    36    9949  B

This is more evident for larger datasets, but basically I always get only the last row for which the condition in the nested loop was true. I suspect that "break" is breaking me out of both loops. I would like to know (1) how to break out of only one of the for loops, and (2) whether this is the only problem I've got here.

Answer 1

Accepted answer by NPE

break and continue apply to the innermost loop.
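
As a minimal illustration (written for Python 3, with a hypothetical function name and made-up numbers rather than the files from the question), the sketch below shows that break ends only the loop that directly contains it, and that the outer loop carries on with its next item:

```python
def first_match_per_row(rows, candidates):
    """For each row, record the first candidate greater than it (or None)."""
    results = []
    for row in rows:                  # outer loop
        found = None
        for cand in candidates:       # inner loop
            if cand > row:
                found = cand
                break                 # leaves only the inner loop
        # Execution resumes here, still inside the outer loop.
        results.append((row, found))
    return results


pairs = first_match_per_row([1, 5, 9], [4, 8])
print(pairs)
```

Every element of the outer list gets an entry in the result, which would not happen if break had ended both loops.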

The issue is that you open the second file only once, and therefore it is only read once. When you execute for y in file2.readlines(): for the second time, the file position is already at the end of the file, so file2.readlines() returns an empty list and the inner loop body never runs.
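
The effect can be reproduced in a few lines. This sketch (Python 3) writes a small temporary file, whose name and contents are made up for the demonstration, and calls readlines() twice on the same file object:

```python
import os
import tempfile

# Hypothetical two-line file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("row1\tc1\t3455\t3800\nrow2\tc3\t6784\t7843\n")
    path = tmp.name

with open(path) as f:
    first_read = f.readlines()    # reads both lines; the position is now at end of file
    second_read = f.readlines()   # nothing is left to read

os.remove(path)

print(len(first_read))
print(second_read)
```

The second call returns an empty list, which is what the inner loop in the question receives on every outer iteration after the first.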

Either move file2 = open(filename2, 'r') into the outer loop, or use seek() to rewind to the beginning of file2.
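
A sketch of the seek() variant, written for Python 3, might look like the following. The function name rows_matching_any and the helper write_temp are hypothetical, the data is copied from the question, and the int() conversion is an addition that makes the comparison numeric:

```python
import os
import tempfile


def rows_matching_any(path1, path2):
    """Return lines of path1 whose col[3] is below some col[2] in path2 with equal col[1]."""
    matches = []
    with open(path1) as file1, open(path2) as file2:
        for x in file1:
            col = x.split()
            file2.seek(0)             # rewind so the inner loop sees every row again
            for y in file2:
                col2 = y.split()
                # int() is an addition here: it makes the comparison numeric.
                if col[1] == col2[1] and int(col[3]) < int(col2[2]):
                    matches.append(x)
                    break             # one match is enough; go to the next row of file1
    return matches


def write_temp(text):
    """Write text to a temporary file and return its path (demo helper)."""
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
        tmp.write(text)
        return tmp.name


p1 = write_temp("row1\tc1\t36\t345\tA\nrow2\tc3\t36\t9949\tB\nrow3\tc4\t36\t858\tC\n")
p2 = write_temp("row1\tc1\t3455\t3800\nrow2\tc3\t6784\t7843\n"
                "row3\tc3\t10564\t99302\nrow4\tc5\t1405\t1563\n")
result = rows_matching_any(p1, p2)
os.remove(p1)
os.remove(p2)

for line in result:
    print(line, end="")
```

Rewinding avoids reopening the file on every outer iteration, while the break keeps each row of file1 from being reported more than once.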

Answer 2

Answer by eisoku9618

You need to parse the numeric strings to their corresponding integer values; otherwise col[3] < col2[2] compares the two strings character by character.
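
To see why the conversion matters, here is a small sketch (Python 3) using two values from the question's files; the variable names are chosen only for this illustration:

```python
a, b = "9949", "10564"   # col[3] of row2 in file1, col[2] of row3 in file2

as_strings = a < b              # compares "9" with "1" first
as_integers = int(a) < int(b)   # compares 9949 with 10564

print(as_strings)    # False
print(as_integers)   # True
```

With string comparison, row2 of file1 would never match row3 of file2, even though 9949 is numerically smaller than 10564.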

You can use int() as follows.

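import sys

filename1 = sys.argv[1]
filename2 = sys.argv[2]

with open(filename1) as file1:
    for x in file1:
        col = x.strip().split()
        # Reopen file2 for each row of file1 so the inner loop starts from the top.
        with open(filename2) as file2:
            for y in file2:
                col2 = y.strip().split()
                # Compare as integers, not as strings.
                if col[1] == col2[1] and int(col[3]) < int(col2[2]):
                    sys.stdout.write(x)  # x already ends with a newline
                    break  # one match is enough; move on to the next row of file1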
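
The code above reopens file2 for every row of file1. As an alternative sketch that is not part of the original answer, file2 can be read once into a dictionary holding the largest col[2] for each col[1] value, since a number is smaller than at least one value in a group exactly when it is smaller than the group's maximum. The function names index_largest and select_rows are hypothetical, the code is written for Python 3, and it assumes every line has at least four whitespace-separated columns:

```python
def index_largest(lines2):
    """Map each col[1] key in file2 to the largest col[2] seen for that key."""
    largest = {}
    for line in lines2:
        col2 = line.split()
        key, value = col2[1], int(col2[2])
        if key not in largest or value > largest[key]:
            largest[key] = value
    return largest


def select_rows(lines1, largest):
    """Keep the lines of file1 whose col[3] is below the largest col[2] for the same key."""
    selected = []
    for line in lines1:
        col = line.split()
        if col[1] in largest and int(col[3]) < largest[col[1]]:
            selected.append(line)
    return selected


# Data copied from the question; lists of strings stand in for open file objects.
file1_lines = ["row1\tc1\t36\t345\tA\n", "row2\tc3\t36\t9949\tB\n", "row3\tc4\t36\t858\tC\n"]
file2_lines = ["row1\tc1\t3455\t3800\n", "row2\tc3\t6784\t7843\n",
               "row3\tc3\t10564\t99302\n", "row4\tc5\t1405\t1563\n"]

largest = index_largest(file2_lines)
kept = select_rows(file1_lines, largest)
for line in kept:
    print(line, end="")
```

Both functions accept any iterable of lines, so open file objects could be passed in place of the lists. Each file is then read only once, and no nested loop or break is needed.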
