Python: How to find duplicate lines in a text file and print them?

Question

Asked by samiles

I have a text file with some 1,200 rows. Some of them are duplicates.

How could I find the duplicate lines in the file (but not worrying about case) and then print out the line's text on the screen, so I can go off and find it? I don't want to delete them or anything, just find which lines they might be.

Answer 1

Accepted answer by mgilson

This is pretty easy with a set:

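with open('file') as f:
    seen = set()  # lowercased lines encountered so far
    for line in f:
        line = line.rstrip('\n')  # drop the newline so print() doesn't add a blank line
        line_lower = line.lower()  # compare case-insensitively
        if line_lower in seen:
            print(line)  # repeat of an earlier line
        else:
            seen.add(line_lower)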
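
Since the goal is to go and find the lines afterwards, it may help to report where each repeat sits. The following is a minimal sketch, not part of the original answer: the function name `find_repeats` and the sample data are made up for illustration. It swaps the set for a dict that remembers the line number of the first occurrence.

```python
def find_repeats(lines):
    """Return (line_no, first_line_no, text) for every case-insensitive repeat."""
    first_seen = {}  # lowercased text -> line number of its first occurrence
    repeats = []
    for line_no, line in enumerate(lines, start=1):
        text = line.rstrip('\n')
        key = text.lower()
        if key in first_seen:
            repeats.append((line_no, first_seen[key], text))
        else:
            first_seen[key] = line_no
    return repeats


# Hypothetical sample data standing in for the lines of a file.
sample = ['one\n', 'One\n', 'two\n', 'oNe\n', 'three\n']
for line_no, first_no, text in find_repeats(sample):
    print(f'line {line_no} repeats line {first_no}: {text}')
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines.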

Answer 2

Answered by Ashwini Chaudhary

Since there are only 1,200 lines, you can also use collections.Counter():

>>> from collections import Counter
>>>
>>> with open('data1.txt') as f:
...     # count non-blank lines, ignoring case and surrounding whitespace
...     counts = Counter(line.strip().lower() for line in f if line.strip())
...     for line, n in counts.items():
...         if n > 1:  # seen more than once
...             print(line)
...

If data1.txt is something like this:

ABC
abc
aBc
CAB
caB
bca
BcA
acb

The output is (the order may differ depending on the Python version):

cab
abc
bca
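
If the number of occurrences is also of interest, the Counter already holds it. A small sketch under the same assumptions as above (whitespace stripped, blank lines skipped); `duplicate_counts` is a name chosen here for illustration, and the sample list mirrors the data1.txt contents shown earlier.

```python
from collections import Counter


def duplicate_counts(lines):
    """Return (text, count) pairs for lines that occur more than once, ignoring case."""
    counts = Counter(line.strip().lower() for line in lines if line.strip())
    # most_common() orders entries from the highest count to the lowest
    return [(text, n) for text, n in counts.most_common() if n > 1]


# Same contents as the data1.txt example above.
sample = ['ABC', 'abc', 'aBc', 'CAB', 'caB', 'bca', 'BcA', 'acb']
for text, n in duplicate_counts(sample):
    print(f'{text}: {n} occurrences')
```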

Answer 3

Answered by Todd A. Jacobs

Finding Case-Insensitive Duplicates

This won't give you line numbers, but it will give you a list of duplicate lines which you can then investigate further. For example:

tr 'A-Z' 'a-z' < /tmp/foo | sort | uniq -d

Example Data File

# /tmp/foo
one
One
oNe
two
three

The pipeline listed above will correctly yield:

one

Finding the Line Numbers

You could then grep for related line numbers like so:

grep --ignore-case --line-number one /tmp/foo
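
For comparison with the other answers, the two shell steps can be folded into a single pass in Python. This is only a rough sketch, not something this answer proposed: `duplicate_line_numbers` is a hypothetical helper, and unlike the grep command it compares whole lines rather than matching "one" anywhere in a line.

```python
from collections import defaultdict


def duplicate_line_numbers(lines):
    """Map each lowercased line that occurs more than once to all of its line numbers."""
    positions = defaultdict(list)  # lowercased text -> line numbers where it appears
    for line_no, line in enumerate(lines, start=1):
        positions[line.rstrip('\n').lower()].append(line_no)
    return {text: nums for text, nums in positions.items() if len(nums) > 1}


# Hypothetical sample data resembling the example file above.
sample = ['one\n', 'One\n', 'oNe\n', 'two\n', 'three\n']
for text, nums in duplicate_line_numbers(sample).items():
    print(text, nums)
```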
