Python: How can I find duplicate lines in a text file and print them?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, include the original URL, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/12937798/
How can I find duplicate lines in a text file and print them?
Asked by samiles
I have a text file with some 1,200 rows. Some of them are duplicates.
How could I find the duplicate lines in the file (but not worrying about case) and then print out the line's text on the screen, so I can go off and find it? I don't want to delete them or anything, just find which lines they might be.
Accepted answer by mgilson
This is pretty easy with a set:
with open('file') as f:
    seen = set()
    for line in f:
        line_lower = line.lower()
        if line_lower in seen:
            print(line)
        else:
            seen.add(line_lower)
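Since the question also asks which lines the duplicates are, a small variation of the same idea tracks line numbers with enumerate. This is a sketch, not part of the answer; the find_duplicates helper and its names are illustrative:

```python
def find_duplicates(lines):
    """Return (line_number, text) for every case-insensitive repeat."""
    first_seen = {}   # lowercased line -> line number of first occurrence
    duplicates = []
    for number, line in enumerate(lines, start=1):
        key = line.strip().lower()
        if key in first_seen:
            duplicates.append((number, line.strip()))
        else:
            first_seen[key] = number
    return duplicates

# Works on any iterable of lines, including an open file object.
for number, text in find_duplicates(["one", "One", "two", "oNe"]):
    print(number, text)
```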
Answered by Ashwini Chaudhary
Since there are only about 1,200 lines, you can also use collections.Counter():
>>> from collections import Counter
>>> with open('data1.txt') as f:
...     c = Counter(line.strip().lower() for line in f if line.strip())  # case-insensitive
...     for line in c:
...         if c[line] > 1:
...             print(line)
...
If data1.txt is something like this:
ABC
abc
aBc
CAB
caB
bca
BcA
acb
output is:
cab
abc
bca
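The same Counter can also report how often each line occurred; its most_common() method orders entries by descending count, so the duplicates surface first. A sketch, using an in-memory list in place of the file:

```python
from collections import Counter

# Same sample data as above, counted case-insensitively.
lines = ["ABC", "abc", "aBc", "CAB", "caB", "bca", "BcA", "acb"]
counts = Counter(line.strip().lower() for line in lines if line.strip())

# most_common() sorts by descending count, so duplicates come first.
for text, n in counts.most_common():
    if n > 1:
        print(text, n)
```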
Answered by Todd A. Jacobs
Finding Case-Insensitive Duplicates
This won't give you line numbers, but it will give you a list of duplicate lines which you can then investigate further. For example:
tr 'A-Z' 'a-z' < /tmp/foo | sort | uniq -d
Example Data File
# /tmp/foo
one
One
oNe
two
three
The pipeline listed above will correctly yield:
one
Finding the Line Numbers
You could then grep for related line numbers like so:
grep --ignore-case --line-number one /tmp/foo
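The two steps can also be combined, feeding each duplicate the pipeline finds back into grep. This is a sketch of my own; it recreates the sample file so it is self-contained, and --fixed-strings keeps each line from being interpreted as a regex:

```shell
# Recreate the sample data file from above.
printf 'one\nOne\noNe\ntwo\nthree\n' > /tmp/foo

# For every case-insensitive duplicate, print its line numbers.
tr 'A-Z' 'a-z' < /tmp/foo | sort | uniq -d | while read -r dup; do
  grep --ignore-case --line-number --fixed-strings "$dup" /tmp/foo
done
```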

