Original URL: http://stackoverflow.com/questions/22978602/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How to return unique words from the text file using Python
Question by user927584
How do I return all the unique words from a text file using Python? For example:
I am not a robot
I am a human
Should return:
I
am
not
a
robot
human
Here is what I've done so far:
def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    word_list = file_contents.split()
    file = open(output_filename, 'w')
    for word in word_list:
        if word not in word_list:
            file.write(str(word) + "\n")
    file.close()
The text file that Python creates is empty. I'm not sure what I am doing wrong.
Answer by mhlester
for word in word_list:
    if word not in word_list:
Every word is in word_list, by definition from the first line, so the if condition is never true and nothing is written.
Instead of that logic, use a set:
unique_words = set(word_list)
for word in unique_words:
    file.write(str(word) + "\n")
Sets only hold unique members, which is exactly what you're trying to achieve.
Note that order won't be preserved, but you didn't specify if that's a requirement.
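Putting the set-based fix together, here is a minimal sketch that keeps the question's unique_file(input_filename, output_filename) signature. The with blocks are an addition not in the original answer (they close the files even if an error occurs), and sorted() is only there to make the otherwise arbitrary set order reproducible; drop it if order doesn't matter.

```python
def unique_file(input_filename, output_filename):
    """Write each distinct whitespace-separated word of input_filename to output_filename, one per line."""
    # Read the whole file and split on any run of whitespace
    with open(input_filename, "r") as input_file:
        word_list = input_file.read().split()

    # A set discards duplicates; sorting only makes the output order stable
    unique_words = sorted(set(word_list))

    with open(output_filename, "w") as output_file:
        for word in unique_words:
            output_file.write(word + "\n")
```

With the question's two sample lines as input, the output file contains I, a, am, human, not, robot, one per line (uppercase letters sort before lowercase ones).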
Answer by A.J. Uppal
def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    duplicates = []
    word_list = file_contents.split()
    file = open(output_filename, 'w')
    for word in word_list:
        if word not in duplicates:
            duplicates.append(word)
            file.write(str(word) + "\n")
    file.close()
This code loops over every word, and if it is not already in the list duplicates, it appends the word to that list and writes it to the file.
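One caveat with this approach: word not in duplicates scans the list from the start on every check, so the loop slows down quadratically on large files. Below is a sketch of the same order-preserving logic that tracks already-written words in a set instead (seen is a name chosen here, not from the answer) and reads the input line by line.

```python
def unique_file(input_filename, output_filename):
    """Write each word the first time it appears, preserving the original order."""
    seen = set()  # words already written; set membership is O(1) on average
    with open(input_filename, "r") as input_file, open(output_filename, "w") as output_file:
        for line in input_file:
            for word in line.split():
                if word not in seen:
                    seen.add(word)
                    output_file.write(word + "\n")
```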
Answer by user2963623
The problem with your code is that word_list already contains every word of the input file. When iterating over the loop, you are basically checking whether a word from word_list is absent from word_list itself, so the condition is always false. This should work (note that it also preserves the order):
def unique_file(input_filename, output_filename):
    z = []
    with open(input_filename, 'r') as fileIn, open(output_filename, 'w') as fileOut:
        for line in fileIn:
            for word in line.split():
                if word not in z:
                    z.append(word)
                    fileOut.write(word + '\n')
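On Python 3.7 and later, dict keys keep insertion order, so dict.fromkeys() gives the same first-seen ordering in a single expression. A sketch that works on a string for brevity; unique_words_in_order is a hypothetical helper name.

```python
def unique_words_in_order(text):
    """Return the distinct whitespace-separated words of text in first-seen order."""
    # dict keys are unique and (Python 3.7+) preserve insertion order
    return list(dict.fromkeys(text.split()))


print(unique_words_in_order("I am not a robot\nI am a human"))
# ['I', 'am', 'not', 'a', 'robot', 'human']
```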
Answer by agrinh
Simply iterate over the lines in the file and use a set to keep only the unique words.
from itertools import chain

def unique_words(lines):
    return set(chain(*(line.split() for line in lines if line)))
Then simply do the following to read all unique words from a file and print them:
with open(filename, 'r') as f:
    print(unique_words(f))
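A small variant, offered as a sketch: chain(*generator) must unpack the whole generator into an argument tuple before chaining starts, whereas chain.from_iterable() flattens lazily. The if line filter can also go, since an empty string splits into an empty list.

```python
from itertools import chain


def unique_words(lines):
    """Return the set of distinct words across an iterable of lines."""
    # from_iterable pulls one line's word list at a time instead of
    # materializing all of them up front as chain(*...) does
    return set(chain.from_iterable(line.split() for line in lines))
```

For example, unique_words(["I am not a robot\n", "I am a human\n"]) returns the set {'I', 'am', 'not', 'a', 'robot', 'human'} (printed order varies).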
Answer by sebio
This seems to be a typical application for the collections module:
...
import collections

d = collections.OrderedDict()
for word in wordlist:
    d[word] = None
# use this instead if you also want to count the words:
# for word in wordlist: d[word] = d.get(word, 0) + 1
for k in d.keys():
    print(k)
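You could also use a collections.Counter(), which would also count the elements you feed in. On Python versions before 3.7 the order of the words would get lost, though (since 3.7, Counter remembers insertion order like dict). I added a commented-out line that counts the words while keeping the order.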
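For the counting variant, collections.Counter does the tallying in one call, and since Python 3.7 it inherits dict's insertion ordering, so iterating it yields the words in first-seen order. A sketch on the question's sample text:

```python
from collections import Counter

text = "I am not a robot\nI am a human"
counts = Counter(text.split())  # maps each word to its number of occurrences

for word, count in counts.items():
    print(word, count)
```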
Answer by Washington Luiz
Using a regex and a set:
import re

# text holds the file contents as a string
words = re.findall(r'\w+', text.lower())
uniq_words = set(words)
Another way is to create a dict and insert the words as keys:
dict_word = {}
# doc is a list of lines (for example, from f.readlines())
for line in doc:
    frase = line.split()
    for palavra in frase:
        if palavra not in dict_word:
            dict_word[palavra] = 1
print(list(dict_word.keys()))
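The two approaches in this answer are not equivalent: split() is case-sensitive and leaves punctuation attached, while the regex version lowercases first and keeps only runs of word characters. A quick comparison on a made-up punctuated sentence (the sample text is an assumption, not from the answer):

```python
import re

text = "I am a robot. I am not a Robot!"

# Whitespace split: "robot." and "Robot!" remain separate tokens
split_words = set(text.split())

# Regex: lowercase first, then keep only runs of word characters
regex_words = set(re.findall(r"\w+", text.lower()))

print(sorted(split_words))
print(sorted(regex_words))
```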
Answer by joshua riddle
Use a set. You don't need to import anything to do this.
# Open the file and read its contents (the with block closes it afterwards)
with open(file_Name, 'r') as my_File:
    read_File = my_File.read()
# Split the text into words on whitespace
words = read_File.split()
# Using a set will only save the unique words
unique_words = set(words)
# You can then print the set as a whole or loop through it, etc.
for word in unique_words:
    print(word)
Answer by frp farhan
string = "I am not a robot\n I am a human"
list_str = string.split()
print(list(set(list_str)))
Answer by kalla dhamodar
try:
    with open("gridlex.txt", mode="r", encoding="utf-8") as india:
        for data in india:
            # chr() expects an int, so the original chr(data) raised TypeError
            # on each line; this prints the number of characters per line instead
            print("no of chars", len(data))
except IOError:
    print("sorry")