How to return unique words from a text file using Python

Question

Asked by user927584

How do I return all the unique words from a text file using Python? For example:

I am not a robot
I am a human

Should return:

I
am
not
a
robot
human

Here is what I've done so far:

def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    word_list = file_contents.split()

    file = open(output_filename, 'w')

    for word in word_list:
        if word not in word_list:
            file.write(str(word) + "\n")
    file.close()

The text file that Python creates has nothing in it. I'm not sure what I am doing wrong.

Answer 1

Answered by mhlester

for word in word_list:
    if word not in word_list:

Every word is in word_list, by definition from the first line, so the condition is never true and nothing gets written.

Instead of that logic, use a set:

unique_words = set(word_list)
for word in unique_words:
    file.write(str(word) + "\n")

Sets only hold unique members, which is exactly what you're trying to achieve.

Note that order won't be preserved, but you didn't specify if that's a requirement.
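
If the order of first appearance does matter, one possible sketch is to deduplicate with dict keys instead of a set. This assumes Python 3.7 or later, where dicts keep insertion order; the helper name unique_in_order is made up for illustration and is not part of the original answer.

```python
def unique_in_order(words):
    """Return the distinct words in order of first appearance."""
    # dict keys are unique and, from Python 3.7 on, keep insertion order
    return list(dict.fromkeys(words))


sample = "I am not a robot\nI am a human"
print(unique_in_order(sample.split()))
# ['I', 'am', 'not', 'a', 'robot', 'human']
```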

Answer 2

Answered by A.J. Uppal

def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    duplicates = []
    word_list = file_contents.split()
    file = open(output_filename, 'w')
    for word in word_list:
        if word not in duplicates:
            duplicates.append(word)
            file.write(str(word) + "\n")
    file.close()

This code loops over every word, and if the word is not already in the list duplicates, it appends the word to that list and writes it to the file.
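
Checking membership in a list rescans the list for every word, which can get slow on large files. A minimal sketch of the same idea with a set for the "already seen" check might look like this; the function name unique_file_seen and the variable seen are hypothetical, chosen only for this example.

```python
def unique_file_seen(input_filename, output_filename):
    """Write each distinct word of the input file once, in first-seen order."""
    seen = set()  # set membership tests do not rescan all earlier words
    with open(input_filename, 'r') as infile, open(output_filename, 'w') as outfile:
        for line in infile:
            for word in line.split():
                if word not in seen:
                    seen.add(word)
                    outfile.write(word + "\n")
```

The output file is still written in order of first appearance, because the set is only used for the lookup, not for iteration.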

Answer 3

Answered by user2963623

The problem with your code is that word_list already contains every word of the input file. While iterating over it, you are checking whether a word taken from word_list is absent from word_list itself, so the condition is always false. This should work (note that it also preserves the order):

def unique_file(input_filename, output_filename):
    z = []
    with open(input_filename, 'r') as fileIn, open(output_filename, 'w') as fileOut:
        for line in fileIn:
            for word in line.split():
                if word not in z:
                    z.append(word)
                    fileOut.write(word + '\n')

Answer 4

Answered by agrinh

Simply iterate over the lines in the file and use a set to keep only the unique words.

from itertools import chain

def unique_words(lines):
    return set(chain(*(line.split() for line in lines if line)))

Then simply do the following to read all unique words from a file and print them:

with open(filename, 'r') as f:
    print(unique_words(f))
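
To try the approach without a file on disk, a self-contained sketch could feed the sample text through io.StringIO, which behaves like an open text file for this purpose. Using chain.from_iterable instead of chain(*...) is a small variation, not the answer's original code; it avoids unpacking all lines into arguments at once.

```python
import io
from itertools import chain


def unique_words(lines):
    # chain.from_iterable flattens the per-line word lists lazily
    return set(chain.from_iterable(line.split() for line in lines))


fake_file = io.StringIO("I am not a robot\nI am a human\n")
# A set has no defined order, so sort it for stable output
print(sorted(unique_words(fake_file)))
# ['I', 'a', 'am', 'human', 'not', 'robot']
```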

Answer 5

Answered by sebio

This seems to be a typical application for a collection:

...
import collections
d = collections.OrderedDict()
for word in wordlist:
    d[word] = None
# use this instead if you also want to count the words:
# for word in wordlist: d[word] = d.get(word, 0) + 1
for k in d.keys():
    print(k)

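You could also use a collections.Counter(), which would also count the elements you feed in. On older Python versions the order of the words would get lost, though; since Python 3.7 a Counter, being a dict subclass, remembers insertion order. I added a line for counting and keeping the order.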
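
As a rough illustration of the Counter variant, the sketch below counts the words of the question's sample text. It assumes Python 3.7 or later, where iterating a Counter yields keys in first-seen order; the sample string stands in for the file contents.

```python
from collections import Counter

words = "I am not a robot\nI am a human".split()
counts = Counter(words)  # maps each distinct word to its number of occurrences

for word, n in counts.items():
    print(word, n)
```

The keys of counts are the unique words, so list(counts) gives the same result as deduplicating by hand.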

Answer 6

Answered by Washington Luiz

Using Regex and Set:

import re
# text holds the contents of the file as a single string
words = re.findall(r'\w+', text.lower())
uniq_words = set(words)
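
The regex matters once the text contains punctuation: a plain split() keeps trailing punctuation attached to the word. A small sketch of the difference, using a made-up sample sentence that is not from the question:

```python
import re

text = "I am not a robot. I am a human!"

# Whitespace splitting leaves punctuation glued to the words
split_words = sorted(set(text.lower().split()))
print(split_words)
# ['a', 'am', 'human!', 'i', 'not', 'robot.']

# \w+ matches runs of letters, digits and underscores only
regex_words = sorted(set(re.findall(r'\w+', text.lower())))
print(regex_words)
# ['a', 'am', 'human', 'i', 'not', 'robot']
```

Note that lower() also merges "I" and "i" style variants, which may or may not be what you want.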

Another way is to create a dict and insert the words as keys:

dict_word = {}
for line in doc:  # doc is a list of lines
    frase = line.split(" ")
    for palavra in frase:
        if palavra not in dict_word:
            dict_word[palavra] = 1
print(list(dict_word.keys()))

Answer 7

Answered by joshua riddle

Use a set. You don't need to import anything to do this.

#Open the file
my_File = open(file_Name, 'r')
#Read the file
read_File = my_File.read()
#Split the words
words = read_File.split()
#Using a set will only save the unique words
unique_words = set(words)
#You can then print the set as a whole or loop through the set etc
for word in unique_words:
     print(word)

Answer 8

Answered by frp farhan

string = "I am not a robot\n I am a human"
list_str = string.split()
print(list(set(list_str)))

Answer 9

Answered by kalla dhamodar

try:
    # Collect the distinct words of gridlex.txt in a set
    unique_words = set()
    with open("gridlex.txt", mode="r", encoding="utf-8") as india:
        for data in india:
            unique_words.update(data.split())
    for word in unique_words:
        print(word)
except IOError:
    print("sorry")
