Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4500752/

Time: 2020-08-18 16:01:58  Source: igfitidea

Python: check whether a word is spelled correctly

python spell-checking

Asked by Nikolai

I'm looking for an easy way to check whether a certain string is a correctly spelled English word. For example, 'looked' would return True while 'hurrr' would return False. I don't need spelling suggestions or any spelling-correction features, just a simple function that takes a string and returns a boolean value.

Accepted answer by user225312

Two possible ways of doing it:

  1. Have your own file containing all the valid words. Load the file into a set and check whether each word exists in it (word in set).
  2. (The better way) Use PyEnchant, a spell-checking library for Python.

PyEnchant is not actively maintained now.
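
The first approach (a word file loaded into a set) might look roughly like the sketch below. It assumes a plain-text file with one word per line; the helper names `load_words` and `is_spelled_correctly` and the file name are hypothetical, and lowercasing both sides is a design choice that the answer does not specify.

```python
from pathlib import Path


def load_words(path):
    """Read one word per line into a set, lowercased, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def is_spelled_correctly(word, valid_words):
    """Return True if the word (case-insensitively) is in the set of valid words."""
    return word.lower() in valid_words


if __name__ == "__main__":
    # "words.txt" is a placeholder; point it at your own word list.
    word_file = Path("words.txt")
    if word_file.exists():
        valid_words = load_words(word_file)
        print(is_spelled_correctly("looked", valid_words))
        print(is_spelled_correctly("hurrr", valid_words))
```

The result of each check depends entirely on the word list, so a word missing from the file is reported as misspelled even if it is valid English.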

Answer by Surya

Yahoo provides a spell-checking API through YQL.

It's pretty simple, and you get 5,000 queries per IP address per day for non-commercial use (free).

Answer by Chris Farr

I was looking for the same functionality and struggled to find an existing library that works on 64-bit Windows. PyEnchant, although a great library, isn't currently active and doesn't work on 64-bit. Other libraries I found didn't work on Windows.

I finally found a solution that I hope others will find valuable.

The solution...

  • Use nltk
  • Extract the word list from nltk.corpus.brown
  • Convert the word list to a set (for efficient searching)
  • Use the in keyword to determine if your string is in the set


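import nltk
from nltk.corpus import brown

# The Brown corpus must be available locally; download it once if needed.
nltk.download("brown", quiet=True)

# Build a set from the corpus tokens for fast membership tests
word_list = brown.words()
word_set = set(word_list)

# Check if word is in set
"looked" in word_set  # Returns True
"hurrr" in word_set  # Returns False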
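
One caveat with this approach is that the Brown corpus keeps the original casing of its tokens (for example "The" at the start of a sentence), so a plain `in` test is case-sensitive. The sketch below shows one possible way to wrap the lookup in a case-insensitive function. The names `build_word_set` and `is_known_word` are hypothetical, and the short `sample_corpus` list is a stand-in so the example runs without NLTK.

```python
def build_word_set(words):
    """Lowercase every token so lookups do not depend on casing."""
    return {w.lower() for w in words}


def is_known_word(word, word_set):
    """Return True if the lowercased word is present in the set."""
    return word.lower() in word_set


# Stand-in for brown.words(); any iterable of string tokens works.
sample_corpus = ["The", "Fulton", "County", "Grand", "Jury", "looked"]
word_set = build_word_set(sample_corpus)

print(is_known_word("looked", word_set))
print(is_known_word("the", word_set))
print(is_known_word("hurrr", word_set))
```

With NLTK installed, passing `brown.words()` to `build_word_set` in place of `sample_corpus` should give a set built from the whole corpus.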

Use a timer check and you'll see this takes virtually no time to search the set. A test on 1,000 words took 0.004 seconds.
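
A timer check of this kind could be sketched as follows. The synthetic `word_set` and `queries` below are made-up stand-ins for a real corpus and real input, and `time_lookups` is a hypothetical helper; the measured time will vary by machine, so no particular figure is implied.

```python
import time


def time_lookups(words_to_check, word_set):
    """Run a membership test for each word and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [w in word_set for w in words_to_check]
    elapsed = time.perf_counter() - start
    return results, elapsed


# Synthetic data: 50,000 known "words" and 1,000 queries, half of which are present.
word_set = {f"word{i}" for i in range(50_000)}
queries = [f"word{i}" for i in range(0, 100_000, 100)]

results, elapsed = time_lookups(queries, word_set)
print(f"{len(queries)} lookups took {elapsed:.6f} seconds")
```

Set membership is a hash lookup, so the cost per query stays roughly constant as the set grows, which is why converting the word list to a set matters.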

Answer by krinker

I personally used http://textblob.readthedocs.io/en/dev/ (TextBlob). It is an active project, and according to the website:

Spelling correction is based on Peter Norvig's “How to Write a Spelling Corrector”[1] as implemented in the pattern library. It is about 70% accurate.
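
For context, the core idea of the Norvig-style approach mentioned in the quote can be sketched with the standard library alone: generate every string one edit away from the input and keep those that appear in a set of known words. This is a simplified illustration, not the actual code of TextBlob or pattern; the names `edits1` and `candidates` and the tiny `known_words` set are assumptions made for the example.

```python
import string


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, known_words):
    """Return the word itself if known, else known words one edit away (possibly empty)."""
    if word in known_words:
        return {word}
    return edits1(word) & known_words


# Illustrative vocabulary only; a real checker would use a large word list.
known_words = {"looked", "hooked", "spelling"}

print(candidates("looked", known_words))
print(candidates("lookd", known_words))
print(candidates("hurrr", known_words))
```

For the original question, only the first branch matters: a word counts as correctly spelled when it is already in the known-word set, and the edit-distance step is needed only if suggestions are wanted.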
