Original URL: http://stackoverflow.com/questions/4500752/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python: check whether a word is spelled correctly
Question by Nikolai
I'm looking for an easy way to check whether a certain string is a correctly spelled English word. For example, 'looked' would return True while 'hurrr' would return False. I don't need spelling suggestions or any spelling-correction features, just a simple function that takes a string and returns a boolean value.
Accepted answer by user225312
Two possible ways of doing it:
- Keep your own file containing all the valid words. Load the file into a set and check whether each word exists in it (`word in set`).
- (The better way) Use PyEnchant, a spell-checking library for Python.
Note that PyEnchant is no longer actively maintained.
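The first option (a word file loaded into a set) takes only a few lines of Python. This is a minimal sketch, not the answerer's code: the helper names `load_words`, `load_word_file` and `is_spelled_correctly` are hypothetical, and any word-list path you pass in is your own choice (many Unix-like systems ship one at `/usr/share/dict/words`, but not all do). Words are lowercased on load so that 'Looked' and 'looked' are treated alike.

```python
def load_words(lines):
    """Build a set of lowercased words from an iterable of lines (one word per line).

    Blank lines are skipped; surrounding whitespace and newlines are stripped.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def load_word_file(path):
    """Read a word-list file (one word per line) into a set.

    The path is up to you; /usr/share/dict/words is a common but not
    universal location (assumption), and UTF-8 encoding is assumed.
    """
    with open(path, encoding="utf-8") as f:
        return load_words(f)


def is_spelled_correctly(word, words):
    """Return True if `word` appears in the word set (case-insensitive)."""
    return word.strip().lower() in words
```

For example, `words = load_word_file("/usr/share/dict/words")` followed by `is_spelled_correctly("looked", words)`. Lowercasing is a design choice: it also lets lowercased proper nouns pass, so drop it if case matters to you.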
Answer by Surya
Answer by Chris Farr
I was looking for the same functionality and struggled to find an existing library that works on 64-bit Windows. PyEnchant, although a great library, isn't actively maintained and doesn't work on 64-bit. The other libraries I found didn't work on Windows.
I finally found a solution that I hope others will find valuable.
The solution...
- Use nltk
- Extract the word list from nltk.corpus.brown
- Convert the word list to a set (for efficient searching)
- Use the `in` keyword to determine whether your string is in the set
```python
import nltk
from nltk.corpus import brown

nltk.download("brown")  # one-time download of the Brown corpus data

word_list = brown.words()
word_set = set(word_list)

# Check if word is in set
print("looked" in word_set)  # True
print("hurrr" in word_set)   # False
```
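One caveat with the snippet above: as far as I know, `brown.words()` yields raw tokens, so the resulting set also contains punctuation tokens such as "." and capitalized forms such as "The". A minimal sketch of a normalizing step, assuming you want case-insensitive checks and only purely alphabetic words; the helpers `build_word_set` and `is_word` are hypothetical and not part of NLTK.

```python
def build_word_set(tokens):
    """Collect lowercased, purely alphabetic tokens into a set.

    Tokens containing anything other than letters (punctuation such as "."
    or "``", numbers, hyphenated forms, apostrophes) are skipped.
    """
    return {t.lower() for t in tokens if t.isalpha()}


def is_word(word, word_set):
    """Case-insensitive membership test against a set from build_word_set."""
    return word.lower() in word_set
```

With NLTK installed, `word_set = build_word_set(brown.words())` would replace the plain `set(word_list)` call. The `isalpha()` filter is a trade-off: it also discards hyphenated words and tokens with apostrophes, which you may want to keep.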
Time it and you'll see that searching the set takes virtually no time: a test on 1,000 words took 0.004 seconds.
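The 0.004-second figure is the answerer's own measurement and will vary by machine. Below is a minimal sketch for running a similar check with `time.perf_counter`. It uses a synthetic vocabulary as a stand-in for the Brown word set (an assumption, so no corpus download is needed) and compares a list against a set to show why the conversion step matters: a list membership test scans elements one by one, while a set uses hashing. The helper name `time_lookups` is hypothetical.

```python
import time


def time_lookups(container, queries):
    """Time membership tests of every query against `container`.

    Returns (elapsed_seconds, number_of_hits).
    """
    start = time.perf_counter()
    hits = sum(1 for q in queries if q in container)
    elapsed = time.perf_counter() - start
    return elapsed, hits


if __name__ == "__main__":
    # Synthetic stand-in for the Brown word list (not real corpus data).
    vocabulary = [f"word{i}" for i in range(10_000)]
    # 1,000 queries: 500 present, 500 absent.
    queries = [f"word{i}" for i in range(0, 10_000, 20)]
    queries += [f"miss{i}" for i in range(500)]

    list_time, list_hits = time_lookups(vocabulary, queries)
    set_time, set_hits = time_lookups(set(vocabulary), queries)
    print(f"list: {list_time:.4f} s, set: {set_time:.4f} s, hits: {set_hits}")
```

For more careful benchmarks, the standard `timeit` module repeats the measurement and reduces noise from a single run.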
Answer by krinker
I personally used TextBlob (http://textblob.readthedocs.io/en/dev/). It is an active project, and according to the website:
Spelling correction is based on Peter Norvig's “How to Write a Spelling Corrector”[1] as implemented in the pattern library. It is about 70% accurate.
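In TextBlob itself, as far as I know, the entry point is `Word(...).spellcheck()`, which returns (candidate, confidence) pairs. The idea behind it, Norvig's corrector, can be sketched in plain Python: count word frequencies in a corpus, treat any word present in the counts as correctly spelled, and otherwise pick the most frequent known word within one or two edits. This is a minimal sketch of that technique, not TextBlob's or Norvig's actual code. `SimpleSpeller` is a hypothetical name, and a real model needs a large plain-text corpus rather than the tiny sample shown in the usage note.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def tokenize(text):
    """Split text into lowercase words (runs of ASCII letters)."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


class SimpleSpeller:
    """Word-frequency model built from a sample text (hypothetical helper)."""

    def __init__(self, text):
        self.counts = Counter(tokenize(text))

    def is_known(self, word):
        """The yes/no check the question asks for: is the word in the corpus?"""
        return word.lower() in self.counts

    def _known(self, candidates):
        """Keep only candidates that occur in the corpus."""
        return {w for w in candidates if w in self.counts}

    def correct(self, word):
        """Most frequent known word within edit distance 2, else the word itself."""
        word = word.lower()
        candidates = (
            self._known([word])
            or self._known(edits1(word))
            or self._known(e2 for e1 in edits1(word) for e2 in edits1(e1))
            or {word}
        )
        # Counter returns 0 for missing keys, so an unknown word scores 0.
        return max(candidates, key=lambda w: self.counts[w])
```

For example, with `sp = SimpleSpeller("The spelling looked right.")`, `sp.is_known("looked")` is True and `sp.correct("speling")` returns "spelling". Only `is_known` answers the original question; `correct` shows the extra suggestion step that TextBlob layers on top.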

