How to create a word cloud from a corpus in Python?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16645799/

Date: 2020-08-18 23:15:48  Source: igfitidea


Tags: python, nltk, corpus, gensim, word-cloud

Asked by alvas

In "Creating a subset of words from a corpus in R", the answerer can easily convert a term-document matrix into a word cloud.

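The linked R code isn't reproduced here, but a common way to turn a term-document matrix into a cloud is to sum each term's counts over all documents and size words by those totals. A minimal Python sketch of that step, assuming the matrix is a plain list of rows (one row per term, one column per document) with a parallel list of terms; tdm_to_frequencies and render are hypothetical helper names, and render relies on WordCloud.generate_from_frequencies from the third-party wordcloud package.

```python
from typing import Dict, Sequence


def tdm_to_frequencies(terms: Sequence[str],
                       matrix: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Collapse a term-document matrix (one row per term, one column per
    document) into a total count per term."""
    if len(terms) != len(matrix):
        raise ValueError("need exactly one matrix row per term")
    totals = {term: float(sum(row)) for term, row in zip(terms, matrix)}
    # Terms that never occur contribute nothing to the picture, so drop them.
    return {term: total for term, total in totals.items() if total > 0}


def render(frequencies: Dict[str, float], path: str = "tdm_cloud.png") -> None:
    """Draw and save the cloud. Imported lazily so the counting step above
    needs only the standard library."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud
    WordCloud(width=800, height=400).generate_from_frequencies(frequencies).to_file(path)


if __name__ == "__main__":
    terms = ["corpus", "python", "cloud"]
    tdm = [
        [2, 0, 1],  # occurrences of "corpus" in documents 1..3
        [1, 3, 0],  # occurrences of "python"
        [0, 1, 1],  # occurrences of "cloud"
    ]
    print(tdm_to_frequencies(terms, tdm))  # {'corpus': 3.0, 'python': 4.0, 'cloud': 2.0}
```

Calling render(tdm_to_frequencies(terms, tdm)) writes the image. If the matrix is a NumPy array instead, matrix.sum(axis=1) gives the same per-term totals.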

Is there a similar function in a Python library that turns a raw text file, an NLTK corpus, or a Gensim MmCorpus into a word cloud?


The result will look somewhat like this: [image: example word cloud]

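In the wordcloud package, WordCloud.generate takes raw text and WordCloud.generate_from_frequencies takes a word-to-weight mapping, so each input the question lists only needs converting into one of those. A minimal sketch, assuming a UTF-8 text file on disk and, for Gensim, a corpus that yields lists of (token_id, count) pairs (as MmCorpus does) plus the Dictionary used to build it; all helper names are hypothetical. An NLTK FreqDist is already a Counter subclass, so it can go straight into render.

```python
import re
from collections import Counter
from typing import Iterable, Mapping, Sequence, Tuple

# Lower-case ASCII words, allowing inner apostrophes such as "don't".
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def text_frequencies(text: str, stopwords: Iterable[str] = ()) -> Counter:
    """Count the words of a raw string, skipping stopwords."""
    skip = {w.lower() for w in stopwords}
    return Counter(w for w in WORD_RE.findall(text.lower()) if w not in skip)


def file_frequencies(path: str, stopwords: Iterable[str] = ()) -> Counter:
    """Same as text_frequencies, for a raw UTF-8 text file."""
    with open(path, encoding="utf-8") as fh:
        return text_frequencies(fh.read(), stopwords)


def bow_corpus_frequencies(corpus: Iterable[Sequence[Tuple[int, float]]],
                           id2word: Mapping[int, str]) -> Counter:
    """Sum a bag-of-words corpus (documents given as (token_id, count) pairs,
    like those a Gensim MmCorpus yields) into per-word totals."""
    totals: Counter = Counter()
    for document in corpus:
        for token_id, count in document:
            totals[id2word[token_id]] += count
    return totals


def render(frequencies: Mapping[str, float], path: str = "corpus_cloud.png") -> None:
    """Draw a cloud from any word -> weight mapping (an NLTK FreqDist works too)."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud
    WordCloud(width=800, height=400).generate_from_frequencies(dict(frequencies)).to_file(path)
```

With Gensim this could be used as render(bow_corpus_frequencies(MmCorpus("corpus.mm"), Dictionary.load("corpus.dict"))), where MmCorpus and Dictionary come from gensim.corpora and the file names are hypothetical.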

Accepted answer by Marcin

Answer by valentinos

If you need to show these word clouds on a website or in a web app, you can convert your data to JSON or CSV and load it into a JavaScript visualisation library such as d3 (see Word Clouds on d3).


Otherwise, Marcin's answer is a good way to do what you describe.

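For the d3 route mentioned above, the Python side only has to export word weights. A minimal standard-library sketch; the {"text": ..., "size": ...} record shape is an assumption modelled on the commonly used d3-cloud examples, so rename the keys to whatever your chart code reads. words_to_json is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Mapping


def words_to_json(frequencies: Mapping[str, float], top_n: int = 100) -> str:
    """Serialise the top_n heaviest words as a JSON array of
    {"text": word, "size": weight} records (key names are an assumption)."""
    ranked = Counter(frequencies).most_common(top_n)
    return json.dumps([{"text": word, "size": weight} for word, weight in ranked])


if __name__ == "__main__":
    counts = Counter("all your base are belong to us all of your base base base".split())
    print(words_to_json(counts, top_n=3))
```

Write the returned string to a file that your page fetches; a CSV export works the same way with csv.writer and a text,size header row.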

Answer by MyopicVisage

Example of amueller's code in action


In a command line / terminal:


sudo pip install wordcloud

Then run a Python script like this one:


## Simple WordCloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

text = 'all your base are belong to us all of your base base base'

def generate_wordcloud(text):
    wordcloud = WordCloud(font_path='/Library/Fonts/Verdana.ttf',  # macOS font; change or omit on other systems
                          width=800, height=400,
                          relative_scaling=1.0,
                          stopwords={'to', 'of'},  # set of words to ignore; pass STOPWORDS for the built-in English list
                          ).generate(text)

    plt.figure(1, figsize=(8, 4))
    plt.imshow(wordcloud)
    plt.axis('off')
    ## Pick one:
    # plt.show()
    plt.savefig("WordCloud.png")

generate_wordcloud(text)

[image: resulting word cloud]
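To see what the cloud above is sizing, here is a rough standard-library sketch of the word counts for the sample text with the same stopwords removed. According to the wordcloud documentation, relative_scaling=1 makes a word that is twice as frequent twice as large, so these normalised counts are the target relative sizes. The library may still shrink words to fit, and by default it also counts two-word collocations, so its exact numbers can differ. relative_sizes is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_sizes(text: str, stopwords: Iterable[str] = ()) -> Dict[str, float]:
    """Word counts scaled so the most frequent word is 1.0."""
    skip = set(stopwords)
    counts = Counter(w for w in text.lower().split() if w not in skip)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: count / top for word, count in counts.most_common()}


text = 'all your base are belong to us all of your base base base'
print(relative_sizes(text, {'to', 'of'}))
# {'base': 1.0, 'all': 0.5, 'your': 0.5, 'are': 0.25, 'belong': 0.25, 'us': 0.25}
```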

Answer by HeadAndTail

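from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
stopwords = set(STOPWORDS)

def show_wordcloud(data, title=None):
    # Join an iterable of documents (e.g. a pandas Series of reviews) into one string;
    # str(series) would only give a truncated repr that includes the index labels.
    text = data if isinstance(data, str) else ' '.join(map(str, data))
    wordcloud = WordCloud(
        background_color='white',
        stopwords=stopwords,
        max_words=200,
        max_font_size=40,
        scale=3,
        random_state=1  # fixed seed so the layout is reproducible
    ).generate(text)

    fig = plt.figure(1, figsize=(12, 12))
    plt.axis('off')
    if title:
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=0.9)  # leave room for the title; values above 1 push the axes off the figure

    plt.imshow(wordcloud)
    plt.show()

# Samsung_Reviews_Negative / Samsung_Reviews_positive are the answerer's own pandas DataFrames
show_wordcloud(Samsung_Reviews_Negative['Reviews'])
show_wordcloud(Samsung_Reviews_positive['Reviews'])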

[image: resulting word clouds]

Answer by Ujjawal107

Here is some short code:


# make wordcloud

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
stopwords = set(STOPWORDS)

def show_wordcloud(data, title=None):
    # Accept either a single string or an iterable of strings (list, pandas Series, ...)
    text = data if isinstance(data, str) else ' '.join(map(str, data))
    wordcloud = WordCloud(
        background_color='white',
        stopwords=stopwords,
        max_words=200,
        max_font_size=40,
        scale=3,
        random_state=1  # fixed seed so the layout is reproducible
    ).generate(text)

    fig = plt.figure(1, figsize=(12, 12))
    plt.axis('off')
    if title:
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=0.9)  # leave room for the title

    plt.imshow(wordcloud)
    plt.show()


if __name__ == '__main__':
    text_str = 'replace this with your own text'  # placeholder: put your own text here
    show_wordcloud(text_str)