Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original URL: http://stackoverflow.com/questions/43606339/

Time: 2020-09-14 03:28:25  Source: igfitidea

Generate word cloud from single-column Pandas dataframe

Tags: python, pandas, dataframe, word-cloud

Asked by the_bonze

I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with words sized based on their frequency within the dataframe.

I have attempted to do this with the following code:

To bring the data in:

import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud

fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)

To generate the word cloud:

wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()

However, I get this error:

TypeError: expected string or bytes-like object

I was able to create an earlier word cloud from the full dataset, using the following code, but I want the word cloud to only generate words from the specific column, 'crime type' ('allCrime.csv' contains approx. 13 columns):

text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

I'm new to Python and Pandas (and coding generally!) so all help is gratefully received.

Answered by languitar

The problem is that the WordCloud.generate method that you are using expects a string in which it will count the word instances, but you are passing a pandas object (text2 is the one-column DataFrame returned by pd.read_csv).
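The error message itself can be reproduced without any plotting library. The sketch below assumes the message originates in Python's re module, which raises exactly this TypeError for non-string input; the traceback is not shown in the question, so that is a guess. The category names are made-up stand-ins for the real column.

```python
import re

# Made-up stand-in for the 'Crime type' column (not the real data).
column_like = ["Burglary", "Vehicle crime", "Burglary"]

try:
    # Passing a container instead of a string fails inside the regex engine.
    re.findall(r"\w+", column_like)
    message = None
except TypeError as err:
    message = str(err)

print(message)

# Joining the values into a single string first gives the tokenizer what it expects.
words = re.findall(r"\w+", " ".join(column_like))
print(words)
```

The same reasoning applies to a DataFrame or Series: it has to be turned into one string before it is handed to a function that tokenizes text.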

Depending on what you want the word cloud to be built from, you can do either of the following:

  1. wordcloud2 = WordCloud().generate(' '.join(text2['Crime type'])), which concatenates all the values in your dataframe column into one string and then counts every word in it.

  2. Use WordCloud.generate_from_frequencies to manually pass the computed frequencies of words.
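The second option could look roughly like the sketch below. It counts each whole category label rather than individual words, so a multi-word category is sized as one unit. The helper names category_frequencies and crime_type_cloud are hypothetical, and the file and column names are taken from the question; treat it as one possible approach, not the only one.

```python
from collections import Counter


def category_frequencies(categories):
    """Count how often each whole category label occurs.

    Non-string entries (such as the NaN pandas uses for missing cells)
    and blank strings are skipped.
    """
    return dict(Counter(c for c in categories if isinstance(c, str) and c.strip()))


def crime_type_cloud(csv_path="allCrime.csv", column="Crime type"):
    """Build a WordCloud sized by how often each category appears in the column."""
    # Imported here so the counting helper above works without these packages.
    import pandas as pd
    from wordcloud import WordCloud

    df = pd.read_csv(csv_path, usecols=[column])
    freqs = category_frequencies(df[column])
    return WordCloud().generate_from_frequencies(freqs)


# Possible usage, assuming allCrime.csv is in the working directory:
#   import matplotlib.pyplot as plt
#   plt.imshow(crime_type_cloud())
#   plt.axis("off")
#   plt.show()
```

With only 16 distinct categories, this keeps each category as a single entry in the cloud, whereas joining the column into one string splits the categories into their individual words.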

Answered by drorhun

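import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud

fields = ['Crime type']
df = pd.read_csv('allCrime.csv', usecols=fields)

# Join the column values into one string; str() on a large NumPy array
# would abbreviate it with "..." and drop most of the rows.
text = ' '.join(df['Crime type'].dropna().astype(str))

wordcloud = WordCloud().generate(text)

plt.imshow(wordcloud)
plt.axis("off")
plt.show()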