Original source: http://stackoverflow.com/questions/43606339/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Generate word cloud from single-column Pandas dataframe
Asked by the_bonze
I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with words sized based on their frequency within the dataframe.
I have attempted to do this with the following code:
To bring the data in:
import pandas as pd
fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)
To generate the word cloud:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
However, I get this error:
TypeError: expected string or bytes-like object
I was able to create an earlier word cloud from the full dataset, using the following code, but I want the word cloud to only generate words from the specific column, 'crime type' ('allCrime.csv' contains approx. 13 columns):
text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
I'm new to Python and Pandas (and coding generally!) so all help is gratefully received.
Answered by languitar
The problem is that the WordCloud.generate method you are using expects a string, in which it counts word occurrences, but you are passing it a pandas object instead (text2 is a pd.DataFrame, and a single column of it is a pd.Series).
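As a rough illustration of why a non-string input fails, here is a minimal sketch that uses only the standard library. It assumes that WordCloud.generate tokenizes its input with a regular expression, which would explain the wording of the TypeError; count_tokens and rows are hypothetical names made up for this example and are not part of the wordcloud package.

```python
import re


def count_tokens(text):
    # Hypothetical stand-in for regex-based tokenization;
    # this is NOT wordcloud's actual implementation.
    return len(re.findall(r"\w+", text))


# Hypothetical values standing in for the 'Crime type' column.
rows = ["Burglary", "Drugs", "Burglary"]

try:
    count_tokens(rows)  # a container of strings is not a string
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Joining the values into one plain string is accepted.
joined = " ".join(rows)
token_count = count_tokens(joined)
```

The same reasoning applies to a DataFrame or Series: it holds strings, but it is not itself a string, so it has to be converted before it is passed on.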
Depending on what you want the word cloud to be built from, you can do either of the following:
1. wordcloud2 = WordCloud().generate(' '.join(text2['Crime type'])), which concatenates all words in your dataframe column and then counts all occurrences.
2. Use WordCloud.generate_from_frequencies to pass in word frequencies that you have computed yourself.
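The WordCloud.generate_from_frequencies route could be sketched roughly as follows. This is one possible approach under stated assumptions, not the answerer's own code: it counts each crime category as a whole phrase rather than splitting it into separate words, and the helper names category_frequencies and plot_crime_cloud are hypothetical. The file name and column name are taken from the question.

```python
from collections import Counter


def category_frequencies(values):
    """Count how often each non-empty category label occurs.

    Non-string entries (for example NaN from missing cells) are skipped.
    """
    return Counter(
        v.strip() for v in values if isinstance(v, str) and v.strip()
    )


def plot_crime_cloud(csv_path="allCrime.csv", column="Crime type"):
    # Hypothetical helper. Third-party imports live here so that the
    # counting function above can be used without them installed.
    import pandas as pd
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    df = pd.read_csv(csv_path, usecols=[column])
    frequencies = category_frequencies(df[column])

    # generate_from_frequencies takes a mapping of label -> count.
    cloud = WordCloud().generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()
```

Calling plot_crime_cloud() would then draw one entry per category, sized by how many rows carry that label. With pandas, df[column].value_counts().to_dict() should produce an equivalent mapping.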
Answered by drorhun
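import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
fields = ['Crime type']
df = pd.read_csv('allCrime.csv', usecols=fields)
# Join the column into one string; str() on a large array is abbreviated with "..."
text = ' '.join(df['Crime type'].dropna().astype(str))
wordcloud = WordCloud().generate(text)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()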