Generating a word cloud from a single-column Pandas dataframe

Question

Asked by the_bonze

I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with words sized based on their frequency within the dataframe.

I have attempted to do this with the following code:

To bring the data in:

fields = ['Crime type']

text2 = pd.read_csv('allCrime.csv', usecols=fields)

To generate the word cloud:

wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()

However, I get this error:

TypeError: expected string or bytes-like object

I was able to create an earlier word cloud from the full dataset, using the following code, but I want the word cloud to only generate words from the specific column, 'crime type' ('allCrime.csv' contains approx. 13 columns):

text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

I'm new to Python and Pandas (and coding generally!) so all help is gratefully received.

Answer 1

Answered by languitar

The problem is that the WordCloud.generate method you are using expects a single string, in which it counts word occurrences, but you are passing it a pandas object (text2 is a DataFrame holding the 'Crime type' column).

Depending on what you want the word cloud to be built from, you can do either of the following:

wordcloud2 = WordCloud().generate(' '.join(text2['Crime type'])), which joins every value in the column into one string and lets WordCloud count the words in it. Multi-word categories are tokenized into separate words this way.
Use WordCloud.generate_from_frequencies to pass precomputed frequencies yourself, which keeps each category as a single entry in the cloud.
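The second option can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the helper names category_frequencies and draw_category_cloud are hypothetical, and the file name and column name are taken from the question. The counting step uses collections.Counter so it can be tried without any third-party packages; in pandas, text2['Crime type'].value_counts().to_dict() should give an equivalent mapping.

```python
from collections import Counter


def category_frequencies(values):
    """Count how often each non-empty category label occurs."""
    # `v == v` is False only for NaN, so missing values are skipped.
    cleaned = (str(v).strip() for v in values if v is not None and v == v)
    return dict(Counter(label for label in cleaned if label))


def draw_category_cloud(csv_path="allCrime.csv", column="Crime type"):
    """Read one column and draw a cloud sized by category frequency."""
    # Imported here so the counting helper above works without these packages.
    import pandas as pd
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    df = pd.read_csv(csv_path, usecols=[column])
    freqs = category_frequencies(df[column])
    # generate_from_frequencies takes a mapping of label -> weight,
    # so multi-word categories stay together as one entry.
    cloud = WordCloud().generate_from_frequencies(freqs)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


# Small made-up sample, only to show the shape of the frequency mapping.
sample = ["Burglary", "Vehicle crime", "Burglary", "Drugs"]
print(category_frequencies(sample))
```

Calling draw_category_cloud() with the defaults assumes allCrime.csv sits in the working directory and has a column named exactly 'Crime type'.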

Answer 2

Answered by drorhun

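import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

fields = ['Crime type']
df = pd.read_csv('allCrime.csv', usecols=fields)

# Join the column into one string; str() on a large array abbreviates it with "..."
text = ' '.join(df['Crime type'].dropna().astype(str))

wordcloud = WordCloud().generate(text)

plt.imshow(wordcloud)
plt.axis("off")
plt.show()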
