pandas: How to create a bag of words from a pandas DataFrame

Question

Asked by Nabih Ibrahim Bawazir

Here's my dataframe

   CATEGORY           BRAND
0    Noodle        Anak Mas
1    Noodle        Anak Mas
2    Noodle         Indomie
3    Noodle         Indomie
4    Noodle         Indomie
23   Noodle         Indomie
24   Noodle  Mi Telor Cap 3
25   Noodle  Mi Telor Cap 3
26   Noodle         Pop Mie
27   Noodle         Pop Mie
...

I have already made sure that the values in df are strings. My code is

df = data[['CATEGORY', 'BRAND']].astype(str)
import collections, re
texts = df
bagsofwords = [ collections.Counter(re.findall(r'\w+', txt))
            for txt in texts]
sumbags = sum(bagsofwords, collections.Counter())

When I call

sumbags

The output is

 Counter({'BRAND': 1, 'CATEGORY': 1})

I want sumbags to count all of the data, excluding the column titles, so that it looks something like

Counter({'Noodle': 10, 'Indomie': 4, 'Anak': 2, ....}) # because it is bag of words

I need a count for every individual word.

Answer 1

Accepted answer by Zero

IIUC, use one of the following options.

Option 1] NumPy flatten and split

In [2535]: collections.Counter([y for x in df.values.flatten() for y in x.split()])
Out[2535]:
Counter({'3': 2,
         'Anak': 2,
         'Cap': 2,
         'Indomie': 4,
         'Mas': 2,
         'Mi': 2,
         'Mie': 2,
         'Noodle': 10,
         'Pop': 2,
         'Telor': 2})
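
For context on why the code in the question only counted the headers: iterating over a DataFrame yields its column labels, not its cell values, so the regex only ever saw 'CATEGORY' and 'BRAND'. Below is a minimal, self-contained sketch of Option 1. The frame is rebuilt by hand from the rows shown in the question (the original `data` object is not shown, so this construction is an assumption for illustration only).

```python
import collections

import pandas as pd

# Assumption: sample frame rebuilt by hand from the rows shown in the question.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": ["Anak Mas"] * 2
        + ["Indomie"] * 4
        + ["Mi Telor Cap 3"] * 2
        + ["Pop Mie"] * 2,
    },
    index=[0, 1, 2, 3, 4, 23, 24, 25, 26, 27],
).astype(str)

# Iterating a DataFrame yields the column labels, which explains the
# Counter({'BRAND': 1, 'CATEGORY': 1}) result in the question.
labels = list(df)

# Flatten all cells into one 1-D array, split each cell on whitespace,
# and count every resulting token.
bag = collections.Counter(
    word for cell in df.values.flatten() for word in cell.split()
)

print(labels)
print(bag.most_common(2))
```

Note that `split()` separates on whitespace only, whereas the `\w+` regex in the question would also drop punctuation; for this sample data the two give the same tokens.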

Option 2] Use value_counts()

In [2536]: pd.Series([y for x in df.values.flatten() for y in x.split()]).value_counts()
Out[2536]:
Noodle     10
Indomie     4
Mie         2
Pop         2
Anak        2
Mi          2
Cap         2
Telor       2
Mas         2
3           2
dtype: int64

Option 3] Use stack and value_counts

In [2582]: df.apply(lambda x: x.str.split(expand=True).stack()).stack().value_counts()
Out[2582]:
Noodle     10
Indomie     4
Mie         2
Pop         2
Anak        2
Mi          2
Cap         2
Telor       2
Mas         2
3           2
dtype: int64
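
As a possible variation on Option 3 (not part of the original answer), the same counts can probably be obtained by stacking first and then using `Series.explode`, which assumes pandas 0.25 or newer. The sample frame is again rebuilt by hand from the question and is only an illustration.

```python
import pandas as pd

# Assumption: sample frame rebuilt by hand from the rows shown in the question.
df = pd.DataFrame(
    {
        "CATEGORY": ["Noodle"] * 10,
        "BRAND": ["Anak Mas"] * 2
        + ["Indomie"] * 4
        + ["Mi Telor Cap 3"] * 2
        + ["Pop Mie"] * 2,
    },
    index=[0, 1, 2, 3, 4, 23, 24, 25, 26, 27],
).astype(str)

counts = (
    df.stack()        # one Series holding every cell of the frame
    .str.split()      # each cell becomes a list of whitespace-separated tokens
    .explode()        # one row per token (requires pandas >= 0.25)
    .value_counts()   # token frequencies, most frequent first
)

print(counts.head(2))
```

This avoids the `apply` with a lambda, at the cost of requiring a newer pandas version than the original answer needed.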

Details

In [2516]: df
Out[2516]:
   CATEGORY           BRAND
0    Noodle        Anak Mas
1    Noodle        Anak Mas
2    Noodle         Indomie
3    Noodle         Indomie
4    Noodle         Indomie
23   Noodle         Indomie
24   Noodle  Mi Telor Cap 3
25   Noodle  Mi Telor Cap 3
26   Noodle         Pop Mie
27   Noodle         Pop Mie
