Stopword removal with NLTK and Pandas

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/33245567/



Tags: python, csv, pandas, nltk, stop-words

Asked by slm

I have some issues with Pandas and NLTK. I am new to programming, so excuse me if I ask questions that might be easy to solve. I have a csv file which has 3 columns (Id, Title, Body) and about 15,000 rows.

My goal is to remove the stopwords from this csv file. The lowercase and split operations work well, but I cannot find my mistake: why don't the stopwords get removed? What am I missing?

    import pandas as pd
    from nltk.corpus import stopwords

    pd.read_csv("test10in.csv", encoding="utf-8") 

    df = pd.read_csv("test10in.csv") 

    df.columns = ['Id','Title','Body']
    df['Title'] = df['Title'].str.lower().str.split()  
    df['Body'] = df['Body'].str.lower().str.split() 


    stop = stopwords.words('english')

    df['Title'].apply(lambda x: [item for item in x if item not in stop])
    df['Body'].apply(lambda x: [item for item in x if item not in stop])

    df.to_csv("test10out.csv")

Answered by AbtPst

You are trying to do an in-place replace, but apply() returns a new Series rather than modifying the column. You should assign the result back:

    df['Title'] = df['Title'].apply(lambda x: [item for item in x if item not in stop])
    df['Body'] = df['Body'].apply(lambda x: [item for item in x if item not in stop])
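
For completeness, here is a minimal end-to-end sketch of the corrected script (assuming the same test10in.csv / test10out.csv filenames as the question and that the NLTK stopword list has already been fetched with nltk.download('stopwords')):

    import pandas as pd
    from nltk.corpus import stopwords

    # read the file once, with the desired encoding
    df = pd.read_csv("test10in.csv", encoding="utf-8")
    df.columns = ['Id', 'Title', 'Body']

    # lowercase and tokenize on whitespace
    df['Title'] = df['Title'].str.lower().str.split()
    df['Body'] = df['Body'].str.lower().str.split()

    stop = set(stopwords.words('english'))  # set membership tests are faster than a list

    # apply() returns a new Series, so assign the result back to the column
    df['Title'] = df['Title'].apply(lambda x: [w for w in x if w not in stop])
    df['Body'] = df['Body'].apply(lambda x: [w for w in x if w not in stop])

    df.to_csv("test10out.csv", index=False)

Note that to_csv writes each token list as its string representation (e.g. "['issue', 'pandas']"); if you want plain text in the output file, join the tokens back with ' '.join(...) before saving.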

Answered by 176coding

    df.replace(stop, regex=True, inplace=True)
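
This answer relies on pandas' regex-based replacement rather than tokenizing first. As a hedged sketch of the same idea (not part of the original answer), a word-boundary pattern applied to the raw string columns avoids stripping stopwords that happen to appear inside longer words; the filenames and column names are simply the ones from the question:

    import re
    import pandas as pd
    from nltk.corpus import stopwords

    df = pd.read_csv("test10in.csv", encoding="utf-8")
    df.columns = ['Id', 'Title', 'Body']

    # build one regex that matches whole stopwords only, e.g. \b(?:i|me|my|...)\b
    stop = stopwords.words('english')
    pattern = r'\b(?:' + '|'.join(map(re.escape, stop)) + r')\b'

    for col in ['Title', 'Body']:
        df[col] = (df[col].str.lower()
                          .str.replace(pattern, ' ', regex=True)   # drop the stopwords
                          .str.replace(r'\s+', ' ', regex=True)    # collapse leftover spaces
                          .str.strip())

    df.to_csv("test10out.csv", index=False)

The second str.replace pass just tidies up the double spaces left behind after removal; the columns stay plain strings instead of becoming token lists.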