Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36108377/

Time: 2020-08-19 17:25:22  Source: igfitidea

How to use the split function on every row in a dataframe in Python?

Tags: python, string, dataframe

Question by goutam

I want to count the number of times a word is repeated in each review string.

I read the CSV file into a pandas DataFrame with the following lines:

import pandas as pd
reviews = pd.read_csv("amazon_baby.csv")

The following code works when I apply it to a single review:

print(reviews["review"][1])
a = reviews["review"][1].split("disappointed")
print(a)
b = len(a)  # number of pieces; the word occurs b - 1 times
print(b)

The output was:

it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
['it came early and was not ', '. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.']
2
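
As a side note, on a single string the split-and-count trick gives the same answer as Python's built-in str.count(), because both scan for non-overlapping occurrences from left to right. The sketch below uses a shortened copy of the review above, and the helper name count_word is hypothetical. Both approaches are case-sensitive, and both also match the word when it appears inside a longer word.

```python
def count_word(text, word):
    """Hypothetical helper: count non-overlapping occurrences of `word` in `text`."""
    # split() cuts the text at every occurrence, so there is one more piece than matches
    return len(text.split(word)) - 1


review = ("it came early and was not disappointed. "
          "i love planet wise bags and now my wipe holder.")

print(count_word(review, "disappointed"))   # 1
print(review.count("disappointed"))         # 1, the built-in agrees

# Matching is case-sensitive: only the lowercase occurrence is counted here
print(count_word("Disappointed, very disappointed", "disappointed"))  # 1
```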

When I apply the same logic to the entire DataFrame with the following line, I get an error:

reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1

Error message:

Traceback (most recent call last):
  File "C:/Users/gouta/PycharmProjects/MLCourse1/Classifier.py", line 12, in <module>
    reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
  File "C:\Users\gouta\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'split'

Answer by hoyland

You're trying to split the entire review column of the DataFrame (which is the Series mentioned in the error message). What you want is to apply a function to each row of the DataFrame, which you can do by calling apply on the DataFrame:

# x is one row; splitting its review gives (occurrences + 1) pieces
f = lambda x: len(x["review"].split("disappointed")) - 1
reviews["disappointed"] = reviews.apply(f, axis=1)
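
A self-contained version of this answer might look like the sketch below. The three-row DataFrame is made-up data standing in for amazon_baby.csv, and count_disappointed is a hypothetical name for the lambda. Because the function reads only one column, calling Series.apply on that column gives the same result without needing axis=1.

```python
import pandas as pd

# Made-up sample data standing in for amazon_baby.csv
reviews = pd.DataFrame({"review": [
    "it came early and was not disappointed.",
    "disappointed, very disappointed",
    "great product",
]})


def count_disappointed(row):
    # row is a Series holding one DataFrame row
    return len(row["review"].split("disappointed")) - 1


# Row-wise apply, as in the answer
reviews["disappointed"] = reviews.apply(count_disappointed, axis=1)

# Equivalent: apply directly to the single column
reviews["disappointed_2"] = reviews["review"].apply(
    lambda text: len(text.split("disappointed")) - 1
)
print(reviews["disappointed"].tolist())  # [1, 2, 0]
```

If the real column contains missing reviews (NaN, which is a float), the lambda will raise an AttributeError on those rows, so you may need to drop or fill them first.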

Answer by Austin

pandas 0.20.3 has pandas.Series.str.split(), which acts on every string of the Series and splits it. So you can split and then count the pieces in each row, subtracting one:

reviews['review'].str.split('disappointed').str.len() - 1

See the pandas documentation for pandas.Series.str.split.

pandas.Series.str.split
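
One caveat: wrapping the split Series in len(), as in len(reviews['review'].str.split('disappointed')) - 1, returns the number of rows, so the result is a single number rather than a per-review count. The sketch below uses a made-up three-row DataFrame to contrast that with .str.len(), which returns the length of each row's list.

```python
import pandas as pd

# Made-up sample data standing in for amazon_baby.csv
reviews = pd.DataFrame({"review": [
    "not disappointed",
    "disappointed twice: disappointed",
    "fine",
]})

pieces = reviews["review"].str.split("disappointed")  # one list per row

# len() of the Series counts rows (3), not pieces, so this is one number
print(len(pieces) - 1)  # 2

# .str.len() measures each list; subtracting 1 gives the count per review
counts = pieces.str.len() - 1
print(counts.tolist())  # [1, 2, 0]
```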

Answer by Hossain Muctadir

Well the problem is,

reviews["review"]

is a Series. In your first snippet you are doing this,

reviews["review"][1].split("disappointed")

which selects a single review by its index. You could instead loop over all rows and perform the desired action on each one. For example, to print the count for every review:

for index, row in reviews.iterrows():
    # pieces after the split = occurrences + 1
    print(len(row['review'].split("disappointed")) - 1)
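
The loop above only prints the counts. A sketch that stores them in a new column might look like the following; the sample DataFrame is made up. As a rule of thumb, iterrows() builds a Series for every row, so on large frames it tends to be slower than apply or the .str methods.

```python
import pandas as pd

# Made-up sample data standing in for amazon_baby.csv
reviews = pd.DataFrame({"review": [
    "it came early and was not disappointed.",
    "disappointed, very disappointed",
    "great product",
]})

counts = []
for index, row in reviews.iterrows():
    # row is a Series; row["review"] is the plain string for this review
    counts.append(len(row["review"].split("disappointed")) - 1)

# One count per row, in the same order as the DataFrame
reviews["disappointed"] = counts
print(reviews["disappointed"].tolist())  # [1, 2, 0]
```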

Answer by Stop harming Monica

You can use .str to apply string methods to a Series of strings:

reviews["review"].str.split("disappointed")
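
The .str accessor also provides Series.str.count(), which counts pattern matches in each row directly, so the split step is unnecessary. This sketch makes a few assumptions. Because str.count() treats the pattern as a regular expression, the word is passed through re.escape(). The text is lowercased first so that matching is case-insensitive. Missing reviews, which produce NaN, are counted as zero. The sample data is made up.

```python
import re

import pandas as pd

# Made-up sample data, including a missing review
reviews = pd.DataFrame({"review": [
    "Disappointed at first, then not disappointed",
    None,
    "love it",
]})

word = "disappointed"
counts = (
    reviews["review"]
    .str.lower()                   # case-insensitive matching
    .str.count(re.escape(word))    # str.count takes a regex; escape the literal word
    .fillna(0)                     # missing reviews give NaN; treat them as 0
    .astype(int)
)
reviews[word] = counts
print(counts.tolist())  # [2, 0, 0]
```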