Python re.sub error: "expected string or bytes-like object"

Question

Asked by imanexcelnoob

I have read multiple posts about this error, but I still can't figure it out. The error appears when I call my function in a loop:

import re
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

def fix_Plan(location):
    letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                          " ",          # Replace all non-letters with spaces
                          location)     # Column and row to search

    words = letters_only.lower().split()
    stops = set(stopwords.words("english"))
    meaningful_words = [w for w in words if w not in stops]
    return " ".join(meaningful_words)

# train is a pandas DataFrame loaded earlier (not shown)
col_Plan = fix_Plan(train["Plan"][0])
num_responses = train["Plan"].size
clean_Plan_responses = []

for i in range(0, num_responses):
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))

Here is the error:


Traceback (most recent call last):
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 48, in <module>
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 22, in fix_Plan
    location)  # Column and row to search
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
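The error can be reproduced in isolation by handing re.sub a value that is not a string. The sketch below assumes the offending cell is a float such as NaN (the question itself does not say which value triggers it); `reproduce` is a hypothetical helper, not part of the original code.

```python
import re

def reproduce(value):
    """Run the question's re.sub call on value; return the TypeError message, if any."""
    try:
        re.sub("[^a-zA-Z]", " ", value)
    except TypeError as err:
        return str(err)
    return None

print(reproduce(float("nan")))  # a missing cell read as NaN: raises the TypeError
print(reproduce("Plan A"))      # a real string works, so this prints None
```

Newer Python versions append the offending type to the message (e.g. `got 'float'`), but it still begins with "expected string or bytes-like object".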

Answer 1

Answered by abccd

As you stated in the comments, some of the values appear to be floats, not strings. You need to convert them to strings before passing them to re.sub. The simplest way is to pass str(location) instead of location to re.sub. Doing this is harmless even when the value is already a str.

letters_only = re.sub("[^a-zA-Z]",      # Search for all non-letters
                      " ",              # Replace all non-letters with spaces
                      str(location))    # Convert floats (e.g. NaN) to str first
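One caveat: if the floats are missing values (NaN), `str(float("nan"))` is the string `"nan"`, which survives the letters-only regex and is not an English stopword, so it would show up as a word in the cleaned output. A minimal sketch that maps NaN to an empty string instead, assuming missing cells are the floats in question; `to_letters` is a hypothetical name:

```python
import math
import re

def to_letters(value):
    """Replace every non-letter in value with a space, tolerating non-strings.

    Missing cells typically arrive from pandas as float NaN; str(NaN) would
    yield the literal word "nan", so NaN is mapped to an empty string instead.
    """
    if isinstance(value, float) and math.isnan(value):
        return ""
    # Any other value (str, int, float) is converted to str before re.sub sees it
    return re.sub("[^a-zA-Z]", " ", str(value))

print(repr(to_letters("Hi, there")))    # 'Hi  there'
print(repr(to_letters(float("nan"))))   # ''
```

NumPy's float64 is a subclass of Python's float, so the isinstance check also catches NaN values read by pandas.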

Answer 2

Answered by msaif

The simplest solution is to apply Python's str function to the column you are looping over.

If you are using pandas, this can be implemented as:

dataframe['column_name'] = dataframe['column_name'].apply(str)
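A sketch of the same column-wide conversion applied to the question's setup. The sample data is made up to mimic the mixed types described (a string, a NaN, and a number). It fills missing values with an empty string before converting, so NaN does not become the word "nan", and it uses `astype(str)`, which has the same effect here as `.apply(str)`:

```python
import re
import pandas as pd

# Hypothetical sample data: a text column with a missing value (NaN, a float)
# and a stray number, mimicking the mixed types described in the question.
train = pd.DataFrame({"Plan": ["Pay off 2 loans!", float("nan"), 42]})

# Replace missing values with an empty string first, then force every cell to str,
# so re.sub never receives a float or int.
plan_text = train["Plan"].fillna("").astype(str)

# Apply the regex to the whole column at once instead of looping over indices.
letters_only = plan_text.apply(lambda s: re.sub("[^a-zA-Z]", " ", s))
print(letters_only.tolist())
```

The question's `fix_Plan` could be passed to `.apply` in the same way, which removes the need for the index-based loop.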

Answer 3

Answered by Bilal Chandio

I think it would be better to use the re.match() function. Here is an example that may help you:

import re
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')  # tokenizer models used by word_tokenize
sentences = word_tokenize("I love to learn NLP \n 'a :(")
# Keep only tokens that start with a letter, lowercased
sentences = [word.lower() for word in sentences if re.match('^[a-zA-Z]+', word)]
print(sentences)
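Note that re.match anchors only at the start of the string, so the filter above keeps any token that merely begins with a letter; it also raises the same TypeError if handed a float. The sketch below contrasts it with re.fullmatch. To stay self-contained it substitutes a plain `str.split()` for NLTK's `word_tokenize` (an assumption: the two split punctuation differently), and the sample sentence is made up.

```python
import re

# Stand-in for word_tokenize: a plain whitespace split (assumption; NLTK's
# tokenizer splits punctuation differently).
tokens = "I love to learn NLP2 'a :( x1 ok".split()

# re.match anchors only at the start, so "NLP2" and "x1" pass
# because they begin with a letter.
prefix_ok = [t.lower() for t in tokens if re.match(r"[a-zA-Z]+", t)]

# re.fullmatch requires the entire token to consist of letters.
letters_only = [t.lower() for t in tokens if re.fullmatch(r"[a-zA-Z]+", t)]

print(prefix_ok)     # ['i', 'love', 'to', 'learn', 'nlp2', 'x1', 'ok']
print(letters_only)  # ['i', 'love', 'to', 'learn', 'ok']
```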
