Original question: http://stackoverflow.com/questions/43727583/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
re.sub erroring with "Expected string or bytes-like object"
Asked by imanexcelnoob
I have read multiple posts about this error, but I still can't figure it out. It happens when I call my function in a loop:
def fix_Plan(location):
    letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                          " ",          # Replace all non-letters with spaces
                          location)     # Column and row to search
    words = letters_only.lower().split()
    stops = set(stopwords.words("english"))
    meaningful_words = [w for w in words if not w in stops]
    return (" ".join(meaningful_words))

col_Plan = fix_Plan(train["Plan"][0])
num_responses = train["Plan"].size
clean_Plan_responses = []
for i in range(0,num_responses):
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
Here is the error:
Traceback (most recent call last):
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 48, in <module>
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 22, in fix_Plan
    location) # Column and row to search
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
Answer by abccd
As you stated in the comments, some of the values appear to be floats, not strings. You need to convert them to strings before passing them to re.sub. The simplest way is to change location to str(location) in the re.sub call. It doesn't hurt to do this even if the value is already a str.
letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                      " ",          # Replace all non-letters with spaces
                      str(location))
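As a rough illustration of the answer above, the sketch below first reproduces the TypeError by passing a float to re.sub, then applies the str() fix. The STOPS set and the sample inputs are hypothetical stand-ins (the question uses nltk's stopwords and a train["Plan"] column), chosen so the snippet runs without extra downloads. One thing worth noticing: str() turns a missing value (NaN) into the text "nan", which then passes through the cleaning step as an ordinary word.

```python
import re

# Hypothetical stand-in for nltk's English stop-word list, so the sketch
# runs without downloading any corpus.
STOPS = {"the", "a", "of", "for", "and"}

def fix_plan(location):
    """Strip non-letters, lowercase, and drop stop words from one cell."""
    # str() turns floats (including NaN) into text so re.sub accepts them.
    letters_only = re.sub("[^a-zA-Z]", " ", str(location))
    words = letters_only.lower().split()
    return " ".join(w for w in words if w not in STOPS)

# Passing a float straight to re.sub reproduces the error from the question.
try:
    re.sub("[^a-zA-Z]", " ", float("nan"))
except TypeError as exc:
    error_message = str(exc)

cleaned_text = fix_plan("The Plan for 2017: Gold-Level!")
cleaned_nan = fix_plan(float("nan"))  # str(nan) is "nan", which survives as a word
cleaned_float = fix_plan(3.5)         # "3.5" contains no letters, so nothing is left
```

If a literal "nan" in the cleaned output is unwanted, missing values probably need to be handled before the str() conversion rather than after it.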
Answer by msaif
The simplest solution is to apply Python's str function to the column you are looping through.
If you are using pandas, this can be implemented as:
dataframe['column_name']=dataframe['column_name'].apply(str)
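A minimal pandas sketch of this idea follows, assuming a small made-up frame in place of the asker's train data (the values are hypothetical). It first lists the entries that are not strings, then compares converting with apply(str) against filling missing values with an empty string before converting, which avoids the literal text "nan".

```python
import pandas as pd

# Hypothetical data standing in for train["Plan"]: one string, one missing
# value (stored as float NaN), and one numeric entry.
train = pd.DataFrame({"Plan": ["Gold plan", float("nan"), 3.5]})

# Find the entries that would make re.sub raise TypeError.
is_text = train["Plan"].map(lambda value: isinstance(value, str))
bad_rows = train.loc[~is_text, "Plan"].tolist()

# Option 1: convert everything with str, as in the answer; NaN becomes "nan".
as_str = train["Plan"].apply(str).tolist()

# Option 2: replace missing values with an empty string before converting.
filled = train["Plan"].fillna("").astype(str).tolist()
```

Which option fits depends on whether a missing plan should count as empty text or be dropped altogether; that choice is not stated in the question.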
Answer by Bilal Chandio
I suppose it would be better to use the re.match() function. Here is an example that may help you.
import re
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
sentences = word_tokenize("I love to learn NLP \n 'a :(")
# Keep only tokens that start with a letter, lowercased
sentences = [word.lower() for word in sentences if re.match('^[a-zA-Z]+', word)]
sentences
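Two details of re.match may be worth checking before relying on this approach, sketched here with hypothetical tokens and plain str.split in place of word_tokenize (an assumption made so the snippet needs no NLTK download). re.match only anchors at the start of the string, so a token such as "a1" passes the letter test, whereas re.fullmatch requires the whole token to be letters. Also, re.match raises the same TypeError as re.sub when given a float, so the values still have to be strings first.

```python
import re

# Hypothetical tokens; str.split stands in for nltk's word_tokenize here.
tokens = "I love NLP 2day a1 :( 'a".split()

# re.match only anchors at the start, so "a1" passes this prefix test.
prefix_ok = [t.lower() for t in tokens if re.match("[a-zA-Z]+", t)]

# re.fullmatch requires the entire token to consist of letters.
letters_only = [t.lower() for t in tokens if re.fullmatch("[a-zA-Z]+", t)]

# re.match rejects non-string input just like re.sub does.
try:
    re.match("[a-zA-Z]+", 3.5)
    match_accepts_float = True
except TypeError:
    match_accepts_float = False
```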