
Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/39469711/

Time: 2020-08-19 22:18:31  Source: igfitidea

TypeError: expected string or bytes-like object pandas variable

python regex

Asked by Edward

I have a dataset like this:

import pandas as pd
df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']})

I want to get:

      word
0   abs elearning
1   abs elearning
2   abs elearning
3   abs elearning

I do the following:

re_map = {r'\be learning\b': 'elearning', r'\be-learning\b': 'elearning', r'\be&learning\b': 'elearning'}
import re
for r, map in re_map.items():
            df['word'] = re.sub(r, map, df['word'])

and get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-42-fbf00d9a0cba> in <module>()
      3 s = df['word']
      4 for r, map in re_map.items():
----> 5             df['word'] = re.sub(r, map, df['word'])

C:\Users\Edward\Anaconda3\lib\re.py in sub(pattern, repl, string, count, flags)
    180     a callable, it's passed the match object and must return
    181     a replacement string to be used."""
--> 182     return _compile(pattern, flags).sub(repl, string, count)
    183 
    184 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object

I can apply str like this:

for r, map in re_map.items():
            df['word'] = re.sub(r, map, str(df['word']))

There is no error, but I can't get the pd.DataFrame I want:

    word
0   0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
1   0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
2   0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
3   0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...

How can I fix this?

Answered by Jean-François Fabre

df['word'] is a pandas Series (a list-like column), not a single string. Converting it with str() just produces the printed representation of the whole column, which destroys your data.
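A quick check, using the question's sample data, shows what re.sub actually receives and why str() does not help. The names col, error_message and as_text are only illustrative:

```python
import re
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning',
                            'abs e&learning', 'abs elearning']})

col = df['word']
print(type(col).__name__)  # the whole column object, not one string

# re.sub only accepts a str or bytes-like object, so passing the column fails
error_message = None
try:
    re.sub(r'\be-learning\b', 'elearning', col)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# str(col) is the printed representation of the entire column as ONE string,
# including index labels and newlines
as_text = str(col)
print(repr(as_text))
```

Assigning that single string back to df['word'] repeats it in every row, which is the output shown in the question.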

You need to apply the regex to each element:

for r, map in re_map.items():
    # apply the substitution to every string in the column
    df['word'] = [re.sub(r, map, e) for e in df['word']]

A classic alternative without a list comprehension:

for r, map in re_map.items():
    # walk over (index label, value) pairs and write each result back with .loc
    for i, e in list(df['word'].items()):
        df.loc[i, 'word'] = re.sub(r, map, e)

By the way, you could simplify your regex map drastically:

re_map = {r'\be[\-& ]learning\b': 'elearning'}
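As a sanity check, the combined pattern can be tried on the question's four sample strings outside pandas. This is a minimal sketch; samples and cleaned are illustrative names:

```python
import re

# one character class covers the hyphen, ampersand and space variants
pattern = re.compile(r'\be[\-& ]learning\b')

samples = ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']
cleaned = [pattern.sub('elearning', s) for s in samples]
print(cleaned)
# ['abs elearning ', 'abs elearning', 'abs elearning', 'abs elearning']
```

Note that the first sample keeps its trailing space, because the pattern only touches the "e learning" part. If that matters, calling .strip() on each result would remove it.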

By doing that, you only have one regex, and this becomes a one-liner:

df['word'] = [re.sub(r'\be[\-& ]learning\b', 'elearning', e) for e in df['word']]

It could even be faster if you pre-compile the regex once for all substitutions:

theregex = re.compile(r'\be[\-& ]learning\b')
df['word'] = [theregex.sub('elearning', e) for e in df['word']]
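
An alternative that is not part of the answer above: pandas also offers its own element-wise string method, Series.str.replace, which avoids the explicit Python loop. A minimal sketch, assuming a pandas version that accepts the regex keyword:

```python
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning',
                            'abs e&learning', 'abs elearning']})

# Series.str.replace applies the pattern to every element of the column;
# regex=True is passed explicitly because the default differs between pandas versions
df['word'] = df['word'].str.replace(r'\be[\-& ]learning\b', 'elearning', regex=True)
print(df['word'].tolist())
```

The result is still a column of the DataFrame, so no conversion with str() is needed.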