TypeError: expected string or bytes-like object pandas variable
Original question: http://stackoverflow.com/questions/39469711/
This page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Asked by Edward
I have a dataset like this:
import pandas as pd
df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']})
I want to get:
word
0 abs elearning
1 abs elearning
2 abs elearning
3 abs elearning
I do it as below:
re_map = {r'\be learning\b': 'elearning', r'\be-learning\b': 'elearning', r'\be&learning\b': 'elearning'}
import re
for r, map in re_map.items():
    df['word'] = re.sub(r, map, df['word'])
and get this error:
TypeError Traceback (most recent call last)
<ipython-input-42-fbf00d9a0cba> in <module>()
3 s = df['word']
4 for r, map in re_map.items():
----> 5 df['word'] = re.sub(r, map, df['word'])
C:\Users\Edward\Anaconda3\lib\re.py in sub(pattern, repl, string, count, flags)
180 a callable, it's passed the match object and must return
181 a replacement string to be used."""
--> 182 return _compile(pattern, flags).sub(repl, string, count)
183
184 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object
I can apply str like this:
for r, map in re_map.items():
    df['word'] = re.sub(r, map, str(df['word']))
There is no error, but I don't get the pd.DataFrame I want:
word
0 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
1 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
2 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
3 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
How can I fix it?
Answered by Jean-François Fabre
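df['word'] is a pandas Series (a list-like column), not a single string. Calling str() on it does not convert each element; it replaces the whole column with one string holding its printed representation, which destroys your data.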
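To make the cause concrete, here is a minimal sketch that reproduces both symptoms from the question on a shortened two-row frame. The names `raised`, `as_text`, and `is_single_string` are only for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning']})

# re.sub needs a str (or bytes) as its third argument; a Series is
# neither, so the call raises TypeError.
try:
    re.sub(r'\be-learning\b', 'elearning', df['word'])
    raised = False
except TypeError:
    raised = True

# str() on a Series returns its printed form: index labels, values, and
# the "Name: ..., dtype: ..." footer, all in one multi-line string.
as_text = str(df['word'])
is_single_string = isinstance(as_text, str)
```

Assigning that single string back to df['word'] broadcasts it to every row, which is why each cell in the question's output holds the whole printed column.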
You need to apply the regex to each member:
for r, map in re_map.items():
    df['word'] = [re.sub(r, map, e) for e in df['word']]
Classical alternative without a list comprehension:
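for r, map in re_map.items():
    # iterate over (index label, value) pairs of the column
    for i, e in df['word'].items():
        # write back through the DataFrame so the change is not lost on a copy
        df.at[i, 'word'] = re.sub(r, map, e)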
BTW, you could simplify your regex list drastically:
re_map = {r'\be[\-& ]learning\b': 'elearning'}
By doing that you only have one regex, and this becomes a one-liner:
df['word'] = [re.sub(r'\be[\-& ]learning\b', 'elearning', e) for e in df['word']]
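As a quick check, the single pattern can be tried on the four strings from the question without pandas at all. This is only a sketch; `words` and `cleaned` are illustrative names.

```python
import re

words = ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']

# One character class covers the three separators: hyphen, ampersand, space.
pattern = r'\be[\-& ]learning\b'
cleaned = [re.sub(pattern, 'elearning', w) for w in words]
```

Note that the first input ends with a space, and the substitution leaves that trailing space in place.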
It could be even faster if you pre-compile the regex once for all substitutions:
theregex = re.compile(r'\be[\-& ]learning\b')
df['word'] = [theregex.sub('elearning', e) for e in df['word']]
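A different route, not part of the answer above, is to let pandas apply the pattern to every element through the `.str` accessor. The sketch below assumes a pandas version where `Series.str.replace` accepts `regex=True`; the final `strip()` is an optional extra that removes the trailing space on the first row.

```python
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning',
                            'abs e&learning', 'abs elearning']})

# Vectorized alternative: the regex is applied to each element of the column.
df['word'] = df['word'].str.replace(r'\be[\-& ]learning\b', 'elearning', regex=True)

# Optional cleanup: drop leading/trailing whitespace in every row.
df['word'] = df['word'].str.strip()

result = df['word'].tolist()
```

This keeps the column a Series throughout, so there is no explicit Python loop and no need to import `re`.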