Replace special characters in a string in Python
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/23996118/
Asked by user2363217
I am using urllib to get a string of html from a website and need to put each word in the html document into a list.
Here is the code I have so far. I keep getting an error. I have also copied the error below.
import urllib.request
url = input("Please enter a URL: ")
z=urllib.request.urlopen(url)
z=str(z.read())
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
words = removeSpecialChars.split()
print ("Words list: ", words[0:20])
Here is the error.
Please enter a URL: http://simleyfootball.com
Traceback (most recent call last):
File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module>
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
TypeError: replace() takes at least 2 arguments (1 given)
Accepted answer by rassahah
str.replace is the wrong function for what you want to do (apart from it being called incorrectly). You want to replace any character of a set with a space, not the whole set with a single space (the latter is what replace does). You can use translate like this:
removeSpecialChars = z.translate({ord(c): " " for c in "!@#$%^&*()[]{};:,./<>?\\|`~-=_+"})
This creates a mapping which maps every character in your list of special characters to a space, then calls translate() on the string, replacing every single character in the set of special characters with a space.
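The mapping described above can be tried out on a small string without fetching a page. This is only a sketch: the names SPECIAL_CHARS, TRANSLATION_TABLE and split_words, and the sample input, are made up for illustration and are not part of the original answer.

```python
# Characters to blank out; the backslash is doubled so it is a literal backslash.
SPECIAL_CHARS = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+"

# Build the mapping once: code point of each special character -> a space.
TRANSLATION_TABLE = {ord(c): " " for c in SPECIAL_CHARS}


def split_words(text):
    """Replace every special character with a space, then split on whitespace."""
    return text.translate(TRANSLATION_TABLE).split()


sample = "<p>Hello, world! (test)</p>"  # made-up input for illustration
print(split_words(sample))  # ['p', 'Hello', 'world', 'test', 'p']
```

Note that tag names such as p survive as words, because only the punctuation around them is replaced.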
Answer by Pavel
replace operates on a specific string, so you need to call it like this:
removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
But this is probably not what you need, since it will look for a single substring containing all of those characters in the same order. You can do it with a regexp, as Danny Michaud pointed out.
As a side note, you might want to look at BeautifulSoup, which is a library for parsing messy HTML-formatted text like what you usually get from scraping websites.
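To illustrate why an HTML parser helps here, the sketch below extracts only the text between tags. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without installing anything; the class TextCollector, the function visible_words and the sample markup are hypothetical names chosen for this example. With BeautifulSoup installed, a roughly comparable call would be BeautifulSoup(html, "html.parser").get_text().

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect only the text that appears between tags, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside of tags.
        self.chunks.append(data)


def visible_words(html):
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks).split()


sample = "<html><body><h1>Team news</h1><p>Kickoff at noon</p></body></html>"
print(visible_words(sample))  # ['Team', 'news', 'Kickoff', 'at', 'noon']
```

Unlike the character-replacement approaches, tag names never end up in the word list. The contents of script and style elements are still reported as data, so a real page may need extra filtering.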
Answer by Danny M
You need to call replace on z and not on str, since you want to replace characters located in the string variable z:
removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
But this will not work, because replace looks for the whole string as a substring. You will most likely need to use the regular expression module re with its sub function:
import re
# Escape [, ], \ and - so they are literal members of the character class
removeSpecialChars = re.sub(r"[!@#$%^&*()\[\]{};:,./<>?\\|`~\-=_+]", " ", z)
Don't forget the enclosing [], which indicates that this is a set of characters to be replaced.
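The difference between substring replacement and a character set can be seen on a tiny input. This is a sketch with made-up sample strings; it also shows that characters with a special meaning inside a set ([, ], \ and -) need a backslash to be matched literally.

```python
import re

text = "a!b@c"  # made-up sample

# str.replace looks for the exact substring "!@", which does not occur here.
print(text.replace("!@", " "))     # a!b@c

# A character class matches any single character from the set.
print(re.sub(r"[!@]", " ", text))  # a b c

# Inside a class, [, ], \ and - are escaped to be taken literally.
tricky = "x[y]z\\w-v"
print(re.sub(r"[\[\]\\\-]", " ", tricky))  # x y z w v
```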
Answer by Kobi K
One way is to use re.sub, which is my preferred way.
import re
my_str = "hey th~!ere"
# Keep letters, digits, spaces, newlines and dots; delete everything else
my_new_string = re.sub(r'[^a-zA-Z0-9 \n\.]', '', my_str)
print(my_new_string)
Output:
hey there
Another way is to use re.escape:
import string
import re
my_str = "hey th~!ere"
# Escape the punctuation so it is safe to place inside a character class
chars = re.escape(string.punctuation)
print(re.sub(r'['+chars+']', '', my_str))
Output:
hey there
Just a small tip about naming style in Python: per PEP 8, the name should be remove_special_chars and not removeSpecialChars.
Also, the space inside [^a-zA-Z0-9 \n\.] is what keeps the spaces. If you want to remove the spaces as well, just change it to [^a-zA-Z0-9\n\.]
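A quick check of what each of the two character classes does to spaces, reusing the sample string from this answer. Because the class is negated, anything not listed in it is deleted, so the space is kept only when it is listed.

```python
import re

my_str = "hey th~!ere"

# The space is listed inside the negated class, so spaces survive.
with_space = re.sub(r"[^a-zA-Z0-9 \n\.]", "", my_str)
print(with_space)     # hey there

# The space is not listed, so spaces are deleted along with ~ and !.
without_space = re.sub(r"[^a-zA-Z0-9\n\.]", "", my_str)
print(without_space)  # heythere
```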
Answer by surendran
You can replace the special characters with the desired characters as follows:
specialCharacterText = "H#y #@w @re &*)?"
inCharSet = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+\""
# One replacement character per character in inCharSet (maketrans requires equal lengths)
outCharSet = " " * len(inCharSet)
# In Python 3, maketrans is a static method of str (string.maketrans was Python 2 only)
splCharReplaceList = str.maketrans(inCharSet, outCharSet)
splCharFreeString = specialCharacterText.translate(splCharReplaceList)
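Putting the translation table together with the question's goal of a word list might look like the sketch below. It takes bytes, as urlopen(...).read() returns, and decodes them instead of calling str() on them; the UTF-8 default is an assumption, since the page's real encoding is not stated in the question. The names TABLE and words_from_bytes and the sample bytes are hypothetical.

```python
SPECIAL_CHARS = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+\""
# Map every special character to a space (both arguments must have equal length).
TABLE = str.maketrans(SPECIAL_CHARS, " " * len(SPECIAL_CHARS))


def words_from_bytes(raw, encoding="utf-8"):
    """Decode raw bytes, blank out special characters, and split into words."""
    # decode() yields the actual text; str(raw) would yield the "b'...'" repr instead.
    text = raw.decode(encoding, errors="replace")
    return text.translate(TABLE).split()


raw = b"<h1>Hello, world!</h1>"  # stand-in for z.read()
print(words_from_bytes(raw)[0:20])  # ['h1', 'Hello', 'world', 'h1']
```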