Replace special characters in a string in Python

Question

Question by user2363217

I am using urllib to get a string of html from a website and need to put each word in the html document into a list.

Here is the code I have so far. I keep getting an error. I have also copied the error below.

import urllib.request

url = input("Please enter a URL: ")

z=urllib.request.urlopen(url)
z=str(z.read())
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

words = removeSpecialChars.split()

print ("Words list: ", words[0:20])

Here is the error.

Please enter a URL: http://simleyfootball.com
Traceback (most recent call last):
  File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module>
    removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
TypeError: replace() takes at least 2 arguments (1 given)
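The traceback comes from calling replace through the str class rather than on a string object: the first argument is then taken as the string to search, so only one argument is left for the two that replace needs. A minimal sketch of the difference (the sample string "a!b" is made up for illustration, and the exact error wording varies between Python versions):

```python
text = "a!b"

# Calling replace through the class: "!@#" becomes the string being searched,
# " " becomes the "old" argument, and "new" is missing, so TypeError is raised.
try:
    str.replace("!@#", " ")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Calling it on a string object, or passing that object explicitly, works.
fixed = text.replace("!", " ")
same = str.replace(text, "!", " ")
print(fixed)  # a b
print(same)   # a b
```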

Answer 1

Accepted answer by rassahah

str.replace is the wrong function for what you want to do (apart from it being used incorrectly). You want to replace any character of a set with a space, not the whole set with a single space (the latter is what replace does). You can use translate like this:

removeSpecialChars = z.translate({ord(c): " " for c in r"!@#$%^&*()[]{};:,./<>?\|`~-=_+"})

This creates a mapping which maps every character in your list of special characters to a space, then calls translate() on the string, replacing every single character in the set of special characters with a space.
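Put together with the question's code, the approach might look like the sketch below. It is an illustration, not the answerer's code: the helper names extract_words and fetch_text and the UTF-8 default are assumptions. It also decodes the downloaded bytes instead of calling str() on them, because str(z.read()) produces the bytes repr (e.g. "b'<html>...'"), which adds stray b' and literal \n text to the word list.

```python
import urllib.request

# Same character set as in the question; the raw string keeps the backslash.
SPECIAL_CHARS = r"!@#$%^&*()[]{};:,./<>?\|`~-=_+"

# Translation table mapping each special character's code point to a space.
TABLE = {ord(c): " " for c in SPECIAL_CHARS}


def extract_words(text):
    """Replace every special character with a space, then split on whitespace."""
    return text.translate(TABLE).split()


def fetch_text(url, encoding="utf-8"):
    """Download a page and decode it (assumes UTF-8 unless told otherwise)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


# Usage against a live page (needs network access):
#     words = extract_words(fetch_text(input("Please enter a URL: ")))
#     print("Words list: ", words[0:20])

sample = "<p>Hello, world! (test)</p>"
print(extract_words(sample))  # ['p', 'Hello', 'world', 'test', 'p']
```

Note that tag names such as p survive as "words", since < and > only become spaces; a real HTML parser avoids that.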

Answer 2

Answer by Pavel

replace is a method that operates on a specific string, so you need to call it on z like this:

removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

But this is probably not what you need, since it looks for a single substring containing all of those characters in that exact order. You can do it with a regular expression instead, as Danny Michaud pointed out.
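A quick way to see that replace matches only the whole sequence (the sample text and the two-character set are made up for illustration):

```python
text = "a!b@c"
chars = "!@"

# str.replace looks for the exact sequence "!@", which never occurs here,
# so the string comes back unchanged.
unchanged = text.replace(chars, " ")
print(unchanged)  # a!b@c

# Replacing each character separately does work; translate() or re.sub
# achieve the same in a single call.
result = text
for c in chars:
    result = result.replace(c, " ")
print(result)  # a b c
```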

As a side note, you might want to look at BeautifulSoup, a library for parsing messy HTML like what you usually get from scraping websites.
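BeautifulSoup is a third-party package. As a rough stand-in that needs only the standard library, html.parser.HTMLParser can collect the text between tags so tag names and attributes stay out of the word list. This is a swapped-in technique, not BeautifulSoup's API; the class TextCollector and the function words_from_html are hypothetical names, and the sketch does not skip the contents of script or style elements.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text found between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called with each run of text content between tags.
        self.chunks.append(data)


def words_from_html(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks).split()


print(words_from_html("<p>Hello <b>world</b></p>"))  # ['Hello', 'world']
```

Punctuation attached to words (for example "world!") is still present; the translate or re.sub approaches from the other answers can be applied to the collected text.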

Answer 3

Answer by Danny M

You need to call replace on z and not on str, since you want to replace characters located in the string variable z:

removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

But this will not work, because replace looks for the whole substring. You will most likely need to use the regular expression module re with its sub function:

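import re
removeSpecialChars = re.sub(r"[!@#$%^&*()\[\]{};:,./<>?\\|`~\-=_+]", " ", z)

Don't forget the [], which indicates that this is a set of characters to be replaced. Inside the set, characters such as [, ], \ and - have special meaning and have to be escaped with a backslash, as done here; otherwise the set ends early or - is read as a range.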
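To avoid escaping the character set by hand, re.escape can build the bracketed set from the question's list. A sketch (the sample text is invented for illustration):

```python
import re

special = r"!@#$%^&*()[]{};:,./<>?\|`~-=_+"

# re.escape backslash-escapes regex metacharacters, so ], \ and - are
# treated literally inside the brackets.
pattern = re.compile("[" + re.escape(special) + "]")

text = "foo[bar]-baz\\qux"
cleaned = pattern.sub(" ", text)
print(cleaned.split())  # ['foo', 'bar', 'baz', 'qux']
```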

Answer 4

Answer by Kobi K

One way is to use re.sub; that's my preferred way.

import re
my_str = "hey th~!ere"
my_new_string = re.sub(r'[^a-zA-Z0-9 \n\.]', '', my_str)
print(my_new_string)

Output:

hey there

Another way is to use re.escape:

import string
import re

my_str = "hey th~!ere"

chars = re.escape(string.punctuation)
print(re.sub(r'['+chars+']', '', my_str))

Output:

hey there

Just a small tip about naming style in Python: per PEP 8, variable names should be lowercase with underscores, so remove_special_chars rather than removeSpecialChars.

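Also, if you want to remove the spaces as well, just change [^a-zA-Z0-9 \n\.] to [^a-zA-Z0-9\n\.]. The characters listed inside the negated class are the ones that are kept, so taking the space out of it makes spaces get removed.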
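Because the class is negated with ^, every character listed inside it is kept. A quick check of the two patterns on the same sample string:

```python
import re

my_str = "hey th~!ere"

# Space listed inside the negated class: spaces are allowed, so they survive.
keep_spaces = re.sub(r"[^a-zA-Z0-9 \n\.]", "", my_str)

# Space left out of the class: spaces now match and are removed too.
drop_spaces = re.sub(r"[^a-zA-Z0-9\n\.]", "", my_str)

print(keep_spaces)  # hey there
print(drop_spaces)  # heythere
```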

Answer 5

Answer by surendran

You can replace the special characters with the desired characters as follows:

specialCharacterText = "H#y #@w @re &*)?"
inCharSet = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+\""
outCharSet = " " * len(inCharSet)  # corresponding characters for inCharSet (all spaces, same length)
splCharReplaceList = str.maketrans(inCharSet, outCharSet)  # Python 3; Python 2 used string.maketrans
splCharFreeString = specialCharacterText.translate(splCharReplaceList)
print(splCharFreeString)
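In Python 3, string.maketrans no longer exists; the equivalent is the static method str.maketrans, which also accepts a third argument listing characters to delete outright instead of replacing them. A sketch using the same sample text:

```python
special = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+\""
text = "H#y #@w @re &*)?"

# Two-argument form: both strings must have the same length; each character
# in the first maps to the character at the same index in the second.
to_spaces = str.maketrans(special, " " * len(special))

# Three-argument form: characters in the third string map to None (deleted).
to_nothing = str.maketrans("", "", special)

print(text.translate(to_spaces).split())  # ['H', 'y', 'w', 're']
print(repr(text.translate(to_nothing)))   # 'Hy w re '
```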
