Original question: http://stackoverflow.com/questions/20894525/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How to remove parentheses and all data within using Pandas/Python?
Asked by Alexis
I have a dataframe from which I want to remove all parentheses and everything inside them.
I checked out: How can I remove text within parentheses with a regex?
The answer there for removing the data was
re.sub(r'\([^)]*\)', '', filename)
I tried this as well as
re.sub(r'\(.*?\)', '', filename)
However, I got an error: expected a string or buffer
When I tried using the column df['Column Name'], I got: no item named 'Column Name'
I checked the dataframe using df.head() and it showed up as a clean table with the column names as I wanted them to be. However, when I use the re expression to remove the (stuff), it doesn't recognize the column name that I have.
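The question does not say why the column lookup failed. One common cause, offered here only as an assumption, is stray whitespace in the header that df.head() does not make obvious. A minimal sketch with a hypothetical padded column name:

```python
import pandas as pd

# Hypothetical frame whose header carries stray spaces (an assumption for illustration).
df = pd.DataFrame({" Column Name ": ["Acme (USA)", "Globex"]})

# The list repr of the column index exposes padding that a printed table can hide.
print(df.columns.tolist())

# Strip the padding so that df["Column Name"] resolves.
df.columns = df.columns.str.strip()
print(df["Column Name"].tolist())
```

If the printed list already shows the exact name you typed, the cause lies elsewhere, for example in using a different dataframe variable than the one inspected.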
I normally use
df['name'].str.replace(" ()","")
However, I want to remove the parentheses and what is inside them. How can I do this using either regex or pandas?
Thanks!
Here is the solution I used...thanks for the help!
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*\)", "", regex=True)
Answered by dmvianna
df['name'].str.replace(r"\(.*\)", "", regex=True)
You can't run re functions directly on pandas objects. You have to apply them to each element inside the object. So Series.str.replace(r"\(.*\)", "", regex=True) is essentially syntactic sugar for Series.apply(lambda x: re.sub(r"\(.*\)", "", x)).
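To make the point above concrete, here is a small sketch showing that re.sub rejects a whole Series while the two element-wise forms agree. The sample names are made up for illustration, and regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import re
import pandas as pd

# Hypothetical sample data, not from the original question.
names = pd.Series(["Acme (USA)", "Globex", "Initech (UK) Ltd"])

# re.sub expects a single string, so handing it a Series raises TypeError.
try:
    re.sub(r"\(.*\)", "", names)
    raised = False
except TypeError:
    raised = True

# Element-wise alternatives: the vectorized .str accessor, or an explicit apply.
via_str = names.str.replace(r"\(.*\)", "", regex=True)
via_apply = names.apply(lambda x: re.sub(r"\(.*\)", "", x))

print(raised)
print(via_str.tolist())
print(via_apply.tolist())
```

One practical difference: the apply form raises on missing values (NaN), whereas the .str accessor passes them through, so the accessor is usually the safer choice on real data.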
Answered by Wiktor Stribiżew
If you have multiple (...) substrings in the data, you should consider using either
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*?\)", "", regex=True)
or
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\([^()]*\)", "", regex=True)
The difference is that .*? is slower and does not match line breaks, while [^()] matches any char except ( and ), is quite efficient, and matches line breaks. The first one will match (...(...) but the second will only match (...).
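The differences between these patterns are easiest to see on small strings. The sketch below uses plain re.sub on made-up inputs (multiple groups, nested parentheses, and a line break inside the parentheses); the same patterns behave identically inside Series.str.replace with regex=True.

```python
import re

multi = "A (one) B (two) C"
nested = "x (a (b) c) y"
multiline = "x (a\nb) y"

# Greedy .* swallows everything from the first "(" to the last ")".
greedy = re.sub(r"\(.*\)", "", multi)

# Lazy .*? and the negated class both stop at the nearest ")".
lazy = re.sub(r"\(.*?\)", "", multi)
negated = re.sub(r"\([^()]*\)", "", multi)

# On nested input, the lazy form matches "(a (b)"; the negated class matches only "(b)".
lazy_nested = re.sub(r"\(.*?\)", "", nested)
negated_nested = re.sub(r"\([^()]*\)", "", nested)

# "." does not cross a line break by default; the negated class does.
lazy_multiline = re.sub(r"\(.*?\)", "", multiline)
negated_multiline = re.sub(r"\([^()]*\)", "", multiline)

for result in (greedy, lazy, negated, lazy_nested, negated_nested,
               lazy_multiline, negated_multiline):
    print(repr(result))
```

Note that none of these patterns removes fully nested groups in one pass; the negated class removes only the innermost pair each time it is applied.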
If you want to normalize all whitespace after removing these substrings, you may consider
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()
The \s*\([^()]*\) regex will match 0+ whitespace characters followed by the parenthesized substring, and then str.strip() will get rid of any remaining leading or trailing whitespace.
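As a quick check of the whitespace-aware pattern, here is a sketch on a few made-up values, including one where the parenthesized text comes first so that str.strip() has something to do:

```python
import pandas as pd

# Hypothetical sample values, not from the original data.
s = pd.Series(["Acme (USA) Inc", "Globex (old name)", " (draft) Initech"])

# Remove each "(...)" group together with the whitespace in front of it,
# then trim whatever whitespace is left at either end.
cleaned = s.str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

print(cleaned.tolist())
```

Consuming the whitespace before the parenthesis avoids the double space that a bare \([^()]*\) would leave in the middle of "Acme (USA) Inc".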
Answered by ANIMESH RAMASAMI
# remove a space followed by parenthesized text
Energy['Country'] = Energy['Country'].str.replace(r" \(.*\)", "", regex=True)
Energy['Country'] = Energy['Country'].str.replace(r"([0-9]+)$", "", regex=True)
These are other ways you can remove the unwanted characters.
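The two replacements in this answer can be chained. The sketch below runs them on made-up country names (the Energy data itself is not shown in the answer), first dropping a parenthesized suffix and then any digits at the end of the string:

```python
import pandas as pd

# Hypothetical values standing in for Energy['Country'].
countries = pd.Series(["Bolivia (Plurinational State of)", "Switzerland17", "Japan"])

# Drop a space followed by parenthesized text.
countries = countries.str.replace(r" \(.*\)", "", regex=True)

# Drop digits that appear at the very end of the name.
countries = countries.str.replace(r"[0-9]+$", "", regex=True)

print(countries.tolist())
```

The capturing group in the original ([0-9]+)$ is not needed for a plain removal, so the sketch omits it; both forms match the same text.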

