How to remove parentheses and all data within using Pandas/Python?

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20894525/

Time: 2020-08-18 21:28:37  Source: igfitidea

Tags: python, regex, pandas, removeall

Asked by Alexis

I have a dataframe from which I want to remove all parentheses and everything inside them.

I checked out "How can I remove text within parentheses with a regex?", where the answer for removing the data was

re.sub(r'\([^)]*\)', '', filename)

I tried this as well as

re.sub(r'\(.*?\)', '', filename)

However, I got an error: expected a string or buffer

When I tried using the column df['Column Name'], I got: no item named 'Column Name'

I checked the dataframe using df.head() and it showed up as a clean table with the column names I wanted. However, when I use the re expression to remove the (stuff), it doesn't recognize the column name that I have.

I normally use

df['name'].str.replace(" ()","") 

However, I want to remove the parentheses and what is inside them. How can I do this using either regex or pandas?

Thanks!

Here is the solution I used...thanks for the help!

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*\)", "", regex=True)

Answered by dmvianna

df['name'].str.replace(r"\(.*\)", "", regex=True)

You can't run re functions directly on pandas objects. You have to loop over each element inside the object. So Series.str.replace(r"\(.*\)", "", regex=True) is just syntactic sugar for Series.apply(lambda x: re.sub(r"\(.*\)", "", x)).
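To make that equivalence concrete, here is a minimal sketch. The sample values are made up for illustration; only the column name "name" comes from the answer above. It also reproduces the kind of error from the question, since re.sub accepts a single string and not a whole Series. Note that regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import re

import pandas as pd

# Hypothetical sample data; only the column name "name" mirrors the answer.
df = pd.DataFrame({"name": ["Acme (USA)", "Globex", "Initech (Texas)"]})

# re.sub works on one string at a time, so handing it a Series fails.
try:
    re.sub(r"\(.*\)", "", df["name"])
    raised = False
except TypeError:
    raised = True

# Vectorised form: the pattern is applied to every element of the Series.
via_str = df["name"].str.replace(r"\(.*\)", "", regex=True)

# Element-wise form described in the answer.
via_apply = df["name"].apply(lambda x: re.sub(r"\(.*\)", "", x))

print(raised)
print(via_str.tolist())
print(via_apply.tolist())
```

One practical difference worth keeping in mind: the .str accessor passes missing values through untouched, whereas the apply/re.sub version would fail on a NaN because it is not a string.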

Answered by Wiktor Stribiżew

If you have multiple (...) substrings in the data, you should consider using either

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*?\)", "", regex=True)

or

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\([^()]*\)", "", regex=True)

The difference is that .*? is slower and does not match line breaks, while [^()] matches any character except ( and ), is quite efficient, and does match line breaks. The first pattern will match (...(...) but the second will only match (...).
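The behaviour of the three patterns discussed so far can be compared with plain re.sub. This is only an illustrative sketch; the test strings and variable names are invented for the comparison.

```python
import re

greedy = r"\(.*\)"       # runs from the first "(" to the last ")" on a line
lazy = r"\(.*?\)"        # stops at the first ")" after each "("
negated = r"\([^()]*\)"  # only matches spans with no parentheses inside

two_groups = "A (1) B (2) C"
nested_open = "a (b (c) d"
multiline = "x (one\ntwo) y"

# Greedy swallows the text between the two groups; the other two do not.
print(repr(re.sub(greedy, "", two_groups)))    # 'A  C'
print(repr(re.sub(lazy, "", two_groups)))      # 'A  B  C'
print(repr(re.sub(negated, "", two_groups)))   # 'A  B  C'

# Lazy starts at the first "(", the negated class only takes the inner pair.
print(repr(re.sub(lazy, "", nested_open)))     # 'a  d'
print(repr(re.sub(negated, "", nested_open)))  # 'a (b  d'

# "." does not cross a line break by default, the negated class does.
print(repr(re.sub(lazy, "", multiline)))       # 'x (one\ntwo) y'
print(repr(re.sub(negated, "", multiline)))    # 'x  y'
```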

If you want to normalize all whitespace after removing these substrings, you may consider

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

The \s*\([^()]*\) regex matches zero or more whitespace characters followed by the parenthesized substring, and str.strip() then removes any leftover leading or trailing whitespace.
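As a rough illustration of why the leading \s* helps, the sketch below runs the pattern with and without it on a few made-up values (the Series name and its contents are assumptions, not data from the question).

```python
import pandas as pd

# Hypothetical sample values, chosen to show leading and inner whitespace.
names = pd.Series(["Acme (USA) Corp", "  Globex (EU)", "Initech"])

# Without \s*, the space in front of the removed group is left behind.
without_ws = names.str.replace(r"\([^()]*\)", "", regex=True)

# With \s*, that space goes away with the group; strip() trims both ends.
cleaned = names.str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

print(without_ws.tolist())
print(cleaned.tolist())
```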

Answered by ANIMESH RAMASAMI

# Remove the unwanted characters

Energy['Country'] = Energy['Country'].str.replace(r" \(.*\)", "", regex=True)

Energy['Country'] = Energy['Country'].str.replace(r"([0-9]+)$", "", regex=True)

These are other ways to remove the unwanted characters.
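For illustration, the two replacements from this answer can be chained on a small frame. The country values here are invented sample data and the lowercase name energy is just a stand-in for the answer's Energy dataframe.

```python
import pandas as pd

# Hypothetical sample rows: one with a parenthesized suffix, one with a
# trailing digit (such as a footnote marker), and one already clean.
energy = pd.DataFrame(
    {"Country": ["Bolivia (Plurinational State of)", "Italy9", "Japan"]}
)

energy["Country"] = (
    energy["Country"]
    .str.replace(r" \(.*\)", "", regex=True)  # drop " (...)" and its space
    .str.replace(r"[0-9]+$", "", regex=True)  # drop digits at the very end
)

print(energy["Country"].tolist())
```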