How to remove parentheses and all data within using Pandas/Python?

Question

Asked by Alexis

I have a dataframe from which I want to remove all parentheses and the text inside them.

I checked out : How can I remove text within parentheses with a regex?

The answer given there for removing the text was

re.sub(r'\([^)]*\)', '', filename)

I tried this as well as

re.sub(r'\(.*?\)', '', filename)

However, I got an error: expected a string or buffer

When I tried using the column df['Column Name'], I got: no item named 'Column Name'

I checked the dataframe using df.head() and it showed up as a clean table with the column names I wanted. However, when I use the re expression to remove the (stuff), it doesn't recognize the column name that I have.

I normally use

df['name'].str.replace(" ()","")

However, I want to remove the parentheses and what is inside them. How can I do this using either regex or pandas?

Thanks!

Here is the solution I used...thanks for the help!

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*\)", "", regex=True)

Answer 1

Answered by dmvianna

df['name'].str.replace(r"\(.*\)", "", regex=True)

You can't run re functions directly on pandas objects. You have to apply them to each element inside the object. So Series.str.replace(r"\(.*\)", "", regex=True) is essentially syntactic sugar for Series.apply(lambda x: re.sub(r"\(.*\)", "", x)).
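A minimal sketch of the point above, using a made-up Series (the sample values are assumptions, not data from the question). It shows that re.sub rejects a whole Series, which is consistent with the "expected a string or buffer" error in the question, and that the .str accessor and an element-wise apply give the same result here.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the original question.
s = pd.Series(["Acme (USA)", "Globex", "Initech (Texas)"])

# re.sub expects a single string, so passing a Series raises TypeError.
try:
    re.sub(r"\(.*\)", "", s)
    raised = False
except TypeError:
    raised = True

# Vectorized form. regex=True is passed explicitly because the default
# value of this parameter has differed between pandas versions.
via_str = s.str.replace(r"\(.*\)", "", regex=True)

# Element-wise equivalent: call re.sub on each string in turn.
via_apply = s.apply(lambda x: re.sub(r"\(.*\)", "", x))

print(raised)
print(via_str.tolist())
print(via_apply.tolist())
```

One practical difference: the .str methods pass missing values (NaN) through, while the apply version would fail on them because re.sub cannot take a float.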

Answer 2

Answered by Wiktor Stribiżew

If you have multiple (...) substrings in the data, you should consider using either

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*?\)", "", regex=True)

or

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\([^()]*\)", "", regex=True)

The difference is that .*? is slower and does not match line breaks, whereas [^()] matches any character except ( and ), is quite efficient, and does match line breaks. The first pattern will match (...(...) but the second will only match (...).
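The differences described above can be checked with the standard re module alone. This is a small sketch on made-up strings (all sample inputs are assumptions), comparing the greedy pattern from the question with the two patterns from this answer.

```python
import re

# Hypothetical inputs chosen to expose each difference.
text = "A (one) B (two) C"
nested = "X (a (b) c) Y"
multiline = "P (line1\nline2) Q"

# Greedy .* runs from the first "(" to the last ")".
greedy = re.sub(r"\(.*\)", "", text)

# Lazy .*? and the negated class both remove each group separately.
lazy = re.sub(r"\(.*?\)", "", text)
negated = re.sub(r"\([^()]*\)", "", text)

# With nesting, the lazy pattern stops at the first ")" and so matches
# "(a (b)", while the negated class only matches the innermost "(b)".
lazy_nested = re.sub(r"\(.*?\)", "", nested)
negated_nested = re.sub(r"\([^()]*\)", "", nested)

# "." does not match a newline by default, so the lazy pattern finds
# nothing here; the negated class matches across the line break.
lazy_multiline = re.sub(r"\(.*?\)", "", multiline)
negated_multiline = re.sub(r"\([^()]*\)", "", multiline)

for value in (greedy, lazy, negated, lazy_nested, negated_nested,
              lazy_multiline, negated_multiline):
    print(repr(value))
```

Note the doubled spaces left behind in the results; the whitespace-aware pattern later in this answer deals with those.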

If you want to normalize all whitespace after removing these substrings, you may consider

All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

The \s*\([^()]*\) regex will match 0+ whitespace characters and then the string between parentheses, and then str.strip() will get rid of any remaining leading or trailing whitespace.
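As a rough illustration of the whitespace handling, here is a short sketch on a made-up Series (the sample names are assumptions). It covers a group in the middle, a group at the start, and a group preceded by extra spaces.

```python
import pandas as pd

# Hypothetical manufacturer names, not from the original data.
names = pd.Series(["Acme (USA) Inc", "(old) Globex", "Initech  (Texas)"])

# \s* consumes the whitespace before each group, so no double space is left
# in the middle; str.strip() then trims whatever remains at either end.
cleaned = names.str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

print(cleaned.tolist())
```

The second value shows why the final str.strip() is still useful: removing a leading "(old)" leaves a space at the start that \s* cannot reach, because it only consumes whitespace before the parenthesis.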

Answer 3

Answered by ANIMESH RAMASAMI

# removing the unwanted characters

Energy['Country'] = Energy['Country'].str.replace(r" \(.*\)", "", regex=True)

Energy['Country'] = Energy['Country'].str.replace(r"([0-9]+)$", "", regex=True)

These are other ways you can remove the unwanted characters.
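To see what the two replacements in this answer do, here is a small sketch on a made-up frame (the variable name energy and the sample country labels are assumptions). The first call drops a space followed by a parenthesised suffix; the second drops digits at the very end of the string.

```python
import pandas as pd

# Hypothetical labels: one with a parenthesised qualifier, one with a
# trailing footnote number, one already clean.
energy = pd.DataFrame(
    {"Country": ["Bolivia (Plurinational State of)", "Switzerland17", "Japan"]}
)

# Remove " (...)" including the space before the opening parenthesis.
energy["Country"] = energy["Country"].str.replace(r" \(.*\)", "", regex=True)

# Remove one or more digits anchored at the end of the string.
energy["Country"] = energy["Country"].str.replace(r"([0-9]+)$", "", regex=True)

print(energy["Country"].tolist())
```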
