Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/29294017/

Extract substring from string in dataframe

python pandas split

Asked by nicholas.reichel

I have the following dataframe:

                             Company Name        Time Expectation
0                Asta Funding Inc. (ASFI)  9:35 AM ET           -
1                       BlackBerry (BBRY)  7:00 AM ET       (.03)
2                    Carnival Corp. (CCL)  9:15 AM ET         .09
3                      Carnival PLC (CUK)  0:00 AM ET           -

I would like to have the company symbols in their own separate column instead of inside the Company Name column. Right now I iterate over the company names, a regex pulls out each symbol and appends it to a list, and then I assign that list to the new column, but I'm wondering if there is a cleaner/easier way.

I'm new to the whole map/reduce/lambda stuff.

import re

tickers = []
for company in df['Company Name']:
    ticker = re.search(r"\(.*\)", company).group(0)  # e.g. "(ASFI)"
    ticker = ticker[1:-1]                            # drop the surrounding parentheses
    tickers.append(ticker)

Answered by mikedal

Regex search is built into the Series class in pandas. See the pandas documentation for Series.str.extract. In your case, you could use

df['ticker'] = df['Company Name'].str.extract(r"\((.*)\)", expand=False)  # expand=False returns a Series

Answered by halex

You can use the fact that str operates elementwise on a whole series. I assume that the company's symbol will always be at the end of the company name and surrounded by parentheses:

df['Company Symbol'] = df['Company Name'].str.rstrip(')').str.split('(').str[1] # Make new column
df['Company Name'] = df['Company Name'].str.replace(r'\(.*?\)$', '', regex=True) # Remove symbol from company name (regex=True is required on recent pandas)
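Both approaches from the answers can be tried end to end on the company names from the question. This is a minimal sketch rather than either answerer's exact code: the `[^()]+` pattern, the use of `rsplit` with `n=1`, and the `\s*` whitespace trimming are additions for illustration, and only the "Company Name" column is reproduced.

```python
import pandas as pd

# Sample rows copied from the question; only "Company Name" matters here.
df = pd.DataFrame({
    "Company Name": [
        "Asta Funding Inc. (ASFI)",
        "BlackBerry (BBRY)",
        "Carnival Corp. (CCL)",
        "Carnival PLC (CUK)",
    ]
})

# Approach 1: regex capture group. [^()]+ keeps the match inside one pair of
# parentheses, and $ anchors it to the end of the name.
df["ticker"] = df["Company Name"].str.extract(r"\(([^()]+)\)\s*$", expand=False)

# Approach 2: plain string operations. rsplit with n=1 splits only on the
# last "(" so an earlier parenthesis in the name would not interfere.
df["Company Symbol"] = (
    df["Company Name"].str.rstrip(")").str.rsplit("(", n=1).str[1]
)

# Strip the trailing " (SYMBOL)" from the name. regex=True is passed explicitly
# because the default for str.replace differs between pandas versions.
df["Company Name"] = df["Company Name"].str.replace(
    r"\s*\([^()]*\)\s*$", "", regex=True
)

print(df)
```

For a name with no parenthesised symbol, `str.extract` leaves NaN in that row instead of raising, which is one difference from the original loop, where `re.search(...)` would return None and `.group(0)` would fail.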