Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/37504672/

Time: 2020-09-14 01:18:41  Source: igfitidea

pandas dataframe return first word in string for column

python pandas dataframe

Asked by Testy8

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'id': ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r', 'land rover 2',
                          'land rover 5 g', 'mazda 4.55 bl'],
                   'series': ['a', 'a', 'r', '', 'g', 'bl']})

I would like to remove the 'series' string from the corresponding id, so the end result should be:

'id': ['abarth 1.4','abarth 1','land rover 1.3','land rover 2','land rover 5', 'mazda 4.55']

Currently I am using df.apply:

df.id = df.apply(lambda x: x['id'].replace(x['series'], ''), axis=1)

But this removes every occurrence of the series string, even inside other words, like so: 'id': ['brth 1.4','brth 1','land ove 1.3','land rover 2','land rover 5', 'mazda 4.55']

Should I somehow mix and match regex with the variable inside df.apply, like so?

df.id = df.apply(lambda x: x['id'].replace(r'\b' + x['series'], ''), axis =1)
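For what it's worth, Python's built-in str.replace treats its first argument as a literal string, so the r'\b' prefix above would not act as a word boundary. Below is a minimal sketch of one way the idea could work with re.sub instead. The helper name strip_series is hypothetical, and skipping rows with an empty series is an assumption made so that an empty pattern does not strip spaces from the id.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'id': ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r',
           'land rover 2', 'land rover 5 g', 'mazda 4.55 bl'],
    'series': ['a', 'a', 'r', '', 'g', 'bl'],
})


def strip_series(row):
    """Hypothetical helper: drop the series token only where it is a whole word."""
    if not row['series']:
        # Assumption: an empty series means there is nothing to remove.
        return row['id']
    # \b on both sides keeps the 'a' in 'abarth' and the 'r' in 'rover' intact;
    # \s* also consumes the whitespace in front of the removed token.
    pattern = r'\s*\b' + re.escape(row['series']) + r'\b'
    return re.sub(pattern, '', row['id'])


df['id'] = df.apply(strip_series, axis=1)
print(df['id'].tolist())
```

re.escape is used so that a series value containing regex metacharacters is still matched literally.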

Answered by piRSquared

Use str.split and str.get, and assign using loc only where df.make == ''

df.loc[df.make == '', 'make'] = df.id.str.split().str.get(0)

print(df)

               id    make
0      abarth 1.4  abarth
1        abarth 1  abarth
2  land rover 1.3   rover
3    land rover 2   rover
4    land rover 5   rover
5      mazda 4.55   mazda
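A self-contained sketch of this approach follows. The input frame is not shown in the answer, so the starting 'make' values below are an assumption, chosen only to be consistent with the printed output (empty for the abarth and mazda rows, already 'rover' for the land rover rows).

```python
import pandas as pd

# Assumed input: the original 'make' column is not shown in the answer.
df = pd.DataFrame({
    'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3',
           'land rover 2', 'land rover 5', 'mazda 4.55'],
    'make': ['', '', 'rover', 'rover', 'rover', ''],
})

# Split each id on whitespace and take the first token.
first_word = df['id'].str.split().str.get(0)

# Fill only the rows whose make is empty; assignment aligns on the index.
df.loc[df['make'] == '', 'make'] = first_word
print(df)
```

Rows that already have a make are left untouched, which is why the land rover rows keep 'rover' rather than becoming 'land'.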

Answered by Parfait

Consider a regex solution with loc, where it extracts everything before a space (the greedy .* runs up to the last space in the string, which is the first space when the id contains only one):

df.loc[df['make']=='', 'make'] = df['id'].str.extract('(.*) ', expand=False)
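To illustrate how the pattern behaves, here is a small sketch comparing '(.*) ' with a first-word pattern. The sample ids and the alternative pattern r'^(\S+)' are assumptions added for illustration and are not part of the answer.

```python
import pandas as pd

ids = pd.Series(['abarth 1.4', 'land rover 1.3', 'mazda 4.55'])

# Greedy: captures everything up to the last space in each string.
greedy = ids.str.extract('(.*) ', expand=False)

# Assumed alternative: captures only the leading run of non-space characters.
first_word = ids.str.extract(r'^(\S+)', expand=False)

print(greedy.tolist())      # ['abarth', 'land rover', 'mazda']
print(first_word.tolist())  # ['abarth', 'land', 'mazda']
```

The two patterns agree whenever the id contains a single space, so the choice only matters for ids such as 'land rover 1.3'.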

Alternatively, use numpy's where, which allows if/then/else conditional logic:

df['make'] = np.where(df['make']=='', 
                      df['id'].str.extract('(.*) ', expand=False), 
                      df['make'])
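The np.where variant can be run end to end as in the sketch below. As before, the starting 'make' values are an assumption, since the answer does not show the input frame.

```python
import numpy as np
import pandas as pd

# Assumed input: empty make for the rows that need filling.
df = pd.DataFrame({
    'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3',
           'land rover 2', 'land rover 5', 'mazda 4.55'],
    'make': ['', '', 'rover', 'rover', 'rover', ''],
})

# if make is empty -> take the extracted text, else -> keep the existing make
df['make'] = np.where(df['make'] == '',
                      df['id'].str.extract('(.*) ', expand=False),
                      df['make'])
print(df)
```

Unlike the loc version, np.where builds the whole column in one expression, which can be convenient when chaining several conditions.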

Answered by Aamir Khan

If I understood your question correctly, you can just use the replace function:

df.make = df.make.replace("", test.id)

Answered by Qazi Basheer

It's simple. Use as follows:

df['make'] = df['id'].str.split(' ').str[0]
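As a quick sketch of what this one-liner produces, here it is on a few sample ids (the sample data is an assumption). Note that it overwrites 'make' for every row rather than only the empty ones.

```python
import pandas as pd

df = pd.DataFrame({'id': ['abarth 1.4', 'land rover 1.3', 'mazda 4.55']})

# Split on a single space and keep the first piece of each id.
df['make'] = df['id'].str.split(' ').str[0]
print(df)
```

With this version 'land rover 1.3' yields 'land', so it fits best when the first word alone is the value wanted in every row.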