pandas dataframe: return the first word of a string in a column

Question

Asked by Testy8

I have a dataframe:

import pandas as pd
df = pd.DataFrame({'id' : ['abarth 1.4 a','abarth 1 a','land rover 1.3 r','land rover 2',
                           'land rover 5 g','mazda 4.55 bl'], 
                   'series': ['a','a','r','','g', 'bl'] })

I would like to remove the 'series' string from the corresponding id, so the end result should be:

'id': ['abarth 1.4','abarth 1','land rover 1.3','land rover 2','land rover 5', 'mazda 4.55']

Currently I am using df.apply:

df.id = df.apply(lambda x: x['id'].replace(x['series'], ''), axis=1)

But this removes every occurrence of the substring, even inside other words, giving: 'id': ['brth 1.4','brth 1','land ove 1.3','land rover 2','land rover 5', 'mazda 4.55']

Should I somehow mix and match regex with the variable inside df.apply, like so?

df.id = df.apply(lambda x: x['id'].replace(r'\b' + x['series'], ''), axis =1)
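One point about that attempt: Python's built-in str.replace does not interpret regular expressions, so r'\b' + x['series'] is searched for literally. A minimal sketch using re.sub instead, assuming the series code always appears as the last whitespace-separated word of id (strip_series is a hypothetical helper name):

```python
import re

import pandas as pd

df = pd.DataFrame({'id': ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r', 'land rover 2',
                          'land rover 5 g', 'mazda 4.55 bl'],
                   'series': ['a', 'a', 'r', '', 'g', 'bl']})


def strip_series(row):
    """Hypothetical helper: drop the series code only when it is the final word."""
    series = row['series']
    if not series:
        # Nothing to remove for rows with an empty series.
        return row['id']
    # re.escape guards against regex metacharacters in the series code;
    # '\s+' eats the separating space and '$' anchors the match at the end.
    pattern = r'\s+' + re.escape(series) + r'$'
    return re.sub(pattern, '', row['id'])


df['id'] = df.apply(strip_series, axis=1)
print(df['id'].tolist())
```

Because the match is anchored at the end of the string, the 'a' inside 'abarth' and the 'r' inside 'rover' are left alone.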

Answer 1

Answered by piRSquared

Use str.split and str.get, and assign with loc only where df.make == '':

df.loc[df.make == '', 'make'] = df.id.str.split().str.get(0)

print(df)

               id    make
0      abarth 1.4  abarth
1        abarth 1  abarth
2  land rover 1.3   rover
3    land rover 2   rover
4    land rover 5   rover
5      mazda 4.55   mazda
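A self-contained version of the loc approach, for trying it out. The make values below are hypothetical sample data; only rows whose make is empty receive the first word of id:

```python
import pandas as pd

# Hypothetical sample data: 'make' is empty for some rows.
df = pd.DataFrame({
    'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55'],
    'make': ['abarth', 'abarth', '', '', '', 'mazda'],
})

# Boolean mask selecting only the rows that still need a make.
mask = df['make'] == ''

# split() breaks on whitespace; .str.get(0) takes the first token of each list.
df.loc[mask, 'make'] = df.loc[mask, 'id'].str.split().str.get(0)
print(df)
```

With .str.get(0), the first token of 'land rover 1.3' is 'land'; rows that already had a make are not touched.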

Answer 2

Answered by Parfait

Consider a regex solution with loc that extracts everything before the last space (the pattern '(.*) ' is greedy):

df.loc[df['make']=='', 'make'] = df['id'].str.extract('(.*) ', expand=False)

Alternatively, use numpy's where, which expresses the if/then/else logic directly:

import numpy as np
df['make'] = np.where(df['make']=='', 
                      df['id'].str.extract('(.*) ', expand=False), 
                      df['make'])
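A sketch contrasting the greedy '(.*) ' pattern with an anchored first-word pattern r'^(\S+)', using the same np.where logic on hypothetical sample data (the make_greedy and make_first column names are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'make' is empty for some rows.
df = pd.DataFrame({
    'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55'],
    'make': ['abarth', 'abarth', '', '', '', 'mazda'],
})

# Greedy: '.*' runs as far as it can, so the capture stops at the LAST space.
greedy = df['id'].str.extract('(.*) ', expand=False)
# Anchored: '^' plus '\S+' captures only the leading run of non-space characters.
first_word = df['id'].str.extract(r'^(\S+)', expand=False)

empty = df['make'] == ''
df['make_greedy'] = np.where(empty, greedy, df['make'])
df['make_first'] = np.where(empty, first_word, df['make'])
print(df[['id', 'make_greedy', 'make_first']])
```

For the land rover rows, the greedy pattern yields 'land rover' while the anchored one yields 'land'; pick whichever matches the intended meaning of "make".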

Answer 3

Answered by Aamir Khan

If I understood your question correctly, you can just use the replace function:

df.make = df.make.replace("", df.id)
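An explicitly element-wise way to express the same idea, swapped in here rather than taken from the answer, is Series.mask, which takes the value from another index-aligned Series wherever the condition is true. A sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data: 'make' is empty for some rows.
df = pd.DataFrame({
    'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55'],
    'make': ['abarth', 'abarth', '', '', '', 'mazda'],
})

# Where make is empty, take the whole id from the same row; elsewhere keep make.
df['make'] = df['make'].mask(df['make'] == '', df['id'])
print(df)
```

Note that this copies the entire id string; combine it with one of the split or extract approaches if only the first word is wanted.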

Answer 4

Answered by Qazi Basheer

It's simple. Use as follows:

df['make'] = df['id'].str.split(' ').str[0]
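Unlike the loc-based answers, this overwrites make for every row. A small sketch on hypothetical ids (including 'fiat', which has no space) showing that the split-based variants agree with each other, and how they differ from the '(.*) ' regex in Answer 2:

```python
import pandas as pd

# Hypothetical ids; 'fiat' contains no space at all.
ids = pd.Series(['abarth 1.4', 'land rover 1.3', 'mazda 4.55', 'fiat'])

a = ids.str.split(' ').str[0]      # split on single spaces, index with .str[0]
b = ids.str.split().str.get(0)     # split on any whitespace, .str.get(0)
c = ids.str.split(n=1).str[0]      # split at most once; the rest stays unsplit

# The greedy regex needs at least one space, so 'fiat' produces NaN.
d = ids.str.extract('(.*) ', expand=False)

print(a.tolist())
print(d.tolist())
```

Passing n=1 avoids splitting the rest of the string when only the first word is needed.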
