pandas: splitting at underscore in Python and storing the first value

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29947574/

Time: 2020-09-13 23:17:14  Source: igfitidea

python pandas

Asked by Ssank

I have a pandas data frame, df, with a column construct_name:

construct_name
aaaa_t1_2    
cccc_t4_10
bbbb_g3_3

and so on. I want to split all the names at the underscore and store the first element (aaaa, cccc, etc.) in another column called name.

Expected output:

construct_name  name
aaaa_t1_2       aaaa
cccc_t4_10      cccc

and so on.

I tried the following: df['construct_name'].map(lambda row: row.split("_")), and it gives me a list for each row, like

[aaaa,t1,2]
[cccc,t4,10]

and so on.

But when I do df['construct_name'].map(lambda row: row.split("_"))[0] to get the first element of each list, I get an error. Can you suggest a fix? Thanks.

Answered by EdChum

Just use the vectorised str method split, then use integer indexing on the resulting lists to get the first element:

In [228]:

df['first'] = df['construct_name'].str.split('_').str[0]
df
Out[228]:
  construct_name first
0      aaaa_t1_2  aaaa
1     cccc_t4_10  cccc
2      bbbb_g3_3  bbbb
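
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. The first three rows come from the question; the fourth (None) row is a hypothetical addition, included only to show that the .str methods leave missing values as missing rather than raising.

```python
import pandas as pd

# First three rows are from the question; the None row is a hypothetical
# addition to show how a missing value behaves.
df = pd.DataFrame(
    {"construct_name": ["aaaa_t1_2", "cccc_t4_10", "bbbb_g3_3", None]}
)

# .str.split('_') gives one list per row; .str[0] takes the first item of
# each list. A missing value stays missing instead of raising an error.
df["first"] = df["construct_name"].str.split("_").str[0]
print(df)
```

A plain map with a lambda that calls split would raise an AttributeError on that last row, which may be a reason to prefer the .str version when the column can contain gaps.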

Answered by fixxxer

Take the first element (using [0]) right after the split, inside the lambda, and not after the map:

In [608]: temp['name'] = temp['construct_name'].map(lambda v: v.split('_')[0])

In [609]: temp
Out[609]: 
  construct_name  name
0      aaaa_t1_2  aaaa
1     cccc_t4_10  cccc
2      bbbb_g3_3  bbbb
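
To make the difference concrete, here is a small sketch that puts the two placements of [0] side by side. It assumes the default integer index (0, 1, 2), as in the output above; with a different index, [0] after the map may raise a KeyError instead.

```python
import pandas as pd

temp = pd.DataFrame(
    {"construct_name": ["aaaa_t1_2", "cccc_t4_10", "bbbb_g3_3"]}
)

# [0] after the map: this is a label lookup on the resulting Series, so it
# returns the whole list for the row labelled 0, not one item per row.
parts = temp["construct_name"].map(lambda v: v.split("_"))
first_row = parts[0]
print(first_row)

# [0] inside the lambda: applied to each row's list, giving one value per row.
temp["name"] = temp["construct_name"].map(lambda v: v.split("_")[0])
print(temp)
```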

Answered by BioGeek

split takes an optional argument maxsplit:

>>> construct_name = 'aaaa_t1_2'
>>> name, rest = construct_name.split('_', 1)
>>> name
'aaaa'
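
The same idea can be carried over to the data frame column. In pandas, the counterpart of maxsplit is the n parameter of Series.str.split. The sketch below is one possible way to apply it; the column name rest is only an illustration and does not appear in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"construct_name": ["aaaa_t1_2", "cccc_t4_10", "bbbb_g3_3"]}
)

# n=1 splits only at the first underscore; expand=True returns the two
# pieces as separate columns (labelled 0 and 1) instead of lists.
parts = df["construct_name"].str.split("_", n=1, expand=True)

df["name"] = parts[0]  # text before the first underscore
df["rest"] = parts[1]  # everything after it (hypothetical column name)
print(df)
```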