How to split pandas column by a delimiter and select preferred element as the replacement
Original question: http://stackoverflow.com/questions/33604139/
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Asked by neversaint
I have the following pandas data frame:
import pandas as pd
df = pd.DataFrame({
    'gene': ["1 // foo // blabla",
             "2 // bar // lalala",
             "3 // qux // trilil",
             "4 // woz // hohoho"],
    'cell1': [5, 9, 1, 7],
    'cell2': [12, 90, 13, 87],
})
# Put the columns in the desired order
df = df[["gene", "cell1", "cell2"]]
It looks like this:
                 gene  cell1  cell2
0  1 // foo // blabla      5     12
1  2 // bar // lalala      9     90
2  3 // qux // trilil      1     13
3  4 // woz // hohoho      7     87
What I want to get is this:
  gene  cell1  cell2
0  foo      5     12
1  bar      9     90
2  qux      1     13
3  woz      7     87
Namely, select the 2nd element of the string after splitting it with // as the delimiter.
The best I can do is this:
df["gene"] = df["gene"].str.split(" // ")
df
Out[17]:
               gene  cell1  cell2
0  [1, foo, blabla]      5     12
1  [2, bar, lalala]      9     90
2  [3, qux, trilil]      1     13
3  [4, woz, hohoho]      7     87
What's the right way to do it?
Answered by EdChum
Answered by jezrael
You can use a regex with str.extract and then remove the leading and trailing spaces with strip:
# expand=False returns a Series when the pattern has a single capturing group
df["gene"] = df["gene"].str.extract(r"\/\/([a-z ]+)\/\/", expand=False)
df["gene"] = df["gene"].str.strip()
print(df)
  gene  cell1  cell2
0  foo      5     12
1  bar      9     90
2  qux      1     13
3  woz      7     87
The pattern \/\/([a-z ]+)\/\/ means:
\/ matches the character / literally
\/ matches the character / literally
1st Capturing group ([a-z ]+)
    [a-z ]+ matches a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        a-z a single character in the range between a and z (case sensitive)
        the literal space character
\/ matches the character / literally
\/ matches the character / literally
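To see why the strip step is needed, the pattern can be tried on a single string with Python's re module. This is only a minimal sketch; sample is one of the strings from the question, and the variable names are chosen for illustration.

```python
import re

# One of the strings from the question, used here only for illustration
sample = "1 // foo // blabla"

# The capturing group matches lowercase letters and spaces between the two "//"
match = re.search(r"\/\/([a-z ]+)\/\/", sample)
captured = match.group(1)

print(repr(captured))          # the surrounding spaces are still part of the capture
print(repr(captured.strip()))  # strip removes them
```

Because the character class contains a space, the group captures the padding around the name as well, which is what the call to strip cleans up.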
Or use a regex that makes the strip unnecessary:
df["gene"] = df["gene"].str.extract(r"\/\/\s*([a-z ]+)\s\/\/", expand=False)
The pattern \/\/\s*([a-z ]+)\s\/\/ means:
\/ matches the character / literally
\/ matches the character / literally
\s* matches any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
1st Capturing group ([a-z ]+)
    [a-z ]+ matches a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        a-z a single character in the range between a and z (case sensitive)
        the literal space character
\s matches any white space character [\r\n\t\f ]
\/ matches the character / literally
\/ matches the character / literally
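One thing worth keeping in mind is that [a-z ] only accepts lowercase letters and spaces, so a middle element containing uppercase letters or digits would not match and would come back as NaN. The sketch below compares the pattern with a broader one; the second row of the sample data and the pattern //\s*([^/]+?)\s*// are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# The second row is hypothetical data with an uppercase letter and a digit
genes = pd.Series(["1 // foo // blabla", "5 // Abc1 // x"])

# Pattern from the answer: lowercase letters and spaces only
genes_strict = genes.str.extract(r"\/\/\s*([a-z ]+)\s\/\/", expand=False)

# Assumed broader pattern: anything except "/" between the two delimiters
genes_broad = genes.str.extract(r"//\s*([^/]+?)\s*//", expand=False)

print(genes_strict.tolist())
print(genes_broad.tolist())
```

Which pattern is preferable depends on what the middle element can contain in the real data.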
Answered by sohail288
You were pretty close; however, selecting the element from the resulting split is a bit more difficult when doing it your way.
Here's a solution with apply:
>>> # Split on ' // ' (with the spaces) so the selected element has no padding
>>> df['gene'] = df['gene'].apply(lambda s: s.split(' // ')[1])
>>> df
  gene  cell1  cell2
0  foo      5     12
1  bar      9     90
2  qux      1     13
3  woz      7     87
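For comparison, the split from the question can also be finished without apply, because the .str accessor supports indexing into each resulting list. This is a minimal sketch that rebuilds the data frame from the question; it is an alternative shown for illustration, not the method used in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "gene": ["1 // foo // blabla", "2 // bar // lalala",
             "3 // qux // trilil", "4 // woz // hohoho"],
    "cell1": [5, 9, 1, 7],
    "cell2": [12, 90, 13, 87],
})

# Split on the full " // " delimiter, then take element 1 of each list
df["gene"] = df["gene"].str.split(" // ").str[1]
print(df)
```

Splitting on " // " rather than "//" keeps the surrounding spaces out of the selected element, so no extra strip is needed.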