pandas: extracting a value and creating a new column from it
Original URL: http://stackoverflow.com/questions/16818871/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Extracting value and creating new column out of it
Asked by NetSmoothMF
I would like to extract a certain section of a URL stored in a column of a pandas DataFrame and make it a new column. This
ref = df['REFERRERURL']
ref.str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)
returns a Series with tuples in it. How can I take out only one part of each tuple before the Series is created, so that I can simply turn it into a column? Sample data for REFERRERURL is
http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....
In this example I am interested in creating a column that only has 'someproduct_step2' in it.
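For context, a small sketch of why tuples show up: when a pattern has more than one capture group, `re.findall` (which `str.findall` applies to each element) returns one tuple per match. Making the second group non-capturing, `(?:;|\?)`, is one possible way to get plain strings instead. The variable names `two_groups` and `one_group` are illustrative only.

```python
import re
import pandas as pd

# Sample URL taken from the question.
url = "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=...."
ref = pd.Series([url])

# Two capture groups: each match is a tuple (segment, delimiter).
two_groups = ref.str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)

# One capture group plus a non-capturing group: each match is a plain string.
one_group = ref.str.findall(r"\d\d/(.*?)(?:;|\?)", flags=re.IGNORECASE)

print(two_groups[0])  # [('someproduct_step2', ';')]
print(one_group[0])   # ['someproduct_step2']
```

Each element is still a list of matches, so a further step is needed to turn it into a single value per row.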
Thanks,
Answered by Jeff
In [25]: df = DataFrame([['http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....']], columns=['A'])
In [26]: df['A'].str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE).apply(lambda x: Series(x[0][0], index=['first']))
Out[26]:
first
0 someproduct_step2
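As an alternative not mentioned in the answer, later pandas versions provide `Series.str.extract`, which returns the capture group directly. The following is a minimal sketch under that assumption; the second URL is a hypothetical non-matching row added only to show that it yields a missing value, and the trailing group is made non-capturing so that a single column comes back.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=...."],
        ["http://wap.blah.com/"],  # hypothetical row with nothing to extract
    ],
    columns=["A"],
)

# With exactly one capture group and expand=False, str.extract returns a Series.
# Rows where the pattern does not match become missing values.
df["first"] = df["A"].str.extract(r"\d\d/(.*?)(?:;|\?)", expand=False)

print(df["first"])
```

Unlike the `findall` plus `apply` version, this does not fail on rows where the pattern finds no match.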
In 0.11.1, there is also a neat way of doing this:
In [34]: df.replace({'A': r"http:.+\d\d/(.*?)(;|\?).*$"}, {'A': r'\1'}, regex=True)
Out[34]:
A
0 someproduct_step2
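A self-contained sketch of the `replace` approach, assuming the replacement value is the backreference `\1` (the first capture group), which is what produces the output shown. The second URL and the column name `product` are hypothetical, added to illustrate that a row the pattern does not match is left unchanged rather than set to a missing value.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [
        "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
        "http://wap.blah.com/",  # hypothetical row the pattern does not match
    ]
})

# The pattern consumes the whole URL; the replacement keeps only group 1.
out = df.replace(
    {"A": r"http:.+\d\d/(.*?)(;|\?).*$"},
    {"A": r"\1"},
    regex=True,
)

# Store the result in a new column instead of overwriting the original.
df["product"] = out["A"]
print(df["product"].tolist())
```

Because unmatched rows keep their original URL, this variant suits data where every row is expected to match.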
Answered by NetSmoothMF
This also worked:
import re

def extract(x):
    # Return the first captured segment, or None when nothing matches.
    res = re.findall(r"\d\d/(.*?)(;|\?)", x)
    if res:
        return res[0][0]
session['RU_2'] = session['REFERRERURL'].apply(extract)
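One caveat with this approach is that `re.findall` raises a `TypeError` if a cell is not a string, for example a missing value. The sketch below is a possible defensive variant, not the original poster's code: it uses a precompiled pattern with `search`, a non-capturing trailing group, and an `isinstance` guard. The sample rows other than the first are hypothetical.

```python
import re
import pandas as pd

# Compiled once; the trailing delimiter group is non-capturing.
PATTERN = re.compile(r"\d\d/(.*?)(?:;|\?)")

def extract(x):
    # Skip missing or non-string cells instead of raising TypeError.
    if not isinstance(x, str):
        return None
    match = PATTERN.search(x)
    return match.group(1) if match else None

session = pd.DataFrame({
    "REFERRERURL": [
        "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
        None,                    # hypothetical missing value
        "http://wap.blah.com/",  # hypothetical row without a match
    ]
})
session["RU_2"] = session["REFERRERURL"].apply(extract)
print(session["RU_2"].tolist())
```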

