pandas: extracting a value and creating a new column out of it

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16818871/

Time: 2020-09-13 20:51:35  Source: igfitidea

Extracting value and creating new column out of it

pandas

Asked by NetSmoothMF

I would like to extract a certain section of a URL stored in a column of a pandas DataFrame and make it a new column. This

import re

ref = df['REFERRERURL']
ref.str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)

returns a Series with tuples in it. How can I take out only one part of that tuple before the Series is created, so that I can simply turn it into a column? Sample data for REFERRERURL is

http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....

In this example, I am interested in creating a column that contains only 'someproduct_step2'.

Thanks,

Answered by Jeff

In [24]: import re; from pandas import DataFrame, Series

In [25]: df = DataFrame([['http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....']], columns=['A'])

In [26]: df['A'].str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE).apply(lambda x: Series(x[0][0], index=['first']))
Out[26]: 
               first
0  someproduct_step2
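
The `x[0][0]` indexing above is needed because the pattern has two capture groups, so `findall` produces a list of `(group1, group2)` tuples for each row. The following is a minimal sketch of that behaviour, assuming a recent pandas where `.str[0]` can index into list and tuple elements; the column name `first` simply mirrors the answer.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {"A": ["http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=...."]}
)

# Two capture groups, so every match is a (group1, group2) tuple.
matches = df["A"].str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)
print(matches[0])  # [('someproduct_step2', ';')]

# First .str[0]: first match in each row. Second .str[0]: first group of that match.
df["first"] = matches.str[0].str[0]
print(df["first"][0])  # someproduct_step2
```

Rows without any match yield an empty list, for which `.str[0]` gives a missing value instead of raising, unlike `x[0][0]` inside a lambda.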

In pandas 0.11.1 there is also a neat way of doing this:

In [34]: df.replace({'A': r"http:.+\d\d/(.*?)(;|\?).*$"}, {'A': r'\1'}, regex=True)
Out[34]: 
                   A
0  someproduct_step2
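
The `replace` call rewrites column `A` itself. To keep the original URL and add a new column, as the question asks, one possible alternative in later pandas versions is `Series.str.extract`. This is only a sketch, not part of the original answer: the column name `product` and the second, non-matching URL are made up for illustration, and the second group is made non-capturing so that a single column comes back.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [
            "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
            "http://wap.blah.com/no-match",  # hypothetical row without a match
        ]
    }
)

# One capture group plus expand=False returns a Series;
# rows that do not match become missing values.
df["product"] = df["A"].str.extract(r"\d\d/(.*?)(?:;|\?)", expand=False)
print(df["product"])
```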

Answered by NetSmoothMF

This also worked:

import re

def extract(x):
    # First capture group of the first match; implicitly None when nothing matches.
    res = re.findall(r"\d\d/(.*?)(;|\?)", x)
    if res:
        return res[0][0]

session['RU_2'] = session['REFERRERURL'].apply(extract)
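
For a self-contained run of this approach, here is a small sketch. The `session` frame below is invented sample data (only the first URL comes from the question), and `re.search` with a precompiled pattern is swapped in for `re.findall` because only the first match is needed.

```python
import re
import pandas as pd

# Same pattern as above, compiled once and reused for every row.
PATTERN = re.compile(r"\d\d/(.*?)(;|\?)")

def extract(url):
    """Return the text between 'NN/' and the next ';' or '?', or None."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

# Hypothetical sample data; only the first URL appears in the question.
session = pd.DataFrame(
    {
        "REFERRERURL": [
            "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
            "http://wap.blah.com/xxx/id/11/other_page?ref=home",
            "http://wap.blah.com/about",
        ]
    }
)

session["RU_2"] = session["REFERRERURL"].apply(extract)
print(session["RU_2"])
```

The third row has no two-digit path segment, so its `RU_2` value is missing. Note that `extract` expects strings; rows holding NaN would need to be filtered or filled first.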