pandas: extract a value and create a new column from it

Question

Asked by NetSmoothMF

I would like to extract a certain section of a URL that resides in a column of a pandas DataFrame and make it a new column. This

import re

ref = df['REFERRERURL']
ref.str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)

returns a Series with tuples in it. How can I take out only one part of each tuple before the Series is created, so that I can simply turn it into a column? Sample data for REFERRERURL is

http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....

In this example I am interested in creating a column that only has 'someproduct_step2' in it.
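
For reference, here is a minimal sketch of what the findall call above produces on the sample URL. The variable names `matches` and `first` are only illustrative. Because the pattern has two capture groups, each element of the result is a list of tuples, and one possible way to pull out the first group afterwards is chained `.str[0]` indexing.

```python
import re
import pandas as pd

# Sample URL from the question.
ref = pd.Series(["http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=...."])

# Two capture groups -> each element is a list of (group1, group2) tuples.
matches = ref.str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)
print(matches[0])  # [('someproduct_step2', ';')]

# First match of each row, then the first group of that match.
first = matches.str[0].str[0]
print(first[0])  # someproduct_step2
```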

Thanks,

Answer 1

Answered by Jeff

In [24]: import re; from pandas import DataFrame, Series

In [25]: df = DataFrame([['http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....']], columns=['A'])

In [26]: df['A'].str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE).apply(lambda x: Series(x[0][0], index=['first']))
Out[26]: 
               first
0  someproduct_step2
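
As an alternative that is not part of the original answer, later pandas versions provide `Series.str.extract`, which returns the capture group directly instead of a list of tuples. The sketch below assumes a pandas version that supports the `expand` argument. The second URL and the column name `PRODUCT` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "REFERRERURL": [
        # Sample URL from the question.
        "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
        # Hypothetical URL that ends the segment with '?' instead of ';'.
        "http://wap.blah.com/xxx/id/22/otherproduct?x=1",
    ]
})

# One capture group: the text after "<two digits>/" up to the first ';' or '?'.
# expand=False returns a Series rather than a one-column DataFrame.
df["PRODUCT"] = df["REFERRERURL"].str.extract(r"\d\d/(.*?)[;?]", expand=False)
print(df["PRODUCT"].tolist())  # ['someproduct_step2', 'otherproduct']
```

Rows that do not match the pattern come back as missing values, so no explicit empty-match check is needed.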

In pandas 0.11.1 there is also a neat way to do this with a regex replace:

In [34]: df.replace({'A': r"http:.+\d\d/(.*?)(;|\?).*$"}, {'A': r'\1'}, regex=True)
Out[34]: 
                   A
0  someproduct_step2
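
One thing to keep in mind with the replace approach is that values the pattern does not match are left as they are, rather than turned into missing values. The sketch below shows this with the per-column `Series.str.replace`, a variant of the call above rather than the exact same API. The second row and the name `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [
        # Sample URL from the question.
        "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
        # Hypothetical value that the pattern does not match.
        "not a url",
    ]
})

# The pattern consumes the whole URL; \1 keeps only the first capture group.
pattern = r"http:.+\d\d/(.*?)(;|\?).*$"
result = df["A"].str.replace(pattern, r"\1", regex=True)
print(result.tolist())  # ['someproduct_step2', 'not a url']
```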

Answer 2

Answered by NetSmoothMF

This also worked:

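import re

def extract(x):
    # Return the first captured group of the first match, or None if nothing matches.
    res = re.findall(r"\d\d/(.*?)(;|\?)", x)
    return res[0][0] if res else None

session['RU_2'] = session['REFERRERURL'].apply(extract)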
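
If the column can contain missing values, `re.findall` raises a TypeError on them because they are not strings. Here is a slightly more defensive sketch of the same idea, using a precompiled pattern and `re.search`. The small `session` frame built here and the missing row are assumptions for illustration only.

```python
import re
import pandas as pd

# Compile once; a single capture group is enough for this task.
PATTERN = re.compile(r"\d\d/(.*?)[;?]")

def extract(url):
    """Return the captured URL segment, or None when there is nothing to extract."""
    if not isinstance(url, str):  # e.g. None or NaN for a missing URL
        return None
    match = PATTERN.search(url)
    return match.group(1) if match else None

# Hypothetical stand-in for the asker's `session` DataFrame.
session = pd.DataFrame({
    "REFERRERURL": [
        "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
        None,  # hypothetical missing value
    ]
})
session["RU_2"] = session["REFERRERURL"].apply(extract)
```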
