Warning: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/31046060/

Date: 2020-09-13 23:31:48  Source: igfitidea

Pandas str.extract: AttributeError: 'str' object has no attribute 'str'

Tags: python, regex, python-2.7, pandas

Question by Winterflags

I'm trying to repurpose this function from using split to using str.extract(regex) instead.

def bull_lev(x):
    # x is one plain str; take the second-to-last word, e.g. "X3" -> "3"
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "+" + spl + "00"
    return "+100"

def bear_lev(x):
    # same token logic as bull_lev, negative sign
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "-" + spl + "00"
    return "-100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x)
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
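
For reference, the token-picking step shared by both functions can be checked on its own. The sketch below pulls it into a hypothetical helper, leverage_token (not part of the question), and applies it to the two sample names plus one hypothetical name that has no leverage token:

```python
def leverage_token(name):
    # Hypothetical helper: the expression used inside bull_lev/bear_lev.
    # rsplit(None, 2) splits on whitespace from the right at most twice,
    # so [-2] is the second-to-last word; strip("Xx") drops x/X at either end.
    return name.rsplit(None, 2)[-2].strip("Xx")

for name in ["BULL AXP UN X3 VON", "BEAR ESTOX 12x S", "BULL AXP UN VON"]:
    token = leverage_token(name)
    # prints: 'BULL AXP UN X3 VON' -> '3' True
    #         'BEAR ESTOX 12x S' -> '12' True
    #         'BULL AXP UN VON' -> 'UN' False
    print(repr(name), "->", repr(token), token.isdigit())
```

Because the token is found by position rather than by pattern, a name with fewer than two words would raise IndexError at [-2], and any non-numeric second-to-last word simply falls through to the +100/-100 default.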


I am using pandas for DataFrame handling:

import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])

Desired output:

name                    leverage
"BULL AXP UN X3 VON"    "+300"
"BEAR ESTOX 12x S"      "-1200"


Faulty regex attempt for "BULL":

def bull_lev(x):
    #spl = x.rsplit(None, 2)[-2].strip("Xx")
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")
    if spl.str.isdigit():
        return "+" + spl + "00"
    return "+100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x)
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")

This produces the error:

Traceback (most recent call last):
  File "toolkit.py", line 128, in <module>
    df["leverage"] = df["name"].map(lambda x: bull_lev(x)
  File "/Python/Virtual/py2710/lib/python2.7/site-packages/pandas/core/series.py", line 2016, in map
    mapped = map_f(values, arg)
  File "pandas/src/inference.pyx", line 1061, in pandas.lib.map_infer (pandas/lib.c:58435)
  File "toolkit.py", line 129, in <lambda>
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
  File "toolkit.py", line 123, in bear_lev
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")

AttributeError: 'str' object has no attribute 'str'

I am assuming this is due to str.extract capturing a list, while split works directly with the string?
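
The traceback points at a different cause than the one guessed above: Series.map calls the function once per element, so inside bull_lev the argument x is a plain Python str, and the .str accessor exists only on pandas Series and Index objects. If you want to keep the per-element map structure, one minimal sketch is to use the standard-library re module on each string instead. The names lev_from_name and LEV_RE are hypothetical, and this is an alternative to the vectorised accepted answer, not part of it:

```python
import re

import pandas as pd

# Same pattern as in the question; the trailing \s requires whitespace after the token
LEV_RE = re.compile(r"(X\d+|\d+X)\s", flags=re.IGNORECASE)

def lev_from_name(x, sign):
    # Hypothetical helper: x is a plain str here, so use re.search, not x.str
    m = LEV_RE.search(x)
    if m:
        # group(1) is e.g. "X3" or "12x"; the pattern guarantees digits remain
        return sign + m.group(1).strip("Xx") + "00"
    return sign + "100"

df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
df["leverage"] = df["name"].map(
    lambda x: lev_from_name(x, "+") if "BULL" in x
    else lev_from_name(x, "-") if "BEAR" in x
    else "+100")
print(df)
```

Since the regex only matches tokens that contain digits, the isdigit() check from the split version is no longer needed here.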

Accepted answer by EdChum

You can handle the positive case using the following:

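In [150]:
import re
# expand=False returns a Series (not a DataFrame) for a single group, so .str methods chain
df['fundleverage'] = '+' + df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx') + '00'
df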

Out[150]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BULL ESTOX X12 S        +1200
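
The difference between calling extract on a whole column and calling it on a single string is easy to see directly. A small sketch (the third row is a hypothetical name with no leverage token) showing that, in current pandas versions, str.extract returns a DataFrame by default and a Series only when expand=False is passed with a single capture group:

```python
import re

import pandas as pd

# Third row is a hypothetical name without a leverage token
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S", "BULL NOLEV S"], columns=["name"])
pat = r"(X\d+|\d+X)\s"

# Default expand=True: always a DataFrame, one column per capture group
as_frame = df["name"].str.extract(pat, flags=re.IGNORECASE)
# expand=False with a single capture group: a Series, so .str methods chain directly
as_series = df["name"].str.extract(pat, flags=re.IGNORECASE, expand=False)

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
# Rows without a match come back as missing values instead of raising
print(as_series.str.strip("Xx"))
```

A DataFrame has no .str accessor, so chaining .str.strip onto the default result raises AttributeError; passing expand=False keeps the one-liner working.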

You can use np.where to handle both cases (a leverage value found or not) in a one-liner:

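In [151]:
import numpy as np
# extract the leverage digits once; missing where the pattern does not match
lev = df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx')
# fillna('') makes non-matching rows fail isdigit() instead of passing NaN (truthy) to np.where
df['fundleverage'] = np.where(lev.fillna('').str.isdigit(), '+' + lev + '00', '+100')
df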

Out[151]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BULL ESTOX X12 S        +1200

So the above uses the vectorised str methods strip, extract and isdigit to achieve what you want.

Update

After you changed your requirements (for future reference, please avoid changing a question once it has answers), you can mask the df separately for the bull and bear cases:

In [189]:
import re
import numpy as np
import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
bull = df['name'].str.contains('bull', case=False)
bear = df['name'].str.contains('bear', case=False)
# leverage digits for every row, missing where the pattern does not match
lev = df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False).str.strip('Xx')
has_lev = lev.fillna('').str.isdigit()
df.loc[bull, 'fundleverage'] = np.where(has_lev[bull], '+' + lev[bull] + '00', '+100')
df.loc[bear, 'fundleverage'] = np.where(has_lev[bear], '-' + lev[bear] + '00', '-100')
df

Out[189]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BEAR ESTOX 12x S        -1200
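
The masked assignments above leave fundleverage as NaN for a name containing neither BULL nor BEAR, whereas the question's map call returns "+100" there. As an alternative to the two .loc assignments (not from the original answer), np.select can choose the sign per row in a single pass. This sketch assumes case-sensitive matching like the question's "BULL" in x, and the last two rows are hypothetical names added to exercise the fallbacks:

```python
import re

import numpy as np
import pandas as pd

# Last two rows are hypothetical: a BEAR name without leverage, and neither BULL nor BEAR
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S", "BEAR DAX S", "TRACKER ABC"],
                  columns=["name"])

# Leverage digits, e.g. "X3" -> "3", "12x" -> "12"; missing when there is no token
lev = (df["name"]
       .str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False)
       .str.strip("Xx"))
# Fall back to "1" (i.e. 100) where no numeric leverage was found
hundreds = lev.where(lev.fillna("").str.isdigit(), "1")

is_bull = df["name"].str.contains("BULL")
is_bear = df["name"].str.contains("BEAR")

# First matching condition wins; rows matching neither get the default
df["leverage"] = np.select(
    [is_bull, is_bear],
    ["+" + hundreds + "00", "-" + hundreds + "00"],
    default="+100",
)
print(df)
```

np.select evaluates the conditions in order, so a name containing both words would be treated as BULL, which mirrors the order of the question's conditional expression.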