Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/30953299/

Time: 2020-08-19 09:13:19  Source: igfitidea

Pandas: if row in column A contains "x", write "y" to row in column B

Tags: python, pandas

Asked by Winterflags

For pandas, I'm looking for a way to write conditional values to each row in column B, based on substrings for corresponding rows in column A.

So if a cell in A contains "BULL", write "Long" to B. If a cell in A contains "BEAR", write "Short" to B.

Desired output:

A                  B
"BULL APPLE X5"    "Long"
"BEAR APPLE X5"    "Short"
"BULL APPLE X5"    "Long"

B is initially empty: df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']],columns=['A','B'])

Accepted answer by Padraic Cunningham

Your code would raise an error because you are creating the DataFrame incorrectly. Just create a single column A, then add B based on A:

import pandas as pd
df = pd.DataFrame(["BULL","BEAR","BULL"], columns=['A'])
df["B"] = ["Long" if ele  == "BULL" else "Short" for ele in df["A"]]

print(df)

    A      B
0  BULL   Long
1  BEAR  Short
2  BULL   Long

Or apply your logic to the data before you create the DataFrame:

import pandas as pd
data = ["BULL","BEAR","BULL"]
data2 = ["Long" if ele  == "BULL" else "Short" for ele in data]
df = pd.DataFrame(list(zip(data, data2)), columns=['A','B'])

print(df)
      A      B
 0  BULL   Long
 1  BEAR  Short
 2  BULL   Long

For your edit:

df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']], columns=['A','B'])

df["B"] = df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")

print(df)

            A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long

Or just add the column after:

df = pd.DataFrame(['BULL APPLE X5','BEAR APPLE X5','BLL APPLE X5'], columns=['A'])

df["B"] = df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")

print(df)
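
The chained conditional inside the lambda gets harder to read as more keywords are added. One possible way to generalise it is sketched below. The `LABELS` table and the `label_for` helper are hypothetical names introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical keyword-to-label table (an assumption); extend it as needed.
LABELS = {"BULL": "Long", "BEAR": "Short"}

def label_for(text, labels=LABELS, default=""):
    """Return the label of the first keyword found in text, else default."""
    for keyword, label in labels.items():
        if keyword in text:
            return label
    return default

df = pd.DataFrame(["BULL APPLE X5", "BEAR APPLE X5", "BLL APPLE X5"], columns=["A"])
df["B"] = df["A"].map(label_for)
print(df)
```

The third row contains neither keyword, so it falls through to the empty-string default, just as it does with the lambda.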

Or using contains:

df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']], columns=['A','B'])


df.loc[df['A'].str.contains("BULL"), "B"] = "Long"
df.loc[df['A'].str.contains("BEAR"), "B"] = "Short"

print(df)
               A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long
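
The boolean masks from `str.contains` can also be combined in a single step. The following is a minimal sketch using `numpy.select`, which is a swapped-in technique and not something the original answer uses. It assumes column A holds only strings (no missing values).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5"]})

# One boolean mask per keyword; regex=False does a plain substring test.
conditions = [
    df["A"].str.contains("BULL", regex=False),
    df["A"].str.contains("BEAR", regex=False),
]
choices = ["Long", "Short"]

# The first matching condition wins; rows matching none get the default.
df["B"] = np.select(conditions, choices, default="")
print(df)
```

This creates column B in one assignment, so it does not need B to exist beforehand.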

Answer by Anand S Kumar

Also, to populate df['B'] you can try the method below:

def applyFunc(s):
    if s == 'BULL':
        return 'Long'
    elif s == 'BEAR':
        return 'Short'
    return ''

df['B'] = df['A'].apply(applyFunc)
df
>>
       A      B
0  BULL   Long
1  BEAR  Short
2  BULL   Long

The apply function calls applyFunc once for each value in df['A'], passing that value as the argument, and the returned value ends up in the same row of df['B']. What happens behind the scenes is slightly different, though: the values are not written into df['B'] directly. Instead, apply builds a new Series, and at the end that Series is assigned to df['B'].
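
To make the two steps visible, here is a small sketch that keeps the result of apply in a separate variable before assigning it. The function `apply_func` is a substring-matching adaptation of the function above (an assumption made so that it works with values such as "BULL APPLE X5"), not the original code.

```python
import pandas as pd

def apply_func(s):
    # Substring variant of the function above (an adaptation, not the original).
    if "BULL" in s:
        return "Long"
    elif "BEAR" in s:
        return "Short"
    return ""

df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5"]})

result = df["A"].apply(apply_func)  # a new Series; df itself is not modified yet
print(type(result).__name__)        # Series
print("B" in df.columns)            # False

df["B"] = result                    # the assignment is a separate step
print(df)
```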

Answer by unutbu

You could use str.extract to search for the regex pattern BULL|BEAR, and then use Series.map to replace those strings with Long or Short:

In [50]: df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']],columns=['A','B'])

In [51]: df['B'] = df['A'].str.extract(r'(BULL|BEAR)', expand=False).map({'BULL':'Long', 'BEAR':'Short'})

In [55]: df
Out[55]: 
               A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long
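
In recent pandas versions, `str.extract` returns a DataFrame by default (`expand=True`), so mapping with a dict needs a Series obtained via `expand=False`. The sketch below shows that variant and also what happens to a row that matches neither keyword. The "COW APPLE X5" row and the `fillna("")` step are additions for illustration, not part of the original answer.

```python
import pandas as pd

# The third row is a hypothetical non-matching value added for illustration.
df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "COW APPLE X5"]})

# With a single capture group, expand=False yields a Series instead of a DataFrame.
extracted = df["A"].str.extract(r"(BULL|BEAR)", expand=False)

# Unmatched rows are missing after extract and stay missing after map;
# fillna("") turns them into empty strings.
df["B"] = extracted.map({"BULL": "Long", "BEAR": "Short"}).fillna("")
print(df)
```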

However, forming the intermediate Series with str.extract is quite slow compared to df['A'].map(lambda x: ...). Using IPython's %timeit to time the two approaches:

In [5]: df = pd.concat([df]*10000)

In [6]: %timeit df['A'].str.extract(r'(BULL|BEAR)').map({'BULL':'Long', 'BEAR':'Short'})
10 loops, best of 3: 39.7 ms per loop

In [7]: %timeit df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")
100 loops, best of 3: 4.98 ms per loop

The majority of time is spent in str.extract:

In [8]: %timeit df['A'].str.extract(r'(BULL|BEAR)')
10 loops, best of 3: 37.1 ms per loop

while the call to Series.map is relatively fast:

In [9]: x = df['A'].str.extract(r'(BULL|BEAR)')

In [10]: %timeit x.map({'BULL':'Long', 'BEAR':'Short'})
1000 loops, best of 3: 1.82 ms per loop
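
Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. The function names `with_extract` and `with_lambda` are hypothetical, `expand=False` is assumed so that `str.extract` returns a Series on current pandas, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

# 30,000 rows, mirroring the pd.concat([df]*10000) setup above.
df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5"] * 10000})

def with_extract():
    # Regex extraction followed by a dict lookup.
    return df["A"].str.extract(r"(BULL|BEAR)", expand=False).map(
        {"BULL": "Long", "BEAR": "Short"}
    )

def with_lambda():
    # Plain Python substring tests applied element by element.
    return df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")

for func in (with_extract, with_lambda):
    seconds = timeit.timeit(func, number=10) / 10  # average over 10 calls
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")
```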