Python pandas: if a row in column A contains "x", write "y" to the row in column B

Question

Asked by Winterflags

For pandas, I'm looking for a way to write conditional values to each row in column B, based on substrings for corresponding rows in column A.

So if a cell in A contains "BULL", write "Long" to B. Or if a cell in A contains "BEAR", write "Short" to B.

Desired output:

A                  B
"BULL APPLE X5"    "Long"
"BEAR APPLE X5"    "Short"
"BULL APPLE X5"    "Long"

B is initially empty: df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']],columns=['A','B'])

Answer 1

Accepted answer by Padraic Cunningham

Your code would raise an error because you are creating the DataFrame incorrectly. Just create a single column A, then add B based on A:

import pandas as pd
df = pd.DataFrame(["BULL","BEAR","BULL"], columns=['A'])
df["B"] = ["Long" if ele  == "BULL" else "Short" for ele in df["A"]]

print(df)

      A      B
0  BULL   Long
1  BEAR  Short
2  BULL   Long

Or do your logic on the data before you create the DataFrame:

import pandas as pd
data = ["BULL","BEAR","BULL"]
data2 = ["Long" if ele  == "BULL" else "Short" for ele in data]
df = pd.DataFrame(list(zip(data, data2)), columns=['A','B'])

print(df)
      A      B
0  BULL   Long
1  BEAR  Short
2  BULL   Long

For your edit:

df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']], columns=['A','B'])

df["B"] = df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")

print(df)

               A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long

Or just add the column after:

df = pd.DataFrame(['BULL APPLE X5','BEAR APPLE X5','BLL APPLE X5'], columns=['A'])

df["B"] = df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")

print(df)

Or using contains:

df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']], columns=['A','B'])


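# Assign through .loc with a boolean mask; chained indexing
# (df["B"][mask] = ...) may operate on a copy and leave df unchanged.
df.loc[df['A'].str.contains("BULL"), "B"] = "Long"
df.loc[df['A'].str.contains("BEAR"), "B"] = "Short"

print(df)

               A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long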
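
When there are several substrings to test, the mask-based idea above can also be written in one step. The following is a minimal sketch, not part of the original answer: it assumes NumPy is available and uses numpy.select, and the "FLAT APPLE X5" row is a made-up example added only to show the fallback value.

```python
import numpy as np
import pandas as pd

# "FLAT APPLE X5" is a hypothetical row that matches neither substring.
df = pd.DataFrame(
    {"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5", "FLAT APPLE X5"]}
)

# One boolean mask per substring; regex=False treats the pattern as a literal.
conditions = [
    df["A"].str.contains("BULL", regex=False),
    df["A"].str.contains("BEAR", regex=False),
]
choices = ["Long", "Short"]

# np.select takes, per row, the choice for the first condition that is True
# and falls back to the default when none match.
df["B"] = np.select(conditions, choices, default="")

print(df)
```

The order of the conditions matters: if a value contained both substrings, the first matching condition would win.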

Answer 2

Answer by Anand S Kumar

Also, to populate df['B'] you can try the method below:

def applyFunc(s):
    if s == 'BULL':
        return 'Long'
    elif s == 'BEAR':
        return 'Short'
    return ''

df['B'] = df['A'].apply(applyFunc)
df
>>
      A      B
0  BULL   Long
1  BEAR  Short
2  BULL   Long

For each value in df['A'], apply calls applyFunc with that value as the argument, and the returned value ends up in the same row of df['B']. What actually happens behind the scenes is slightly different: the values are not written into df['B'] directly. Instead, a new Series is created, and at the end that Series is assigned to df['B'].
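
The function above compares with ==, so it only matches cells that are exactly 'BULL' or 'BEAR'. For the substring case in the question, a variant along these lines might work. This is a sketch rather than the answerer's code, and label_position is a hypothetical helper name. It also makes the "new Series" point visible by holding the result in a variable before assigning it.

```python
import pandas as pd

df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5"]})


def label_position(text):
    """Hypothetical helper: map a cell's text to 'Long', 'Short', or ''."""
    if "BULL" in text:
        return "Long"
    if "BEAR" in text:
        return "Short"
    return ""


# apply() calls label_position once per value and collects the results
# in a new Series; df itself is not modified by this line.
labels = df["A"].apply(label_position)
column_exists_before_assignment = "B" in df.columns  # False at this point

# Only this assignment adds column B to the DataFrame.
df["B"] = labels

print(df)
```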

Answer 3

Answer by unutbu

You could use str.extract to search for the regex pattern BULL|BEAR, and then use Series.map to replace those strings with Long or Short:

In [50]: df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']],columns=['A','B'])

In [51]: df['B'] = df['A'].str.extract(r'(BULL|BEAR)', expand=False).map({'BULL':'Long', 'BEAR':'Short'})

In [55]: df
Out[55]: 
               A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long
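
One detail worth checking with this approach is what happens to rows that match neither pattern. The sketch below is an illustration under assumptions, not part of the original answer: the "FLAT APPLE X5" row is invented, and the fillna("") step is an added choice to mirror the empty-string fallback used in the other answers.

```python
import pandas as pd

# "FLAT APPLE X5" is a hypothetical row that matches neither BULL nor BEAR.
df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "FLAT APPLE X5"]})

# expand=False returns a Series (not a DataFrame) for a single capture group.
# Rows without a match come back as missing values.
extracted = df["A"].str.extract(r"(BULL|BEAR)", expand=False)

# Map the captured token to its label, then replace missing values with "".
df["B"] = extracted.map({"BULL": "Long", "BEAR": "Short"}).fillna("")

print(df)
```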

However, forming the intermediate Series with str.extract is quite slow compared to df['A'].map(lambda x: ...). Using IPython's %timeit to time the benchmarks:

In [5]: df = pd.concat([df]*10000)

In [6]: %timeit df['A'].str.extract(r'(BULL|BEAR)', expand=False).map({'BULL':'Long', 'BEAR':'Short'})
10 loops, best of 3: 39.7 ms per loop

In [7]: %timeit df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")
100 loops, best of 3: 4.98 ms per loop

The majority of time is spent in str.extract:

In [8]: %timeit df['A'].str.extract(r'(BULL|BEAR)')
10 loops, best of 3: 37.1 ms per loop

while the call to Series.map is relatively fast:

In [9]: x = df['A'].str.extract(r'(BULL|BEAR)', expand=False)

In [10]: %timeit x.map({'BULL':'Long', 'BEAR':'Short'})
1000 loops, best of 3: 1.82 ms per loop
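
Outside IPython, a comparison like the one above could be reproduced with the standard-library timeit module. This is only a sketch: with_extract and with_lambda are hypothetical wrapper names, and the absolute numbers will differ by machine and pandas version, so no timings are shown here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5"]})
big = pd.concat([df] * 10000, ignore_index=True)  # 30,000 rows


def with_extract():
    # Regex extraction followed by a dictionary lookup.
    tokens = big["A"].str.extract(r"(BULL|BEAR)", expand=False)
    return tokens.map({"BULL": "Long", "BEAR": "Short"})


def with_lambda():
    # Plain Python substring tests applied per value.
    return big["A"].map(
        lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else ""
    )


# Both approaches should produce the same labels on this data.
same_result = with_extract().tolist() == with_lambda().tolist()

# Total seconds for 10 runs of each approach.
extract_seconds = timeit.timeit(with_extract, number=10)
lambda_seconds = timeit.timeit(with_lambda, number=10)

print(f"str.extract + map: {extract_seconds:.4f} s for 10 runs")
print(f"map(lambda):       {lambda_seconds:.4f} s for 10 runs")
```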
