Python: how to compare two columns in pandas to make a third column?

Note: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/38925082/

Date: 2020-08-19 21:37:51  Source: igfitidea

Tags: python, pandas

Asked by Anurag Pandey

I have two columns, age and sex, in a pandas DataFrame:

sex = ['m', 'f' , 'm', 'f', 'f', 'f', 'f']
age = [16 ,  15 , 14 , 9  , 8   , 2   , 56 ]

Now I want to create a third column: if age <= 9, output 'child'; if age > 9, output the respective value from sex:

sex = ['m', 'f', 'm', 'f',     'f',     'f',     'f']
age = [16,  15,  14,  9,       8,       2,       56 ]
yes = ['m', 'f', 'm', 'child', 'child', 'child', 'f']
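To try the answers below, the question's data can be loaded into a DataFrame. This is a minimal sketch, assuming the column names sex and age from the question; the names expected and reference are hypothetical and only hold the desired third column for checking results:

```python
import pandas as pd

# Data from the question
sex = ['m', 'f', 'm', 'f', 'f', 'f', 'f']
age = [16, 15, 14, 9, 8, 2, 56]
df = pd.DataFrame({'sex': sex, 'age': age})

# Desired third column (hypothetical name), copied from the question
expected = ['m', 'f', 'm', 'child', 'child', 'child', 'f']

# Plain-Python reference: readable, but loops in the interpreter,
# so it is slow on large frames
reference = ['child' if a <= 9 else s for s, a in zip(df['sex'], df['age'])]
print(reference == expected)  # True
```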

Please help. P.S. I am still working on it; if I find anything, I will update immediately.

Answer by root

Use numpy.where:

import numpy as np
df['col3'] = np.where(df['age'] <= 9, 'child', df['sex'])

The resulting output:

   age sex   col3
0   16   m      m
1   15   f      f
2   14   m      m
3    9   f  child
4    8   f  child
5    2   f  child
6   56   f      f
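numpy.where(condition, x, y) works element by element: it takes the value from x where condition is True and from y otherwise, broadcasting a scalar such as 'child'. It returns a NumPy array rather than a pandas Series, and pandas assigns that array to the new column by position. A minimal check, rebuilding the question's frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
                   'age': [16, 15, 14, 9, 8, 2, 56]})

# Element-wise select: 'child' where age <= 9, else the value from df['sex']
result = np.where(df['age'] <= 9, 'child', df['sex'])
print(type(result).__name__)  # ndarray

# Assigning an ndarray to a column is positional (lengths must match)
df['col3'] = result
print(df['col3'].tolist())
# ['m', 'f', 'm', 'child', 'child', 'child', 'f']
```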

Timings

Using the following setup to get a larger sample DataFrame:

import numpy as np
import pandas as pd
np.random.seed([3,1415])
n = 10**5
df = pd.DataFrame({'sex': np.random.choice(['m', 'f'], size=n),
                   'age': np.random.randint(0, 100, size=n)})

I get the following timings:

%timeit np.where(df['age'] <= 9, 'child', df['sex'])
1000 loops, best of 3: 1.26 ms per loop

%timeit df['sex'].where(df['age'] > 9, 'child')
100 loops, best of 3: 3.25 ms per loop

%timeit df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
100 loops, best of 3: 3.92 ms per loop
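%timeit is an IPython magic. Outside IPython, a similar comparison can be sketched with the standard timeit module. Absolute numbers depend on the machine and the pandas/NumPy versions, so none are claimed here; the sketch also checks that the three approaches produce the same values before timing them. It uses a single call per repeat because the row-wise apply is slow on 10**5 rows:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed([3, 1415])
n = 10**5
df = pd.DataFrame({'sex': np.random.choice(['m', 'f'], size=n),
                   'age': np.random.randint(0, 100, size=n)})

approaches = {
    'np.where': lambda: np.where(df['age'] <= 9, 'child', df['sex']),
    'Series.where': lambda: df['sex'].where(df['age'] > 9, 'child'),
    'apply': lambda: df.apply(
        lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1),
}

# Timings only mean something if all approaches agree
results = [list(f()) for f in approaches.values()]
assert results[0] == results[1] == results[2]

for name, f in approaches.items():
    # Best of 3 repeats, one call each, reported in milliseconds
    best = min(timeit.repeat(f, number=1, repeat=3))
    print(f'{name}: {best * 1e3:.2f} ms per call')
```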

Answer by Tim Fuchs

You could use pandas' where method (pandas.DataFrame.where, or Series.where for a single column). For example, with df as in the question:

child = pd.Series('child', index=df.index)
df['col3'] = child.where(df['age'] <= 9, df['sex'])
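Series.where keeps the original values where the condition is True and substitutes the other argument where it is False; Series.mask does the opposite. A minimal sketch on the question's data showing that the two spellings give the same column (the column names where_col and mask_col are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
                   'age': [16, 15, 14, 9, 8, 2, 56]})

# where: keep 'sex' where age > 9, otherwise use 'child'
df['where_col'] = df['sex'].where(df['age'] > 9, 'child')

# mask: replace 'sex' with 'child' where age <= 9 (the inverse condition)
df['mask_col'] = df['sex'].mask(df['age'] <= 9, 'child')

print(df['where_col'].tolist() == df['mask_col'].tolist())  # True
print(df['where_col'].tolist())
# ['m', 'f', 'm', 'child', 'child', 'child', 'f']
```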

Answer by ragesz

import pandas as pd
df = pd.DataFrame({'sex':['m', 'f' , 'm', 'f', 'f', 'f', 'f'],
    'age':[16, 15, 14, 9, 8, 2, 56]})
df['yes'] = df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)

Result:

   age sex    yes
0   16   m      m
1   15   f      f
2   14   m      m
3    9   f  child
4    8   f  child
5    2   f  child
6   56   f      f
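Row-wise apply calls a Python function once per row, which is why it was the slowest of the three in the timings above. Another vectorized option, not used in the answers here, is boolean indexing with .loc: copy the sex column, then overwrite the rows where age <= 9. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
                   'age': [16, 15, 14, 9, 8, 2, 56]})

# Start from an explicit copy of 'sex' so the original column is untouched
df['yes'] = df['sex'].copy()
# Overwrite all children in one vectorized assignment
df.loc[df['age'] <= 9, 'yes'] = 'child'

# Compare with the row-wise apply version from the answer
via_apply = df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
print(df['yes'].tolist() == via_apply.tolist())  # True
```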