Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/13596419/

Date: 2020-09-13 20:30:41  Source: igfitidea

how to combine two columns with an if/else in python pandas?

Tags: python, pandas

Asked by pocketfullofcheese

I am very new to Pandas (i.e., less than 2 days). However, I can't seem to figure out the right syntax for combining two columns with an if/else condition.

Actually, I did figure out one way to do it using 'zip'. This is what I want to accomplish, but it seems there might be a more efficient way to do this in pandas.

For completeness' sake, I include some pre-processing I do to make things clear:

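import pandas as pd

records_data = pd.read_csv('records.csv')

## pull out a year from the 'source' column using a regex
## (extract_year_from_source is my own helper, not shown here)
source_years = records_data['source'].map(extract_year_from_source)

## this is what I want to do more efficiently (if it's possible):
## use the extracted year when it is truthy, otherwise keep the existing 'year'
records_data['year'] = [s if s else y for (s, y) in zip(source_years, records_data['year'])]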
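
A self-contained sketch of the zip approach above. The question does not show extract_year_from_source or the CSV contents, so the regex helper and the sample rows here are hypothetical: the helper assumes a four-digit year starting with 19 or 20 and returns 0 when none is found, which is exactly the case the `s if s else y` fallback covers.

```python
import re

import pandas as pd


# Hypothetical helper: assumes a four-digit 19xx/20xx year in the text,
# returning 0 when no year is found.
def extract_year_from_source(source):
    match = re.search(r"\b(19|20)\d{2}\b", str(source))
    return int(match.group(0)) if match else 0


# Hypothetical sample rows standing in for records.csv.
records_data = pd.DataFrame({
    "source": ["Smith et al., 2004", "unknown", "Report (1998)"],
    "year": [2005, 2010, 1999],
})

source_years = records_data["source"].map(extract_year_from_source)

# Row-by-row fallback: extracted year if truthy (nonzero), else existing year.
records_data["year"] = [s if s else y for (s, y) in zip(source_years, records_data["year"])]
print(records_data["year"].tolist())  # [2004, 2010, 1998]
```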

Answer by Jeff

In pandas >= 0.10.0 try

# keep the extracted year where it is nonzero, otherwise fall back to the existing 'year'
df['year'] = source_years.where(source_years != 0, df['year'])

and see:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking
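
A runnable sketch of the where approach on hypothetical data, assuming (as the `!= 0` test implies) that source_years holds 0 when no year was extracted. where must be called on the values you want to keep: `df['year'].where(cond, df['year'])` returns the column unchanged whichever rows cond selects. Current pandas versions also provide Series.mask, the inverse of where, which gives the same result here.

```python
import pandas as pd

# Hypothetical data; 0 in source_years means "no year found in source".
df = pd.DataFrame({"year": [2005, 2010, 1999]})
source_years = pd.Series([2004, 0, 1998])

# where(): keep source_years where the condition holds, otherwise use df['year'].
from_where = source_years.where(source_years != 0, df["year"])

# mask(): the inverse, replacing df['year'] where the condition holds.
from_mask = df["year"].mask(source_years != 0, source_years)

df["year"] = from_where
print(df["year"].tolist())  # [2004, 2010, 1998]
```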

As noted in the comments, this does use np.where under the hood; the difference is that pandas aligns the Series on their index labels before combining them, so, for example, you can update only part of the column.
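
To illustrate the alignment point, here is a sketch with hypothetical data in which source_years has the same row labels as df but stored in a different order. Series.where pairs values by index label, while np.where only sees the underlying arrays and pairs them by position.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same row labels, but source_years is in a different order.
df = pd.DataFrame({"year": [1990, 1991, 1992]}, index=["a", "b", "c"])
source_years = pd.Series([2001, 0, 2003], index=["c", "b", "a"])  # 0 = no year found

# Series.where aligns df['year'] to source_years by label before choosing values.
label_based = source_years.where(source_years != 0, df["year"]).reindex(df.index)

# np.where works on the raw arrays, so it pairs values by position.
positional = np.where(source_years != 0, source_years, df["year"])

print(label_based.tolist())  # [2003, 1991, 2001]
print(positional.tolist())   # [2001, 1991, 2003]
```

With the positional result, row "a" would receive 2001, which is row "c"'s source year; the label-based result gives row "a" its own value, 2003.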

Answer by unutbu

Perhaps try np.where:

import numpy as np
# take source_years where it is nonzero (truthy), otherwise keep df['year']
df['year'] = np.where(source_years, source_years, df['year'])
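
A small check on hypothetical data (0 standing in for "no year found") that np.where reproduces the question's list comprehension: np.where converts a numeric condition to booleans the same way Python truthiness does, so nonzero selects source_years. The result is a plain NumPy array with no index, which is safe here only because both inputs share the same row order.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 0 in source_years means "no year found in source".
df = pd.DataFrame({"year": [2005, 2010, 1999]})
source_years = pd.Series([2004, 0, 1998])

# Vectorized: nonzero entries of source_years act as True.
vectorized = np.where(source_years, source_years, df["year"])

# The question's row-by-row version, for comparison.
looped = [s if s else y for (s, y) in zip(source_years, df["year"])]

df["year"] = vectorized
print(df["year"].tolist())  # [2004, 2010, 1998]
```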