Python Pandas - conditionally select the source column of data for a new column based on row value
Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/23934905/
Pandas - conditionally select source column of data for a new column based on row value
Asked by aensm
Is there a pandas function that allows selection from different columns based on a condition? This is analogous to a CASE expression in a SQL SELECT clause. For example, say I have the following DataFrame:
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=('Country', 'x', 'y')
)
I want to select from column 'x' when Country=='USA', and from column 'y' when Country=='Canada', resulting in something like the following:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]
Accepted answer by falsetru
Use the other argument of DataFrame.where (Series.where, called below, behaves the same way) together with pandas.concat:
>>> import pandas as pd
>>>
>>> foo = pd.DataFrame([
... ['USA',1,2],
... ['Canada',3,4],
... ['Canada',5,6]
... ], columns=('Country', 'x', 'y'))
>>>
>>> z = foo['x'].where(foo['Country'] == 'USA', foo['y'])
>>> pd.concat([foo['Country'], z], axis=1)
  Country  x
0     USA  1
1  Canada  4
2  Canada  6
If you want z as the column name, specify keys:
>>> pd.concat([foo['Country'], z], keys=['Country', 'z'], axis=1)
  Country  z
0     USA  1
1  Canada  4
2  Canada  6
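To get the four-column result from the question rather than a separate two-column frame, the same where call can be assigned back as a new column. A minimal sketch, assuming the foo frame from the question; using assign is a choice made here for illustration and is not part of the original answer:

```python
import pandas as pd

foo = pd.DataFrame(
    [["USA", 1, 2], ["Canada", 3, 4], ["Canada", 5, 6]],
    columns=["Country", "x", "y"],
)

# Take x where Country is USA, otherwise fall back to y.
z = foo["x"].where(foo["Country"] == "USA", foo["y"])

# assign returns a new frame and leaves foo itself unchanged.
result = foo.assign(z=z)
print(result)
```

Note that where treats every non-USA row as the fallback case, so a third country would also receive its y value.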
Answer by EdChum
This would work:
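In [84]:
import numpy as np

def func(row):
    # Pick the source column according to the row's Country.
    if row['Country'] == 'USA':
        return row['x']
    if row['Country'] == 'Canada':
        return row['y']
    return np.nan  # no rule for any other country

# Pass the function itself; with axis=1, apply calls it once per row.
foo['z'] = foo.apply(func, axis=1)
foo
Out[84]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]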
You can use loc:
In [137]:
foo.loc[foo['Country']=='Canada','z'] = foo['y']
foo.loc[foo['Country']=='USA','z'] = foo['x']
foo
Out[137]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]
EDIT
Although unwieldy, using loc will scale better with larger DataFrames, because apply here calls the function once for every row, whereas boolean indexing is vectorised.
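When there are more than two branches, numpy.select offers a vectorised form that reads much like a SQL CASE expression. This is not from the original answers; it is a sketch of an alternative, assuming the foo frame from the question and NaN as the fallback for countries that match no condition:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["USA", 1, 2], ["Canada", 3, 4], ["Canada", 5, 6]],
    columns=["Country", "x", "y"],
)

# Each condition is paired with the column to read when it holds.
conditions = [foo["Country"] == "USA", foo["Country"] == "Canada"]
choices = [foo["x"], foo["y"]]

# Rows matching no condition get the default (NaN here, so z becomes float).
foo["z"] = np.select(conditions, choices, default=np.nan)
print(foo)
```

Conditions are checked in order and the first match wins, which mirrors how CASE WHEN branches are evaluated.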
Answer by Alexander McFarlane
Here is a generic solution for selecting from arbitrary columns given a value in another column.
This has the additional benefit of separating the lookup logic into a simple dict structure, which makes it easy to modify.
import pandas as pd

df = pd.DataFrame(
    [['UK', 'burgers', 4, 5, 6],
     ['USA', 4, 7, 9, 'make'],
     ['Canada', 6, 4, 6, 'you'],
     ['France', 3, 6, 'fat', 8]],
    columns=('Country', 'a', 'b', 'c', 'd')
)
I extend this to the case where the country-to-column mapping is stored in an external lookup structure (a dict):
lookup = {'Canada': 'd', 'France': 'c', 'UK': 'a', 'USA': 'd'}
Loop over the entries in the dict and, for each country, use a boolean filter on the pd.DataFrame to copy values from the mapped column into the result column:
for k, v in lookup.items():
    filt = df['Country'] == k
    df.loc[filt, 'result'] = df.loc[filt, v]  # modifies df in place
This yields the life lesson:
In [69]: df
Out[69]:
  Country        a  b    c     d   result
0      UK  burgers  4    5     6  burgers
1     USA        4  7    9  make     make
2  Canada        6  4    6   you      you
3  France        3  6  fat     8      fat
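The same dict-driven selection can also be done without a Python loop by turning the mapped column names into integer positions and picking one cell per row from the underlying array. This is a sketch of an alternative rather than the answer's own method; the names source_cols, col_pos, and row_pos are made up for illustration, and it assumes every country in the frame has an entry in lookup:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["UK", "burgers", 4, 5, 6],
     ["USA", 4, 7, 9, "make"],
     ["Canada", 6, 4, 6, "you"],
     ["France", 3, 6, "fat", 8]],
    columns=["Country", "a", "b", "c", "d"],
)
lookup = {"Canada": "d", "France": "c", "UK": "a", "USA": "d"}

# Translate each row's Country into the name of the column to read.
source_cols = df["Country"].map(lookup)

# Convert those names to integer column positions (-1 means "not found").
col_pos = df.columns.get_indexer(source_cols)

# Pick one cell per row: row i, column col_pos[i].
row_pos = np.arange(len(df))
df["result"] = df.to_numpy()[row_pos, col_pos]
print(df)
```

A country missing from lookup would map to position -1 and silently read the last column, so it is worth checking that col_pos contains no -1 before trusting the result.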
Answer by Super Mario
My attempt:
temp1 = foo[(foo['Country'] == 'Canada')][['Country', 'y']].rename(columns={'y': 'z'})
temp2 = foo[(foo['Country'] == 'USA')][['Country', 'x']].rename(columns={'x': 'z'})
wanted_df = pd.concat([temp1, temp2])
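The concatenated frame keeps only Country and z, and its rows come out in the order Canada, Canada, USA. A minimal sketch of attaching z back to the original frame, assuming the foo frame from the question; the join step and the name full are additions for illustration, not part of the answer:

```python
import pandas as pd

foo = pd.DataFrame(
    [["USA", 1, 2], ["Canada", 3, 4], ["Canada", 5, 6]],
    columns=["Country", "x", "y"],
)

temp1 = foo[foo["Country"] == "Canada"][["Country", "y"]].rename(columns={"y": "z"})
temp2 = foo[foo["Country"] == "USA"][["Country", "x"]].rename(columns={"x": "z"})
wanted_df = pd.concat([temp1, temp2])  # index order is 1, 2, 0

# join aligns on the index, so each z value lands on its original row.
full = foo.join(wanted_df["z"])
print(full)
```

Because the filtered pieces keep their original index labels, the alignment works regardless of the order in which they were concatenated.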

