Pandas - conditionally select source column of data for a new column based on row value
Source: http://stackoverflow.com/questions/23934905/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by aensm
Is there a pandas function that allows selection from different columns based on a condition? This is analogous to a CASE statement in a SQL Select clause. For example, say I have the following DataFrame:
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=('Country', 'x', 'y')
)
I want to select from column 'x' when Country=='USA', and from column 'y' when Country=='Canada', resulting in something like the following:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]
Accepted answer by falsetru
Using DataFrame.where's other argument and pandas.concat:
>>> import pandas as pd
>>>
>>> foo = pd.DataFrame([
... ['USA',1,2],
... ['Canada',3,4],
... ['Canada',5,6]
... ], columns=('Country', 'x', 'y'))
>>>
>>> z = foo['x'].where(foo['Country'] == 'USA', foo['y'])
>>> pd.concat([foo['Country'], z], axis=1)
  Country  x
0     USA  1
1  Canada  4
2  Canada  6
If you want z as the column name, specify keys:
>>> pd.concat([foo['Country'], z], keys=['Country', 'z'], axis=1)
  Country  z
0     USA  1
1  Canada  4
2  Canada  6
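If the goal is a new column on foo itself rather than a separate frame, the result of where can be assigned directly. The sketch below also shows numpy.select, which is not part of this answer but reads closest to a SQL CASE expression. The column name z_select and the NaN default for unmatched countries are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2], ['Canada', 3, 4], ['Canada', 5, 6]],
    columns=['Country', 'x', 'y'],
)

# Same idea as the answer, assigned straight onto the frame:
# keep x where the condition holds, otherwise fall back to y.
foo['z'] = foo['x'].where(foo['Country'] == 'USA', foo['y'])

# CASE-like alternative (an addition, not from the answer):
# conditions are checked in order; rows matching none get the default.
conditions = [foo['Country'] == 'USA', foo['Country'] == 'Canada']
choices = [foo['x'], foo['y']]
foo['z_select'] = np.select(conditions, choices, default=np.nan)

print(foo)
```

Note that where treats every non-USA row as "use y", while the select version only fills rows that match one of the listed countries. Because of the NaN default, z_select comes back as a float column.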
Answer by EdChum
This would work:
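In [84]:
import numpy as np

def func(row):
    # Pick the source column based on this row's Country.
    if row['Country'] == 'USA':
        return row['x']
    if row['Country'] == 'Canada':
        return row['y']
    return np.nan  # no rule for this country

# Pass the function itself; apply calls it once per row.
foo['z'] = foo.apply(func, axis=1)
foo
Out[84]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]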
You can also use loc:
In [137]:
foo.loc[foo['Country'] == 'Canada', 'z'] = foo['y']
foo.loc[foo['Country'] == 'USA', 'z'] = foo['x']
foo
Out[137]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]
EDIT
Although unwieldy, using loc will scale better on larger DataFrames: apply calls the function once for every row, whereas boolean indexing is vectorised.
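A rough way to check the scaling claim is to time both approaches on a larger frame. This is only a sketch: the frame big, its size, and the helper names pick, with_apply and with_loc are made up for illustration, and the actual timings will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame; the size and random values are arbitrary.
rng = np.random.default_rng(0)
n = 20_000
big = pd.DataFrame({
    'Country': rng.choice(['USA', 'Canada'], size=n),
    'x': rng.integers(0, 100, size=n),
    'y': rng.integers(0, 100, size=n),
})

def pick(row):
    # Row-wise version: this body runs once per row, in Python.
    if row['Country'] == 'USA':
        return row['x']
    if row['Country'] == 'Canada':
        return row['y']
    return np.nan

def with_apply(df):
    return df.apply(pick, axis=1)

def with_loc(df):
    # Vectorised version: one boolean mask per country.
    out = df.copy()
    out['z'] = np.nan
    usa = out['Country'] == 'USA'
    canada = out['Country'] == 'Canada'
    out.loc[usa, 'z'] = out.loc[usa, 'x']
    out.loc[canada, 'z'] = out.loc[canada, 'y']
    return out['z']

t_apply = timeit.timeit(lambda: with_apply(big), number=1)
t_loc = timeit.timeit(lambda: with_loc(big), number=1)
print(f'apply: {t_apply:.3f}s  loc: {t_loc:.3f}s')
```

Both functions should return the same values; only the time taken differs.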
Answer by Alexander McFarlane
Here is a generic solution for selecting from arbitrary columns given a value in another column.
This has the additional benefit of separating the lookup logic into a simple dict structure, which makes it easy to modify.
import pandas as pd

df = pd.DataFrame(
    [['UK', 'burgers', 4, 5, 6],
     ['USA', 4, 7, 9, 'make'],
     ['Canada', 6, 4, 6, 'you'],
     ['France', 3, 6, 'fat', 8]],
    columns=('Country', 'a', 'b', 'c', 'd')
)
I extend this to an operation where the conditional result is stored in an external lookup structure (dict):
lookup = {'Canada': 'd', 'France': 'c', 'UK': 'a', 'USA': 'd'}
Loop over the entries of the dict, and for each one use the key to filter the rows of the pd.DataFrame and the value to determine which column to select:
for k, v in lookup.items():  # use iteritems() on Python 2
    filt = df['Country'] == k
    df.loc[filt, 'result'] = df.loc[filt, v]  # modifies in place
The result spells out the life lesson:
In [69]: df
Out[69]:
  Country        a  b    c     d   result
0      UK  burgers  4    5     6  burgers
1     USA        4  7    9  make     make
2  Canada        6  4    6   you      you
3  France        3  6  fat     8      fat
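The same dict can also drive a loop-free lookup, which may help when there are many distinct countries. This is a sketch of an alternative, not part of the answer: it assumes every value in Country has an entry in lookup, and the names source_col, col_pos and row_pos are chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['UK', 'burgers', 4, 5, 6],
     ['USA', 4, 7, 9, 'make'],
     ['Canada', 6, 4, 6, 'you'],
     ['France', 3, 6, 'fat', 8]],
    columns=['Country', 'a', 'b', 'c', 'd'],
)
lookup = {'Canada': 'd', 'France': 'c', 'UK': 'a', 'USA': 'd'}

# Name of the source column for every row, e.g. 'a' for the UK row.
source_col = df['Country'].map(lookup)

# Translate those names into column positions.
col_pos = df.columns.get_indexer(source_col)
if (col_pos < 0).any():
    # -1 means the country was missing from lookup (or named a bad column).
    raise KeyError('some rows have no usable entry in lookup')

# Pick exactly one cell per row: (row 0, its column), (row 1, its column), ...
row_pos = np.arange(len(df))
df['result'] = df.to_numpy()[row_pos, col_pos]

print(df)
```

The explicit check matters because a position of -1 would otherwise silently select the last column.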
Answer by Super Mario
My try:
temp1 = foo[(foo['Country'] == 'Canada')][['Country', 'y']].rename(columns={'y': 'z'})
temp2 = foo[(foo['Country'] == 'USA')][['Country', 'x']].rename(columns={'x': 'z'})
wanted_df = pd.concat([temp1, temp2])
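One thing to watch with this approach: concat stacks the pieces in the order they are passed, so the Canada rows end up before the USA row. Since each piece keeps its original index, sorting on it restores the row order. The sketch below is an addition to the answer, rebuilding foo from the question and attaching z back onto it.

```python
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2], ['Canada', 3, 4], ['Canada', 5, 6]],
    columns=['Country', 'x', 'y'],
)

temp1 = foo[foo['Country'] == 'Canada'][['Country', 'y']].rename(columns={'y': 'z'})
temp2 = foo[foo['Country'] == 'USA'][['Country', 'x']].rename(columns={'x': 'z'})

# The pieces are stacked in the order given, so the index reads 1, 2, 0 here.
wanted_df = pd.concat([temp1, temp2])

# Sorting on the preserved index puts the rows back in their original order.
wanted_df = wanted_df.sort_index()

# Because the index survives, z can be attached to foo by plain assignment.
foo['z'] = wanted_df['z']

print(foo)
```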